Unix Administration


UNIX Administration Course Day 1: Part 1:

Introduction to the course. Introduction to UNIX. History of UNIX and key features. Comparison with other OSs.

Part 2:

The basics: files, UNIX shells, editors, commands. Regular Expressions and Metacharacters in Shells.

Part 3:

File ownership and access permissions. Online help (man pages, etc.)

Day 2: Part 1:

System identity (system name, IP address, etc.) Software: vendor, commercial, shareware, freeware (eg. GNU). Hardware features: auto-detection, etc. UNIX Characteristics: integration, stability, reliability, security, scalability, performance.

Part 2:

Shell scripts.

Part 3:

System monitoring tools and tasks.

Part 4:

Further shell scripts. Application development tools: compilers, debuggers, GUI toolkits, high-level APIs.

Day 3: Part 1:

Installing an OS and software: inst, swmgr. OS updates, patches, management issues.

Part 2:

Organising a network with a server. NFS. Quotas. Installing/removing internal/external hardware. SGI OS/software/hardware installation. Network setup.

Part 3:

Daily system administration tasks, eg. data backup. System bootup and shutdown, events, daemons.

Part 4:

Security/Access control: the law, firewalls, ftp. Internet access: relevant files and services. Course summary.

Part 5:

Exploring administration issues, security, hacking, responsibility, end-user support, the law (discussion). Indy/Indy attack/defense using IRIX 5.3 vs. IRIX 6.5 (two groups of 3 or 4 each).


Day 1: Part 1:

Introduction to the course. Introduction to UNIX. History of UNIX and key features. Comparison with other OSs.

Introduction to UNIX and the Course.

The UNIX operating system (OS) is widely used around the world, eg.:

The backbone of the Internet relies on UNIX-based systems and services, as do the systems used by most Internet Service Providers (ISPs).

Major aspects of everyday life are managed using UNIX-based systems, eg. banks, booking systems, company databases, medical records, etc.

Other 'behind the scenes' uses concern data-intensive tasks: art, design, industrial design, CAD, computer animation, real-time 3D graphics, virtual reality, visual simulation & training, data visualisation, database management, transaction processing, scientific research, military applications, computational challenges, medical modeling, entertainment and games, film/video special effects, live on-air broadcast effects, space exploration, etc.

As an OS, UNIX is not often talked about in the media, perhaps because there is no single large company such as Microsoft to which one can point and say, "There's the company in charge of UNIX." Most public talk is of Microsoft, Bill Gates, Intel, PCs and other more visible aspects of the computing arena, partly because of the home-based presence of PCs and the rise of the Internet in the public eye. This is ironic because OSs like MS-DOS, Win3.1, Win95 and WinNT all draw many of their basic features from UNIX, though they lack UNIX's sophistication and power, mainly because they lack so many key features and a lengthy development history.

In reality, a great deal of the everyday computing world relies on UNIX-based systems running on computers from a wide variety of vendors such as Compaq (Digital Equipment Corporation, or DEC), Hewlett Packard (HP), International Business Machines (IBM), Intel, SGI (was Silicon Graphics Inc., now just 'SGI'), Siemens Nixdorf, Sun Microsystems (Sun), etc. In recent years, many companies which previously relied on DOS or Windows have begun to realise that UNIX is increasingly important to their business, mainly because of what UNIX has to offer and why, eg. portability, security, reliability, etc.

As demands for handling data grow, and companies embrace new methods of manipulating data (eg. data mining and visualisation), the need for systems that can handle these problems forces companies to look at solutions that are beyond the Wintel platform in performance, scalability and power. Oil companies such as Texaco [1] and Chevron [2] are typical organisations which already use UNIX systems extensively because of their data-intensive tasks and a need for extreme reliability and scalability. As costs have come down, along with changes in the types of available UNIX system (newer low-end designs, eg. Ultra5, O2, etc.), small and medium-sized companies are looking towards UNIX solutions to solve their problems. Even individuals now find that older 2nd-hand UNIX systems have significant advantages over modern Wintel solutions, and many companies/organisations have adopted this approach too [3].

This course serves as an introduction to UNIX, its history, features, operation, use and services, applications, typical administration tasks, and relevant related topics such as the Internet, security and the Law. SGI's version of UNIX, called IRIX, is used as an example UNIX OS. The network of SGI Indys and an SGI Challenge S server I admin is used as an example UNIX hardware platform.

The course lasts three days, each day consisting of a one hour lecture followed by a two hour practical session in the morning, and then a three hour practical session in the afternoon; the only exceptions to this are Day 1, which begins with a two hour lecture, and Day 3, which has a one hour afternoon lecture. Detailed notes are provided for all areas covered in the lectures and the practical sessions. With new topics introduced step-by-step, the practical sessions enable first-hand familiarity with the topics covered in the lectures.

As one might expect of an OS which has a vast range of features, capabilities and uses, it is not possible to cover everything about UNIX in three days, especially the more advanced topics such as kernel tuning which most administrators rarely have to deal with. Today, modern UNIX hardware and software designs allow even very large systems with, for example, 64 processors to be fully set up at the OS level in little more than an hour [4]. Hence, the course is based on the author's experience of what a typical UNIX user and administrator (admin) has to deal with, rather than attempting to present a highly compressed 'Grand Description of Everything' which simply isn't necessary to enable an admin to perform real-world system administration on a daily basis.

For example, the precise nature and function of the Sendmail email system on any flavour of UNIX is not immediately easy to understand; looking at the various files and how Sendmail works can be confusing. However, in the author's experience, due to the way UNIX is designed, even a default OS installation without any further modification is sufficient to provide users with a fully functional email service [5], a fact which shouldn't be of any great surprise since email is a built-in aspect of any UNIX OS. Thus, the presence of email as a fundamental feature of UNIX is explained, but configuring and customising Sendmail is not.

History of UNIX

Key:
BTL = Bell Telephone Laboratories
GE  = General Electric
WE  = Western Electric
MIT = Massachusetts Institute of Technology
BSD = Berkeley Software Distribution

Summary History:

1957:  BTL creates the BESYS OS for internal use.
1964:  BTL needs a new OS, develops Multics with GE and MIT.
1969:  UNICS project started at BTL and MIT; OS written using the B language.
1970:  UNICS project well under way; anonymously renamed to UNIX.
1971:  UNIX book published. 60 commands listed.
1972:  C language completed (a rewritten form of B). Pipe concept invented.
1973:  UNIX used on 16 sites. Kernel rewritten in C. UNIX spreads rapidly.
1974:  Work spreads to Berkeley. BSD UNIX is born.
1975:  UNIX licensed to universities for free.
1978:  Two UNIX styles, though similar and related: System V and BSD.
1980s: Many companies launch their versions of UNIX, including Microsoft.

       A push towards cross-platform standards: POSIX/X11/Motif. Independent organisations with cross-vendor membership (including the IEEE) control future development and standards.
1990s: 64bit versions of UNIX released. Massively scalable systems. Internet springs to life, based on UNIX technologies. Further standardisation efforts (OpenGL, UNIX95, UNIX98).

Detailed History.

UNIX is now nearly 40 years old. It began life in 1969 as a combined project run by BTL, GE and MIT, initially created and managed by Ken Thompson and Dennis Ritchie [6]. The goal was to develop an operating system for a large computer which could support hundreds of simultaneous users. The very early phase actually started at BTL in 1957, when work began on what was to become BESYS, an OS developed by BTL for their internal needs.

In 1964, BTL started on the third generation of their computing resources. They needed a new operating system and so initiated the MULTICS (MULTIplexed operating and Computing System) project in late 1964, a combined research programme between BTL, GE and MIT. Due to differing design goals between the three groups, Bell pulled out of the project in 1969, leaving personnel in Bell's Computing Science and Research Center with no usable computing environment. In response, Ken Thompson and Dennis Ritchie offered to design a new OS for BTL, using a PDP-7 computer which was available at the time.

Early work was done in a language designed for writing compilers and systems programming, called BCPL (Basic Combined Programming Language). BCPL was quickly simplified and revised to produce a better language called B. By the end of 1969 an early version of the OS was completed; as a pun on the earlier Multics work, it was named UNICS (UNIplexed operating and Computing System) - an "emasculated Multics". UNICS included a primitive kernel, an editor, assembler, a simple shell command interpreter and basic command utilities such as rm, cat and cp.

In 1970, extra funding arose from BTL's internal use of UNICS for patent processing; as a result, the researchers obtained a DEC PDP-11/20 for further work (24K RAM). At that time, the OS used 12K, with the remaining 12K used for user programs and a RAM disk (the file size limit was 64K, the disk size limit 512K). BTL's Patent Department then took over the project, providing funding for a newer machine, namely a PDP-11/45. By this time, UNICS had been abbreviated to UNIX - nobody knows whose idea it was to change the name (probably just phonetic convenience).

In 1971, a book on UNIX by Thompson and Ritchie described over 60 commands, including:

b      (compile a B program)
chdir  (change working directory)
chmod  (change file access permissions)
chown  (change file ownership)
cp     (copy a file)
ls     (list directory contents)
who    (show who is on the system)

Even at this stage, fundamentally important aspects of UNIX were already firmly in place as core features of the overall OS, eg. file ownership and file access permissions. Today, other operating systems such as WindowsNT do not have these features as a rigorously integrated aspect of the core OS design, resulting in a plethora of overhead issues concerning security, file management, user access control and administration. These features, which are very important to modern computing environments, are either added as convoluted bolt-ons to other OSs or are totally nonexistent (NT does have a concept of file ownership, but it isn't implemented very well; regrettably, much of the advice given by people from VMS to Microsoft on how to implement such features was ignored).

In 1972, Ritchie and Thompson rewrote B to create a new language called C. Around this time, Thompson invented the 'pipe' - a standard mechanism for allowing the output of one program or process to be used as the input for another. This became the foundation of the future UNIX OS development philosophy: write programs which do one thing and do it well; write programs which can work together and cooperate using pipes; write programs which support text streams because text is a 'universal interface' [6].

By 1973, UNIX had spread to sixteen sites, all within AT&T and WE. First made public at a conference in October that year, within six months the number of sites using UNIX had tripled. Following publication of a version of UNIX in 'Communications of the ACM' in July 1974, requests for the OS began to rapidly escalate. Crucially at this time, the fundamentals of C were complete and much of UNIX's 11000 lines of code were rewritten in C - this was a major breakthrough in operating systems design: it meant that the OS could be used on virtually any computer platform since C was hardware independent.

In late 1974, Thompson went to the University of California at Berkeley to teach for a year. Working with Bill Joy and Chuck Haley, the three developed the 'Berkeley' version of UNIX (named BSD, for Berkeley Software Distribution), the source code of which was widely distributed to students on campus and beyond, ie. students at Berkeley and elsewhere also worked on improving the OS. BTL incorporated useful improvements as they arose, including some work from a user in the UK. By this time, the use and distribution of UNIX was out of BTL's control, largely because of the work at Berkeley on BSD.

Developments to BSD UNIX added the vi editor, a C-based shell interpreter, the Sendmail email system, virtual memory, and support for TCP/IP networking technologies (Transmission Control Protocol/Internet Protocol). Again, a service as important as email was now a fundamental part of the OS, eg. the OS uses email as a means of notifying the system administrator of system status, problems, reports, etc. Any installation of UNIX for any platform automatically includes email; by complete contrast, email is not a part of Windows3.1, Win95, Win98 or WinNT; email for these OSs must be added separately (eg. Pegasus Mail), sometimes causing problems which would not otherwise be present.

In 1975, a further revision of UNIX known as the Fifth Edition was released and licensed to universities for free. After the release of the Seventh Edition in 1978, the divergence of UNIX development along two separate but related paths became clear: System V (BTL) and BSD (Berkeley). BTL and Sun combined to create System V Release 4 (SVR4), which brought together System V with large parts of BSD. For a while, SVR4 was the more rigidly controlled, commercial and properly supported (compared to BSD on its own), though important work occurred in both versions and both continued to be alike in many ways. Fearing Sun's possible domination, many other vendors formed the Open Software Foundation (OSF) to further work on BSD and other variants. Note that in 1979, a typical UNIX kernel was still only 40K.

Because of a legal decree which prevented AT&T from selling the work of BTL, AT&T allowed UNIX to be widely distributed via licensing schemes at minimal or zero cost. The first genuine UNIX vendor, Interactive Systems Corporation, started selling UNIX systems for automating office work. Meanwhile, the work at AT&T (various internal design groups) was combined, then taken over by WE, which became UNIX System Laboratories (now owned by Novell). Later releases included System III and various releases of System V. Today, most popular brands of UNIX are based either on SVR4, BSD, or a combination of both (usually SVR4 with standard enhancements from BSD, which for example describes SGI's IRIX version perfectly). As an aside, there never was a System I, since WE feared companies would assume a 'System 1' would be bug-ridden and so would wait for a later release (or purchase BSD instead!).

It's worth noting the influence from the superb research effort at Xerox Parc, which was working on networking technologies, electronic mail systems and graphical user interfaces, including the proverbial 'mouse'. The Apple Mac arose directly from the efforts of Xerox Parc which, incredibly and much against the wishes of many Xerox Parc employees, gave free demonstrations to people such as Steve Jobs (founder of Apple) and sold their ideas for next to nothing ($50000). This was perhaps the biggest financial give-away in history [7]. One reason why so many different names for UNIX emerged over the years was the practice of AT&T to license the UNIX software, but not the UNIX name itself.
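As a brief aside, the pipe philosophy described above is still how UNIX shells are used every day. A minimal sketch (standard commands on any UNIX system; the exact output naturally varies from machine to machine):

who | wc -l               (count how many users are currently logged in)
ls -l /etc | grep passwd  (list only those entries in /etc whose names contain 'passwd')

In each case the output of the first program becomes the input of the second, with no temporary files and no special support needed from either program.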
The various flavours of UNIX may have different names (SunOS, Solaris, Ultrix, AIX, Xenix, UnixWare, IRIX, Digital UNIX, HP-UX, OpenBSD, FreeBSD, Linux, etc.) but in general the differences between them are minimal. Someone who learns a particular vendor's version of UNIX (eg. Sun's Solaris) will easily be able to adapt to a different version from another vendor (eg. DEC's Digital UNIX). Most differences merely concern the names and/or locations of particular files, as opposed to any core underlying aspect of the OS.

Further enhancements to UNIX included compilation management systems such as make and Imake (allowing for a single source code release to be compiled on any UNIX platform) and support for source code management (SCCS). Services such as telnet for remote communication were also completed, along with ftp for file transfer, and other useful functions.

In the early 1980s, Microsoft developed and released its version of UNIX called Xenix (it's a shame this wasn't pushed into the business market instead of DOS). The first 32bit version of UNIX was released at this time. SCO developed UnixWare, which is often used today by Intel for publishing performance ratings for its x86-based processors [8]. SGI started IRIX in the early 1980s, combining SVR4 with an advanced GUI. Sun's SunOS sprang to life in 1984 and became widely used in educational institutions. NeXT-Step arrived in 1989 and was hailed as a superb development platform; this was the platform used to develop the game 'Doom', which was then ported to DOS for final release. 'Doom' became one of the most successful and influential PC games of all time and was largely responsible for the rapid demand for better hardware graphics systems amongst home users in the early 1990s - not many people know that it was originally designed on a UNIX system though. Similarly, much of the development work for Quake was done using a 4-processor Digital Alpha system [9].

During the 1980s, developments in standardised graphical user interface elements were introduced (X11 and Motif) along with other major additional features, especially Sun's Networked File System (NFS), which allows multiple file systems, from multiple UNIX machines from different vendors, to be transparently shared and treated as a single file structure. Users see a single coherent file system even though the reality may involve many different systems in different physical locations.

By this stage, UNIX's key features had firmly established its place in the computing world, eg. multi-tasking and multi-user (many independent processes can run at once; many users can use a single system at the same time; a single user can use many systems at the same time). However, in general, the user interface to most UNIX variants was poor: mainly text based. Most vendors began serious GUI development in the early 1980s, especially SGI which has traditionally focused on visual-related markets [10].

From the point of view of a mature operating system, and certainly in the interests of companies and users, there were significant moves in the 1980s and early 1990s to introduce standards which would greatly simplify the cross-platform use of UNIX. These changes, which continue today, include:



The POSIX standard [6], begun in 1985 and released in 1990: a suite of application programming interface standards which provide for the portability of application source code relating to operating system services, managed by the X/Open group.

X11 and Motif: GUI and windowing standards, managed by the X Consortium and OSF.

UNIX95, UNIX98: a set of standards and guidelines to help make the various UNIX flavours more coherent and cross-platform.

OpenGL: a 3D graphics programming standard originally developed by SGI as GL (Graphics Library), then IrisGL, eventually released as an open standard by SGI as OpenGL and rapidly adopted by all other vendors.

Journaled file systems such as SGI's XFS, which allow the creation, management and use of very large file systems, eg. multiple terabytes in size, with file sizes from a single byte to millions of terabytes, plus support for real-time and predictable response. EDIT (2008): Linux can now use XFS.

Interoperability standards, so that UNIX systems can seamlessly operate with non-UNIX systems such as DOS PCs, WindowsNT, etc.

Standards Notes

POSIX: X/Open eventually became UNIX International (UI), which competed for a while with OSF. The US Federal Government initiated POSIX (essentially a version of UNIX), requiring all government contracts to conform to the POSIX standard - this freed the US government from being tied to vendor-specific systems, but also gave UNIX a major boost in popularity as users benefited from the industry's rapid adoption of accepted standards.

X11 and Motif: Programming directly using low-level X11/Motif libraries can be non-trivial. As a result, higher level programming interfaces were developed in later years, eg. the ViewKit library suite for SGI systems. Just as 'Open Inventor' is a higher-level 3D graphics API to OpenGL, ViewKit allows one to focus on developing the application and solving the client's problem, rather than having to wade through numerous low-level details. Even higher-level GUI-based toolkits exist for rapid application development, eg. SGI's RapidApp.

UNIX95, UNIX98: Most modern UNIX variants comply with these standards, though Linux is a typical exception (it is POSIX-compliant, but does not adhere to other standards). There are several UNIX variants available for PCs, excluding Alpha-based systems which can also use NT (MIPS CPUs could once be used with NT as well, but Microsoft dropped NT support for MIPS due to competition fears from Intel whose CPUs were not as fast at the time [11]): 

Linux    - Open-architecture, free, global development, insecure.
OpenBSD  - More rigidly controlled, much more secure.
FreeBSD  - Somewhere in between the above two.
UnixWare - More advanced. Scalable. Not free.

There are also commercial versions of Linux which have additional features and services, eg. Red Hat Linux and Caldera Linux. Note that many vendors today are working to enable the various UNIX variants to be used with Intel's CPUs - this is needed by Intel in order to decrease its dependence on the various Microsoft OS products.

OpenGL: Apple was the last company to adopt OpenGL. In the 1990s, Microsoft attempted to force its own standards into the marketplace (Direct3D and DirectX) but this move was doomed to failure due to the superior design of OpenGL and its ease of use, eg. games designers such as John Carmack (Doom, Quake, etc.) decided OpenGL was the much better choice for games development. Compared to Direct3D/DirectX, OpenGL is far superior for seriously complex problems such as visual simulation, military/industrial applications, image processing, GIS, numerical simulation and medical imaging. In a move to unify the marketplace, SGI and Microsoft signed a deal in the late 1990s to merge DirectX and Direct3D into OpenGL - the project, called Fahrenheit, will eventually lead to a single unified graphics programming interface for all platforms from all vendors, from the lowest PC to the fastest SGI/Cray supercomputer available with thousands of processors. To a large degree, Direct3D will simply either be phased out in favour of OpenGL's methods, or focused entirely on consumer-level applications, though OpenGL will dominate in the final product for the entertainment market. OpenGL is managed by the OpenGL Architecture Review Board, an independent organisation with member representatives from all major UNIX vendors, relevant companies and institutions.

Journaled file systems: File systems like SGI's XFS running on powerful UNIX systems like the Cray Origin2000 can easily support sustained data transfer rates of hundreds of gigabytes per second. XFS has a maximum file size limit of 9 million terabytes.
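As a quick way to see which file system types are actually in use on a given machine, two standard commands suffice (a sketch; the exact output format, and whether the file system type is shown by df, differs between UNIX flavours):

df -k   (show each mounted file system and its free space in KB; on some systems, including IRIX, the type is also shown)
mount   (list all currently mounted file systems along with their types and options)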

The end result of the last 30 years of UNIX development is what is known as an 'Open System', ie. a system which permits reliable application portability, interoperability between different systems and effective user portability between a wide variety of different vendor hardware and software platforms. Combined with a modern set of compliance standards, UNIX is now a mature, well-understood, highly developed, powerful and very sophisticated OS. Many important features of UNIX do not exist in other OSs such as WindowsNT and will not do so for years to come, if ever. These include guaranteeable reliability, security, stability, extreme scalability (thousands of processors), proper support for advanced multi-processing with unified shared memory and resources (ie. parallel compute systems with more than 1 CPU), support for genuine real-time response, portability and an ever-increasing ease-of-use through highly advanced GUIs. Modern UNIX GUIs combine the familiar use of icons with the immense power and flexibility of the UNIX shell command line which, for example, supports full remote administration (a significant criticism of WinNT is the lack of any real command line interface for remote administration). By contrast, Windows2000 includes a colossal amount of new code which will introduce a plethora of new bugs and problems.

A summary of key UNIX features would be:

Multi-tasking: many different processes can operate independently at once.

Multi-user: many users can use a single machine at the same time; a single user can use multiple machines at the same time.

Multi-processing: most commercial UNIX systems scale to at least 32 or 64 CPUs (Sun, IBM, HP), while others scale to hundreds or thousands (IRIX, Unicos, AIX, etc.; Blue Mountain [12], Blue Pacific, ASCI Red). Today, WindowsNT cannot reliably scale to even 8 CPUs. Intel will not begin selling 8-way chip sets until Q3 1999.

Multi-threading: automatic parallel execution of applications across multiple CPUs and graphics systems when programs are written using the relevant extensions and libraries. Some tasks are naturally non-threadable, eg. rendering animation frames for movies (each processor computes a single frame using a round-robin approach), while others lend themselves very well to parallel execution, eg. Computational Fluid Dynamics, Finite Element Analysis, Image Processing, Quantum Chromodynamics, weather modeling, database processing, medical imaging, visual simulation and other areas of 3D graphics, etc.

Platform independence and portability: applications written on UNIX systems will compile and run on other UNIX systems if they're developed with a standards-based approach, eg. the use of ANSI C or C++, Motif libraries, etc.; UNIX hides the hardware architecture from the user, easing portability. The close relationship between UNIX and C, plus the fact that the UNIX shell is based on C, provides for a powerful development environment. Today, GUI-based development environments for UNIX systems also exist, giving even greater power and flexibility, eg. SGI's WorkShop Pro CASE tools and RapidApp.

Full 64bit environment: proper support for very large memory spaces, up to hundreds of GB of RAM, visible to the system as a single combined memory space. Comparison: NT's current maximum limit is 4GB; IRIX's current commercial limit is 512GB, though Blue Mountain's 6144-CPU SGI system has a current limit of 12000GB RAM (twice that if the CPUs were upgraded to the latest model). Blue Mountain has 1500GB RAM installed at the moment.

Inter-system communication: services such as telnet, Sendmail, TCP/IP, remote login (rlogin), DNS, NIS, NFS, etc. Sophisticated security and access control. Features such as email and telnet are a fundamental part of UNIX, but they must be added as extras to other OSs. UNIX allows one to transparently access devices on a remote system and even install the OS using a CDROM, DAT or disk that resides on a remote machine. Note that some of the development which went into these technologies was in conjunction with the evolution of ArpaNet (the early Internet that was just for key US government, military, research and educational sites).

File identity and access: unique file ownership and a logical file access permission structure provide very high-level management of file access for use by users and administrators alike. OSs which lack these features as a core part of the OS make it far too easy for a hacker or even an ordinary user to gain administrator-level access (NT is a typical example).

System identity: every UNIX system has a distinct unique identity, ie. a system name and an IP (Internet Protocol) address. These offer numerous advantages for users and administrators, eg. security, access control, system-specific environments, the ability to login and use multiple systems at once, etc.

Genuine 'plug & play': UNIX OSs already include drivers and support for all devices that the source vendor is aware of. Adding most brands of disks, printers, CDROMs, DATs, Floptical drives, ZIP or JAZ drives, etc. to a system requires no installation of any drivers at all (the downside of this is that a typical modern UNIX OS installation can be large, eg. 300MB). Detection and name-allocation to devices is largely automatic - there is no need to assign specific interrupt or memory addresses for devices, or assign labels for disk drives, ZIP drives, etc. Devices can be added and removed without affecting the long-term operation of the system. This also often applies to internal components such as CPUs, video boards, etc. (at least for SGIs).

UNIX Today.

In recent years, one aspect of UNIX that was holding it back from spreading more widely was cost. Many vendors often charged too high a price for their particular flavour of UNIX. This made its use by small businesses and home users prohibitive. The ever decreasing cost of PCs, combined with the sheer marketing power of Microsoft, gave rise to the rapid growth of Windows and now WindowsNT. However, in 1991, Linus Torvalds began developing a version of UNIX called Linux (he pronounces it rather like 'leenoox') which was free and ran on PCs as well as other hardware platforms such as DEC machines. In what must be one of the most astonishing developments of the computer age, Linux has rapidly grown to become a highly popular OS for home and small business use and is now being supported by many major companies too, including Oracle, IBM, SGI, HP, Dell and others.

Linux does not have the sophistication of the more traditional UNIX variants such as SGI's IRIX, but Linux is free (older releases of IRIX such as IRIX 6.2 are also free, but not the very latest release, namely IRIX 6.5). This has resulted in the rapid adoption of Linux by many people and businesses, especially for servers, application development, home use, etc. With the recent announcement of support for multi-processing in Linux for up to 8 CPUs, Linux is becoming an important player in the UNIX world and a likely candidate to take on Microsoft in the battle for OS dominance.

However, it'll be a while before Linux is used for 'serious' applications since it does not have the rigorous development history and discipline of other UNIX versions, eg. Blue Mountain is an IRIX system consisting of 6144 CPUs, 1500GB RAM, 76000GB disk space, and capable of 3000 billion floating-point operations per second. This level of system development is what drives many aspects of today's UNIX evolution and the hardware which supports UNIX OSs. Linux lacks this top-down approach and needs a lot of work in areas such as security and support for graphics, but Linux is nevertheless becoming very useful in fields such as render-farm construction for movie studios, eg. a network of cheap PentiumIII machines, networked and running the free Linux OS, reliable and stable. The film "Titanic" was the first major film which used a Linux-based render-farm, though it employed many other UNIX systems too (eg. SGIs, Alphas), as well as some NT systems.

EDIT (2008): Linux is now very much used for serious work, running most of the planet's Internet servers, and widely used in movie studios for Flame/Smoke on professional x86 systems. It's come a long way since 1999, with new distributions such as Ubuntu and Gentoo proving very popular. At the high-end, SGI offers products that range from its shared-memory Linux-based Altix 4700 system with up to 1024 CPUs, to the Altix ICE, a highly expandable XEON/Linux cluster system with some sites using machines with tens of thousands of cores.

UNIX has come a long way since 1969. Thompson and Ritchie could never have imagined that it would spread so widely and eventually lead to its use in such things as the control of the Mars Pathfinder probe which last year landed on Mars, including the operation of the Internet web server which allowed millions of people around the world to see the images brought back as the Martian event unfolded [13].

Today, from an administrator's perspective, UNIX is a stable and reliable OS which pretty much runs itself once it's properly set up. UNIX requires far less daily administration than other OSs such as NT - a factor not often taken into account when companies make purchasing decisions (salaries are a major part of a company's expenditure). UNIX certainly has its baggage in terms of file structure and the way some aspects of the OS actually work, but after so many years most if not all of the key problems have been solved, giving rise to an OS which offers far superior reliability, stability, security, etc. In that sense, UNIX has very well-known baggage which is absolutely vital to safety-critical applications such as military, medical, government and industrial use. Byte magazine once said that NT was only now tackling OS issues which other OSs had solved years before [14].

Thanks to a standards-based and top-down approach, UNIX is evolving to remove its baggage in a reliable way, eg. the introduction of the NSD (Name Service Daemon) to replace DNS (Domain Name Service), NIS (Network Information Service) and aspects of NFS operation; the new service is faster, more efficient, and easier on system resources such as memory and network usage. However, in the never-ending public relations battle for computer systems and OS dominance, NT has firmly established itself as an OS which will be increasingly used by many companies due to the widespread use of the traditional PC and the very low cost of Intel's mass-produced CPUs. Rival vendors continue to offer much faster systems than PCs, whether or not UNIX is used, so I expect to see interesting times ahead in the realm of OS development. Companies like SGI bridge the gap by releasing advanced hardware systems which support NT (eg. the Visual Workstation 320 [15]), systems whose design is born out of UNIX-based experience. One thing is certain: some flavour of UNIX will always be at the forefront of future OS development, whatever variant it may be.

References

1. Texaco processes GIS data in order to analyse suitable sites for oil exploration. Their models can take several months to run even on large multi-processor machines. However, as systems become faster, companies like Texaco simply try to solve more complex problems, with more detail, etc.

2. Chevron's Nigerian office has, what was in mid-1998, the fastest supercomputer in Africa, namely a 16-processor SGI POWER Challenge (probably replaced by now with a modern 64-CPU Origin2000). A typical data set processed by the system is about 60GB, which takes around two weeks to process, during which time the system must not go wrong or much processing time is lost. For individual work, Chevron uses Octane workstations which are able to process 750MB of volumetric GIS data in less than three seconds. Solving these types of problems with PCs is not yet possible.

3. The 'Tasmania Parks and Wildlife Services' (TPWS) organisation is responsible for the management and environmental planning of Tasmania's National Parks. They use modern systems like the SGI O2 and SGI Octane for modeling and simulation (virtual park models to aid in decision making and planning), but have found that much older systems such as POWER Series Predator and Crimson RealityEngine (SGI systems dating from 1992) are perfectly adequate for their tasks, and can still outperform modern PCs. For example, the full-featured pixel-fill rate of their RealityEngine system (320M/sec), which supports 48bit colour at very high resolutions (1280x2048 with 160MB VRAM), has still not been bettered by any modern PC solution. Real-time graphics comparisons at http://www.blender.nl/stuff/blench1.html show Crimson RE easily outperforming many modern PCs which ought to be faster given RE is 7 years old. Information supplied by Simon Pigot (TPWS SysAdmin).

4. "State University of New York at Buffalo Teams up with SGI for Next-Level Supercomputing Site. New Facility Brings Exciting Science and Competitive Edge to University": http://www.sgi.com/origin/successes/buffalo.html

5. Even though the email-related aspects of the Computing Department's SGI network have not been changed in any way from the default settings (created during the original OS installation), users can still email other users on the system as well as send email to external sites. 6. Unix history: http://virtual.park.uga.edu/hc/unixhistory.html

A Brief History of UNIX: http://pantheon.yale.edu/help/unixos/unix-intro.html

UNIX Lectures: http://www.sis.port.ac.uk/~briggsjs/csar4/U2.htm

Basic UNIX: http://osiris.staff.udg.mx/man/ingles/his.html

POSIX: Portable Operating System Interface: http://www.pasc.org/abstracts/posix.htm

7. "The Triumph of the Nerds", Channel 4 documentary. 8. Standard Performance Evaluation Corporation: http://www.specbench.org/

Example use of UnixWare by Intel for benchmark reporting: http://www.specbench.org/osg/cpu95/results/res98q3/cpu95-980831-03026.html http://www.specbench.org/osg/cpu95/results/res98q3/cpu95-980831-03023.html 9. "My Visit to the USA" (id Software, Paradigm Simulation Inc., NOA): http://doomgate.gamers.org/dhs/dhs/usavisit/dallas.html

10. Personal IRIS 4D/25, PCW Magazine, September 1990, pp. 186: http://www.futuretech.vuurwerk.nl/pcw9-90pi4d25.html

IndigoMagic User Environment, SGI, 1993 [IND-MAGIC-BRO(6/93)]. IRIS Indigo Brochure, SGI, 1991 [HLW-BRO-01 (6/91)]. "Smooth Operator", CGI Magazine, Vol4, Issue 1, Jan/Feb 1999, pp. 41-42. Digital Media World '98 (Film Effects and Animation Festival, Wembley Conference Center, London). Forty six pieces of work were submitted to the conference magazine by company attendees. Out of the 46 items, 43 had used SGIs; of these, 34 had used only SGIs.

11. "MIPS-based PCs fastest for WindowsNT", "MIPS Technologies announces 200MHz R4400 RISC microprocessor", "MIPS demonstrates Pentium-class RISC PC designs", all from IRIS UK, Issue 1, 1994, pp. 5.

12. Blue Mountain, Los Alamos National Laboratory: http://www.lanl.gov/asci/ http://www.lanl.gov/asci/bluemtn/ASCI_fly.pdf http://www.lanl.gov/asci/bluemtn/bluemtn.html http://www.lanl.gov/asci/bluemtn/t_sysnews.shtml http://www.lanl.gov/orgs/pa/News/111298.html#anchor263034

13. "Silicon Graphics Technology Plays Mission-Critical Role in Mars Landing": http://www.sgi.com/newsroom/press_releases/1997/june/jplmars_release.html "Silicon Graphics WebFORCE Internet Servers Power Mars Web Site, One of the World's Largest Web Events": http://www.sgi.com/newsroom/press_releases/1997/july/marswebforce_release.html "PC Users Worldwide Can Explore VRML Simulation of Mars Terrain Via the Internet": http://www.sgi.com/newsroom/press_releases/1997/june/vrmlmars_release.html

14. "Deja Vu All Over Again"; "Windows NT security is under fire. It's not just that there are holes, but that they are holes that other OSes patched years ago", Byte Magazine, Vol 22 No. 11, November 1997 Issue, pp. 81 to 82, by Peter Mudge and Yobie Benjamin.

15. VisualWorkstation320 Home Page: http://visual.sgi.com/

Day 1: Part 2:

The basics: files, UNIX shells, editors, commands. Regular Expressions and Metacharacters in Shells.

UNIX Fundamentals: Files and the File System.

At the lowest level, from a command-line point of view, just about everything in a UNIX environment is treated as a file - even hardware entities, eg. printers, disks and DAT drives. Such items might be described as 'devices' or with other terms, but at the lowest level they are visible to the admin and user as files somewhere in the UNIX file system (under /dev in the case of hardware devices). Though this structure may seem a little odd at first, it means that system commands can use a common processing and communication interface no matter what type of file they're dealing with, eg. text, pipes, data redirection, etc. (these concepts are explained in more detail later).

The UNIX file system can be regarded as a top-down tree of files and directories, starting with the top-most 'root' directory. A directory can be visualised as a filing cabinet, other directories as folders within the cabinet and individual files as the pieces of paper within folders. It's a useful analogy if one isn't familiar with file system concepts, but somewhat inaccurate since a directory in a computer file system can contain files on their own as well as other directories, ie. most office filing cabinets don't have loose pieces of paper outside of folders.

UNIX file systems can also have 'hidden' files and directories. In DOS, a hidden file is just a file with a special attribute set so that 'dir' and other commands do not show the file; by contrast, a hidden file in UNIX is any file which begins with a dot '.' (period) character, ie. the hidden status is a result of an aspect of the file's name, not an attribute that is bolted onto the file's general existence. Further, whether or not a user can access a hidden file or look inside a hidden directory has nothing to do with the fact that the file or directory is hidden from normal view (a hidden file in DOS cannot be written to). Access permissions are a separate aspect of the fundamental nature of a UNIX file and are dealt with later.

The 'ls' command lists files and directories in the current directory, or some other part of the file system by specifying a 'path' name. For example:

ls /

will show the contents of the root directory, which may typically contain the following:

CDROM     bin        debug
dev       dumpster   etc
home      lib        lib32
mapleson  nsmail     opt
proc      root.home  sbin
stand     tmp        unix
usr       var

Figure 1. A typical root directory shown by 'ls'.

Almost every UNIX system has its own unique root directory and file system, stored on a disk within the machine. The exception is a machine with no internal disk, running off a remote server in some way; such systems are described as 'diskless nodes' and are very rare in modern UNIX environments, though still used if a diskless node is an appropriate solution.
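As mentioned at the start of this section, even hardware devices appear as ordinary entries in the file system under /dev. A simple illustration (the commands are standard; actual device names vary between UNIX flavours):

ls -l /dev/tty     (a character device; the leading 'c' in the permissions field marks it as a device file rather than an ordinary file)
ls -l /dev | more  (page through the full list of device files known to the system)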

Some of the items in Fig 1 are files, while others are directories. If one uses the '-F' option with the ls command, special characters are shown after the names for extra clarity:

/  - directory
*  - executable file
@  - link to another file or directory elsewhere in the file system

Thus, using 'ls -F' gives this more useful output:

CDROM/     bin/       debug/
dev/       dumpster/  etc/
home/      lib/       lib32/
mapleson/  nsmail/    opt/
proc/      root.home  sbin/
stand/     tmp/       unix*
usr/       var/

Figure 2. The root directory shown by 'ls -F /'.

Fig 2 shows that most of the items are in fact other directories. Only two items are ordinary files: 'unix' and 'root.home'. 'unix' is the main UNIX kernel file and is often several megabytes in size for today's modern UNIX systems - this is partly because the kernel must often include support for 64bit as well as older 32bit system components. 'root.home' is merely a file created when the root user accesses the WWW using Netscape, ie. an application-specific file.
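For the curious, the kernel file can be examined like any other file (a sketch using standard commands; the sizes and descriptive text reported will of course differ from system to system):

ls -l /unix  (show the file's size, ownership and access permissions)
file /unix   (report what kind of file it is, typically an executable in the machine's native format)
du -k /unix  (show how much disk space it occupies, in KB)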

Important directories in the root directory:

/bin      - many as-standard system commands are here (links to /usr/bin)
/dev      - device files for keyboard, disks, printers, etc.
/etc      - system configuration files
/home     - user accounts are here (NFS mounted)
/lib      - library files used by executable programs
/sbin     - user applications and other commands
/tmp      - temporary directory (anyone can create files here); this directory is normally erased on bootup
/usr      - various product-specific directories, system resource directories, locations of online help (/usr/share), header files for application development (/usr/include), and further system configuration files relating to low-level hardware which are rarely touched even by an administrator (eg. /usr/cpu and /usr/gfx)
/var      - X Windows files (/var/X11), system services files (eg. software licenses in /var/flexlm), various application-related files (/var/netscape, /var/dmedia), system administration files and data (/var/adm, /var/spool) and a second temporary directory (/var/tmp) which is not normally erased on bootup (an administrator can alter the behaviour of both /tmp and /var/tmp)
/mapleson - (non-standard) my home account is here, NFS-mounted from the admin Indy called Milamber

Figure 3. Important directories in the root directory. Comparisons with other UNIX variants such as HP-UX, SunOS and Solaris can be found in the many FAQ (Frequently Asked Questions) files available via the Internet [1].

Browsing around the UNIX file system can be enlightening but also a little overwhelming at first. However, an admin never has to be concerned with most parts of the file structure; low-level system directories such as /var/cpu are managed automatically by various system tasks and programs. Rarely, if ever, does an admin even have to look in such directories, never mind alter their contents (the latter is probably an unwise thing to do).

From the point of view of a novice admin, the most important directory is /etc. It is this directory which contains the key system configuration files, and it is these files which are most often changed when an admin wishes to alter system behaviour or properties. In fact, an admin can get to grips with how a UNIX system works very quickly, simply by learning all about the following files to begin with:

/etc/sys_id  - the name of the system (may include full domain)
/etc/hosts   - summary of full host names (standard file, added to by the administrator)
/etc/fstab   - list of file systems to mount on bootup
/etc/passwd  - password file, contains user account information
/etc/group   - group file, contains details of all user groups

Figure 4. Key files for the novice administrator. Note that an admin also has a personal account, ie. an ordinary user account, which should be used for any task not related to system administration. More precisely, an admin should only be logged in as root when it is strictly necessary, mainly to avoid unintended actions, eg. accidental use of the 'rm' command.
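To give a feel for what some of the files in Fig 4 contain, here is a hedged sketch of typical entries (the host names, IP addresses and domain are purely illustrative, and the exact /etc/fstab field layout and mount options vary between UNIX flavours):

/etc/sys_id - a single line holding the system's name, eg.:

akira

/etc/hosts - one line per known host: IP address, full host name, then any aliases, eg.:

192.168.1.1    yoda.comp.example.ac.uk    yoda
192.168.1.20   akira.comp.example.ac.uk   akira

/etc/fstab - one line per file system to mount on bootup, eg. an NFS entry of the general form:

yoda:/home  /home  nfs  rw,bg,soft,intr  0  0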

A Note on the 'man' Command.

The manual pages and other online information for the files shown in Fig 4 all list references to other related files, eg. the man page for 'fstab' lists 'mount' and 'xfs' in its 'SEE ALSO' section, as well as an entry called 'filesystems' which is a general overview document about UNIX file systems of all types, including those used by CDROMs and floppy disks. Modern UNIX releases contain a large number of useful general reference pages such as 'filesystems'. Since one may not know what is available, the 'k' and 'f' options can be used with the man command to offer suggestions, eg. 'man -f file' gives this output (the -f option shows all man page titles for entries that begin with the word 'file'):

ferror, feof, clearerr, fileno (3S)          - stream status inquiries
file (1)                                     - determine file type
file (3Tcl)                                  - Manipulate file names and attributes
File::Compare (3)                            - Compare files or filehandles
File::Copy (3)                               - Copy files or filehandles
File::DosGlob (3)                            - DOS like globbing and then some
File::Path (3)                               - create or remove a series of directories
File::stat (3)                               - by-name interface to Perl's built-in stat() functions
filebuf (3C++)                               - buffer for file I/O
FileCache (3)                                - keep more files open than the system permits
fileevent (3Tk)                              - Execute a script when a file becomes readable or writable
FileHandle (3)                               - supply object methods for filehandles
filename_to_devname (2)                      - determine the device name for the device file
filename_to_drivername (2)                   - determine the device name for the device file
fileparse (3)                                - split a pathname into pieces
files (7P)                                   - local files name service parser library
FilesystemManager (1M)                       - view and manage filesystems
filesystems: cdfs, dos, fat, EFS, hfs, mac, iso9660, cd-rom, kfs, nfs, XFS, rockridge (4) - IRIX filesystem types
filetype (5)                                 - K-AShare's filetype specification file
filetype, fileopen, filealtopen, wstype (1)  - determine filetype of specified file or files
routeprint, fileconvert (1)                  - convert file to printer or to specified filetype

Figure 5. Output from 'man -f file'.

'man -k file' gives a much longer output since the '-k' option runs a search on every man page title containing the word 'file'. So a point to note: judicious use of the man command along with other online information is an effective way to learn how any UNIX system works and how to make changes to system behaviour. All man pages for commands give examples of their use, a summary of possible options, syntax, further references, a list of any known bugs with appropriate workarounds, etc.
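In practice, a handful of man invocations cover most day-to-day needs. For example (standard usage on most systems; the '| more' pagination is only needed where man does not page its output automatically):

man fstab            (read the manual page for the fstab file)
man -k quota | more  (search all man page titles for the word 'quota' and page through the results)
apropos editor       (equivalent to 'man -k'; list man pages whose titles mention editors)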

The next most important directory is probably /var, since this is where the configuration files for many system services are often housed, such as the Domain Name Service (/var/named) and Network Information Service (/var/yp). However, small networks usually do not need these services, which are aimed more at larger networks. They can be useful though, for example in aiding Internet access.

Overall, a typical UNIX file system will contain several thousand files or more. It is possible for an admin to manage a system without ever knowing what the majority of the system's files are for. In fact, this is a preferable way of managing a system. When a problem arises, it is more important to know where to find relevant information on how to solve the problem, rather than try to learn the solution to every possible problem in the first instance (which is impossible). I once asked an experienced SGI administrator (the first person to ever use the massive Cray T3D supercomputer at the Edinburgh Parallel Computing Centre) what the most important thing in his daily working life was. He said it was a small yellow notebook in which he had written where to find information about various topics. The book was an index on where to find facts, not a collection of facts in itself.

Hidden files were described earlier. The '-a' option can be used with the ls command to show hidden files:

ls -a /

gives:

./                 ../                .Acroread.License  .Sgiresources
.cshrc             .desktop-yoda/     .ebtpriv/          .expertInsight
.insightrc         .jotrc*            .login             .netscape/
.profile           .rhosts            .sgihelprc         .ssh/
.varupdate         .weblink           .wshttymode        .zmailrc
CDROM/             bin/               debug/             dev/
dumpster/          etc/               floppy/            home/
lib/               lib32/             mapleson/          nsmail/
opt/               proc/              sbin/              stand/
swap/              tmp/               unix*              usr/
var/

Figure 6. Hidden files shown with 'ls -a /'. For most users, important hidden files would be those which configure their basic working environment when they login: .cshrc .login .profile

Other hidden files and directories refer to application-specific resources such as Netscape, or GUI-related resources such as the .desktop-sysname directory (where 'sysname' is the name of the host). Although the behaviour of the ls command can be altered with the 'alias' command so that it shows hidden files by default, the raw behaviour of ls can be accessed by using an absolute directory path to the command: /bin/ls

Using the absolute path to any file in this way allows one to ignore any aliases which may have been defined, as well as the normal behaviour of the shell in searching the user's defined path for the first instance of a command. This is a useful technique when performing actions as root, since it ensures that the wrong command is not executed by mistake.
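A short sketch of how this works in practice with the C shell (the alias shown is just an example; users typically place such aliases in their .cshrc file so they are defined at every login):

alias ls 'ls -F'  (from now on, typing 'ls' really runs 'ls -F')
ls                (uses the alias, so names are shown with the /, * and @ markers)
/bin/ls           (bypasses the alias and the search path, giving the raw behaviour)
unalias ls        (removes the alias again for the current shell)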

Network File System (NFS)

An important feature of UNIX is the ability to access a particular directory on one machine from another machine. This service is called the 'Network File System' (NFS) and the procedure itself is called 'mounting'. For example, on the machines in Ve24, the directory /home is completely empty - no files are in it whatsoever (except for a README file which is explained below). When one of the Indys is turned on, it 'mounts' the /home directory from the server 'on top' of the /home directory of the local machine. Anyone looking in the /home directory actually sees the contents of /home on the server.

The 'mount' command is used to mount a directory on a file system belonging to a remote host onto some directory on the local host's filesystem. The remote host must 'export' a directory in order for other hosts to locally mount it. The /etc/exports file contains a list of directories to be exported.

For example, the following shows how the /home directory on one of the Ve24 Indys (akira) is mounted off the server, yet appears to an ordinary user to be just another part of akira's overall file system (NB: the '#' indicates these actions are being performed as root; an ordinary user would not be able to use the mount command in this way):

AKIRA 1# mount | grep YODA
YODA:/var/www on /var/www type nfs (vers=3,rw,soft,intr,bg,dev=c0001)
YODA:/var/mail on /var/mail type nfs (vers=3,rw,dev=c0002)
YODA:/home on /home type nfs (vers=3,rw,soft,intr,bg,dev=c0003)
AKIRA 1# ls /home
dist/  projects/  pub/  staff/  students/  tmp/  yoda/
AKIRA 2# umount /home
AKIRA 1# mount | grep YODA
YODA:/var/www on /var/www type nfs (vers=3,rw,soft,intr,bg,dev=c0001)
YODA:/var/mail on /var/mail type nfs (vers=3,rw,dev=c0002)
AKIRA 3# ls /home
README
AKIRA 4# mount /home
AKIRA 5# ls /home
dist/  projects/  pub/  staff/  students/  tmp/  yoda/
AKIRA 6# ls /
CDROM/     bin/       debug/     dev/       dumpster/  etc/
home/      lib/       lib32/     mapleson/  nsmail/    opt/
proc/      root.home  sbin/      stand/     tmp/       unix*
usr/       var/

Figure 7. Manipulating an NFS-mounted file system with 'mount'.
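For completeness, here is a hedged sketch of the two halves involved in making such a mount possible (the directory names are taken from the example above, but the export options shown are purely illustrative and their exact syntax varies between UNIX flavours):

On the server (yoda), a line in /etc/exports makes the directory available, eg.:

/home  -rw

followed by a command such as 'exportfs -a' (or a reboot) to make the change take effect.

On the client (akira), the mount can then be made by hand:

mount yoda:/home /home

or made permanent by adding a matching line to the client's /etc/fstab so that it happens automatically at bootup.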

Each Indy has a README file in its local /home, containing: The /home filesystem from Yoda is not mounted for some reason. Please contact me immediately! Ian Mapleson, Senior Technician. 3297 (internal) [email protected]

After /home is remounted in Fig 7, the ls command no longer shows the README file as being present in /home, ie. when /home is mounted from the server, the local contents of /home are completely hidden and inaccessible. When accessing files, a user never has to worry about the fact that the files in a directory which has been mounted from a remote system actually reside on a physically separate disk, or even a different UNIX system from a different vendor. Thus, NFS gives a seamless, transparent way to merge different file systems from different machines into one larger structure.

At the department where I studied years ago [2], their UNIX system included Hewlett Packard machines running HP-UX, Sun machines running SunOS, SGIs running IRIX, DEC machines running Digital UNIX, PCs running an X-Windows emulator called Windows Exceed, and some Linux PCs. All the machines had access to a single large file structure so that any user could theoretically use any system in any part of the building (except where deliberately prevented from doing so via local system file alterations).

Another example is my home directory /mapleson - this directory is mounted from the admin Indy (Technicians' office Ve48) which has my own extra external disk locally mounted. As far as the server is concerned, my home account just happens to reside in /mapleson instead of /home/staff/mapleson. There is a link to /mapleson from /home/staff/mapleson which allows other staff and students to access my directory without having to ever be aware that my home account files do not physically reside on the server.

Every user has a 'home directory'. This is where all the files owned by that user are stored. By default, a new account would only include basic files such as .login, .cshrc and .profile. Admin customisation might add a trash 'dumpster' directory, a user's WWW site directory for public access, an email directory, perhaps an introductory README file, a default GUI layout, etc.

UNIX Fundamentals: Processes and process IDs.

As explained in the UNIX history, a UNIX OS can run many programs, or processes, at the same time. From the moment a UNIX system is turned on, processes are being created; by the time a system is fully booted so that users can login and use the system, many processes will be running at once.

Each process has its own unique identification number, or process ID. An administrator can use these ID numbers to control which processes are running in a very direct manner. For example, if a user has run a program in the background and forgotten to close it down before logging off (or perhaps the user's process is using up too much CPU time), then the admin can shut down the process using the kill command. Ordinary users can also use the kill command, but only on processes they own.

Similarly, if a user's display appears frozen due to a problem with some application (eg. Netscape), then the user can go to a different system, login to the original system using rlogin, and then use the kill command to shut down the process at fault, either by using the specific process ID concerned or by using a general command such as killall, eg.:

killall netscape

This will shut down all currently running Netscape processes, so using specific ID numbers is often attempted first.

Most users only encounter the specifics of processes and how they work when they enter the world of application development, especially the lower-level aspects of inter-process communication (pipes and sockets). Users may often run programs containing bugs, perhaps leaving processes which won't close on their own. Thus, kill can be used to terminate such unwanted processes.

The way in which UNIX manages processes and the resources they use is extremely tight, ie. it is very rare for a UNIX system to completely fall over just because one particular process has caused an error. 3rd-party applications like Netscape are the most common causes of process errors. Most UNIX vendors vigorously test their own system software to ensure it is, as far as can be ascertained, error-free. One reason why a lot of work goes into ensuring programs are bug-free is that software bugs are a common means by which hackers try to gain root (admin) access to a system: by forcing a particular error condition, a hacker may be able to exploit a bug in an application.

For an administrator, most daily work concerning processes is about ensuring that system resources are not being overloaded for some reason, eg. a user running a program which is forking itself repeatedly, slowing the system to a crawl. In the case of the SGI system I run, staff have access to the SGI server, so I must ensure that staff do not carelessly run processes which hog CPU time. Various means are available by which an administrator can restrict the degree to which any particular process can utilise system resources, the most important being a process priority level (see the man pages for 'nice' and 'renice').

The most common process-related command used by admins and users is 'ps', which displays the current list of processes. Various options are available to determine which processes are displayed and in what output format, but perhaps the most commonly used form is this: ps -ef

which shows just about everything about every process, though other commands exist which can give more detail, eg. the current CPU usage for each process (osview). Note that other UNIX OSs (eg. SunOS) require slightly different options, eg. 'ps -aux' - this is an example of the kind of difference which users might notice between System V and BSD derived UNIX variants.
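As a minimal illustration (the process ID, user and timings shown here are hypothetical), locating and stopping a runaway Netscape process might look something like this:

  % ps -ef | grep netscape
  mapleson  1234     1  2 10:02:31 ?     2:30 /usr/lib/netscape/netscape
  % kill 1234

kill with no options sends the polite default termination signal; 'kill -9 1234' forcibly terminates the process and is best kept as a last resort, since the program gets no chance to clean up after itself.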

The Pipe.

An important aspect of processes is inter-process communication. From an everyday point of view, this involves the concept of pipes. A pipe, as the name suggests, acts as a communication link between two processes, allowing the output of one process to be used as the input for another. The pipe symbol is a vertical bar '|'. One can use the pipe to chain multiple commands together, eg.:

cat *.txt | grep pattern | sort | lp

The above command sequence dumps the contents of all the files in the current directory ending in .txt, but instead of the output being sent to the 'standard output' (ie. the screen), it is used as the input for the grep operation, which scans each incoming line for any occurrence of the word 'pattern' (grep's output will only be those lines which do contain that word, if any). The output from grep is then sorted line by line into alphanumeric order by the sort program. Finally, the output from sort is sent to the printer using lp.

The use of pipes in this way provides an extremely effective way of combining many commands together to form more powerful and flexible operations. By contrast, DOS and NT offer only a much more limited form of this ability. Processes are explained further in a later lecture, but have been introduced now since certain process-related concepts are relevant when discussing the UNIX 'shell'.

UNIX Fundamentals: The Shell Command Interface.

A shell is a command-line interface to a UNIX OS. Shells are written in C, and the C shell family in particular uses a syntax very like the C language. One can enter simple commands (shell commands, system commands, user-defined commands, etc.), but also more complex sequences of commands, including expressions and even entire programs written in a scripting language called 'shell script'. The lowest-level scripting language is that of 'sh', the original Bourne shell; rarely used interactively by ordinary users, it is often used by admins and system scripts. Note that 'command' and 'program' are used synonymously here.

Shells are not in any way like the PC DOS environment; shells are very powerful and offer users and admins a direct communication link to the core OS, though ordinary users will find there is a vast range of commands and programs which they cannot use since they are not the root user. Modern GUI environments are popular and useful, but some tasks are difficult or impossible to do with an iconic interface, or at the very least are simply slower to perform.

Shell commands can be chained together (the output of one command acts as the input for another), or placed into an executable file like a program, except there is no need for a compiler and no object file - shell 'scripts' are widely used by admins for system administration and for performing common tasks such as locating and removing unwanted files. Combined with the facility for full-scale remote administration, shells are very flexible and efficient. For example, I have a single shell script 'command' which simultaneously reboots all the SGI Indys in Ve24. These shortcuts are useful because they minimise keystrokes and mistakes. An admin who issues lengthy and complex command lines repeatedly will find such shortcuts a handy and necessary time-saving feature.

Shells and shell scripts can also use variables, just as a C program can, though the syntax is slightly different. The equivalent of if/then statements can also be used, as can case statements, loop structures, etc. Novice administrators will probably not have to use if/then or other more advanced scripting features at first, and perhaps not even after several years. It is certainly true that any administrator who already knows the C programming language will find it very easy to learn shell script programming, and also the other scripting languages which exist on UNIX systems such as perl (Practical Extraction and Report Language), awk (pattern scanning and processing language) and sed (text stream editor).

perl is a text-processing language, designed for processing text files, extracting useful data, producing reports and results, etc. perl is a very powerful tool for system management, especially combined with other scripting languages. However, perl is perhaps less easy to learn for a novice; the perl man page says, "The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal)." I have personally never yet had to write a perl program, or a program using awk or sed. This is perhaps a good example, if any were required, of how largely automated modern UNIX systems are. Note that the perl man page serves as the entire online guide to the perl language and is thus quite large. An indication of the fact that perl and similar languages can be used to perform complex processing operations can be seen in the humorous closing comment of the perl man page: "Perl actually stands for Pathologically Eclectic Rubbish Lister, but don't tell anyone I said that."
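Shell scripts themselves are covered on Day 2, but as a flavour of what they look like, here is a minimal sketch of the kind of housekeeping script an admin might write; the directory, file name pattern and ageing policy are invented for the example:

  #!/bin/sh
  # Remove any core dump files under /home that are more than 7 days old,
  # printing each file name as it is removed.
  find /home -name core -mtime +7 -print -exec rm {} \;
  echo "core file sweep finished on `hostname`"

Saved in a file, made executable with chmod, and run by hand or from a scheduler, a few lines like this replace a tedious manual search.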

Much of any modern UNIX OS actually operates using shell scripts, many of which use awk, sed and perl as well as ordinary shell commands and system commands. These scripts can look quite complicated, but in general they need not be of any concern to the admin; they are often quite old (ie. written years ago), well understood and bug-free.

Although UNIX is essentially a text-based command-driven system, it is perfectly possible for most users to do the majority or even all of their work on modern UNIX systems using just the GUI interface. UNIX variants such as IRIX include advanced GUIs which combine the best of both worlds. It's common for a new user to begin with the GUI and only discover the power of the text interface later. This probably happens because most new users are already familiar with other GUI-based systems (eg. Win95) and initially dismiss the shell interface because of prior experience of an operating system such as DOS, ie. they perceive a UNIX shell to be just some weird form of DOS. Shells are not DOS, ie.:

- DOS is an operating system. Win3.1 is built on top of DOS, as is Win95, etc.
- UNIX is an operating system. Shells are a powerful text command interface to UNIX and not the OS itself. A UNIX OS uses shell techniques in many aspects of its operation.

Shells are thus nothing like DOS; they are closely related to UNIX in that even the earliest versions of UNIX included a shell interface, and modern shells, like the OS itself, are written in C. When a UNIX system is turned on, a shell is used very early in the boot sequence to control what happens and execute actions.

Because of the way UNIX works and how shells are used, much of UNIX's inner workings are hidden, especially at the hardware level. This is good for the user who only sees what she or he wants and needs to see of the file structure. An ordinary user focuses on their home directory and certain useful parts of the file system such as /var/tmp and /usr/share, while an admin will also be interested in other directories which contain system files, device files, etc., such as /etc, /var/adm and /dev.

The most commonly used shells are:

bsh   - Bourne Shell; the standard/job-control command programming language
ksh   - a modern alternative to bsh, but still restricted
csh   - Berkeley's C Shell; a better bsh with many additional features
tcsh  - an enhanced version of csh

Figure 8. The various available shells. These offer differing degrees of command access/history/recall/editing and support for shell script programming, plus other features such as command aliasing (new names for user-defined sequences of one or more commands). There is also rsh which is essentially a restricted version of the standard command interpreter sh; it is used to set up login names and execution environments whose capabilities are more controlled than those of the standard shell. Shells such as csh and tcsh execute the file /etc/cshrc before reading the user's own .cshrc, .login and perhaps .tcshrc file if that exists.

Shells use the concept of a 'path' to determine how to find commands to execute. The 'shell path variable', which is initially defined in the user's .cshrc or .tcshrc file, consists of a list of directories, which may be added to by the user. When a command is entered, the shell environment searches each directory listed in the path for the command. The first instance of a file which matches the command is executed, or an error is given if no such executable command is found. This feature allows multiple versions of the same command to exist in different locations (eg. different releases of a commercial application). The user can change the path variable so that particular commands will run a file from a desired directory. Try:

echo $PATH

The list of directories is given. WARNING: the dot '.' character at the end of a path definition means 'current directory'; it is dangerous to include this in the root user's path definition (this is because root could run an ordinary user's program(s) by mistake). Even an ordinary user should think twice about including a period at the end of their path definition. For example, suppose a file called 'la' was present in /tmp and was set so that it could be run by any user. Entering 'la' instead of 'ls' by mistake whilst in /tmp would fail to find 'la' in any normal system directory, but a period in the path definition would result in the shell finding la in /tmp and executing it; thus, if the la file contained malicious commands (eg. '/bin/rm -rf $HOME/mail'), then loss of data could occur.
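For example, a typical path and the csh/tcsh syntax for appending a directory to it might look like this; the directories shown are illustrative and will differ from system to system:

  % echo $PATH
  /usr/sbin:/usr/bsd:/sbin:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
  % set path = ($path $HOME/bin)

Note that there is no trailing '.' in this definition, in line with the warning above.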

Typical commands used in a shell include (most useful commands listed first):

cd      - change directory
ls      - list contents of directory
rm      - delete a file (no undelete!)
mv      - move a file
cat     - dump contents of a file
more    - display file contents in paged format
find    - search file system for files/directories
grep    - scan file(s) using pattern matching
man     - read/search a man page (try 'man man')
mkdir   - create a directory
rmdir   - remove directory ('rm -r' has the same effect)
pwd     - print current absolute working directory
cmp     - show differences between two files
lp      - print a file
df      - show disk usage
du      - show space used by directories/files
mail    - send an email message to another user
passwd  - change password (yppasswd for systems with NIS)

Figure 9. The commands used most often by any user.

Editors:

vi                 - ancient editor. Rarely used (arcane), but occasionally
                     useful, especially for remote administration.
xedit, jot, nedit  - GUI editors (jot is old, nedit is newer, xedit is very
                     simple).

Figure 10. Editor commands. Most of these are not built-in shell commands. Enter 'man csh' or 'man tcsh' to see which commands are part of the shell and hence which are other system programs, eg. 'which' is a shell command, but 'grep' is not; 'cd' is a shell command, but 'ls' is not.

vi is an ancient editor developed in the very early days of UNIX when GUI-based displays did not exist. It is not used much today, but many admins swear by it - this is only really because they know it so well after years of experience. The vi editor can have its uses though, eg. for remote administration: if you happen to be using a Wintel PC in an Internet cafe and decide to access a remote UNIX system via telnet, the vi editor will probably be the only editor which you can use to edit files on the remote system. Jot has some useful features, especially for programmers (macros, "Electric C Mode"), but is old and contains an annoying colour map bug; this doesn't affect the way jot works, but does sometimes scramble on-screen colours within the jot window. SGI recommends nedit be used instead. xedit is a very simple text editor. It has an extremely primitive file selection interface, but has a rather nice search/replace mechanism. nedit is a newer GUI editor with more modern features. jot is specific to SGI systems, while vi, xedit and nedit exist on any UNIX variant (if not by default, then they can be downloaded in source code or executable format from relevant anonymous ftp sites).

Creating a new shell: sh, csh, tcsh, bsh, ksh - use man pages to see differences

I have configured the SGI machines in Ve24 to use tcsh by default due to the numerous extra useful features in tcsh, including file name completion (TAB), command-line editing, alias support, file listing in the middle of a typed command (CTRL-D), command recall/reuse, and many others (the man page lists 36 main extras compared to csh); a couple of example aliases are sketched after Fig 11 below.

Further commands:

which    - show location of a command based on current path definition
chown    - change owner ID of a file
chgrp    - change group ID of a file
chmod    - change file access permissions
who      - show who is on the local system
rusers   - show all users on local network
sleep    - pause for a number of seconds
sort     - sort data into a particular order
spell    - run a spell-check on a file
split    - split a file into a number of pieces
strings  - show printable text strings in a file
cut      - cut out selected fields of each line of a file
tr       - substitute/delete characters from a text stream or file
wc       - count the number of words in a file
whoami   - show user ID
write    - send message to another user
wall     - broadcast to all users on local system
talk     - request 1:1 communication link with another user
to_dos   - convert text file to DOS format (add CTRL-M and CTRL-Z)
to_unix  - convert text file to UNIX format (opposite of to_dos)
su       - adopt the identity of another user (password usually required)

Figure 11. The next most commonly used commands.
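As a small illustration of the alias support mentioned above, lines such as the following (the alias names are invented for the example) could be added to a user's ~/.cshrc or ~/.tcshrc:

  alias ll   'ls -l'
  alias dir  'ls -F'
  alias rm   'rm -i'

After 'source ~/.cshrc' (or at the next login), typing 'll' behaves exactly as if 'ls -l' had been entered, and 'rm' prompts before each deletion.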

Of the commands shown in Fig 11, only 'which' is a built-in shell command.

Any GUI program can also be executed via a text command (the GUI program is just a high-level interface to the main program), eg. 'fm' for the iconic file manager/viewer, 'apanel' for the Audio Panel, 'printers' for the Printer Manager, 'iconbook' for the Icon Catalog, 'mouse' for customising mouse settings, etc. However, not all text commands will have a GUI equivalent - this is especially true of many system administration commands.

Other categories are shown in Figs 12 to 17 below.

fx        - repartition a disk, plus other functions
mkfs      - make a file system on a disk
mount     - mount a file system (NFS)
ln        - create a link to a file or directory
tar       - create/extract an archive file
gzip      - compress a file (gunzip to decompress)
compress  - compress a file (uncompress to decompress); different format from gzip
pack      - a further compression method (eg. used with man pages and release notes)
head      - show the first few lines in a file
tail      - show the last few lines in a file

Figure 12. File system manipulation commands.

The tar command is another example where slight differences between UNIX variants exist with respect to default settings. However, command options can always be used to resolve such differences.

hinv          - show hardware inventory (SGI-specific)
uname         - show OS version
gfxinfo       - show graphics hardware information (SGI-specific)
sysinfo       - print system ID (SGI-specific)
gmemusage     - show current memory usage
ps            - display a snapshot of running process information
top           - constantly updated process list (GUI: gr_top)
kill          - shutdown a process
killall       - shutdown a group of processes
osview        - system resource usage (GUI: gr_osview)
startconsole  - system console, a kind of system monitoring xterm which
                applications will echo messages into

Figure 13. System Information and Process Management Commands.

inst      - install software (text-based)
swmgr     - GUI interface to inst (the preferred method; easier to use)
versions  - show installed software

Figure 14. Software Management Commands.

cc, CC, gcc  - compile program (further commands may exist for other languages)
make         - run program compilation script
xmkmf        - use imake on an Imakefile to create a vendor-specific make file
lint         - check a C program for errors/bugs
cvd          - CASE tool, visual debugger for C programs (SGI-specific)

Figure 15. Application Development Commands.

relnotes    - software release notes (GUI: grelnotes)
man         - manual pages (GUI: xman)
insight     - online books
infosearch  - searchable interface to the above three (IRIX 6.5 and later)

Figure 16. Online Information Commands (all available from the 'Toolchest')

telnet      - open communication link
ftp         - file transfer
ping        - send test packets
traceroute  - display traced route to remote host
nslookup    - translate domain name into IP address
finger      - probe remote host for user information

Figure 17. Remote Access Commands.

This is not a complete list! And do not be intimidated by the apparent plethora of commands. An admin won't use most of them at first. Many commands are common to any UNIX variant, while those that aren't (eg. hinv) probably have equivalent commands on other UNIX platforms.

Shells can be displayed in different types of window, eg. winterm, xterm. xterms comply with the X11 standard and offer a wider range of features. xterms can be displayed on remote displays, as can any X-based application (this includes just about every program one ever uses). Security note: the remote system must give permission or be configured to allow remote display (xhost command).

If one is accessing a UNIX system via an older text-only terminal (eg. VT100) then the shell operates in 'terminal' mode, where the particular characteristics of the terminal in use determine how the shell communicates with the terminal (details of all known terminals are stored in the /usr/lib/terminfo directory). Shells shown in visual windows (xterms, winterms, etc.) operate a form of terminal emulation that can be made to exactly mimic a basic text-only terminal if required.

Tip: if one ever decides to NFS-mount /usr/lib to save space (thus normally erasing the contents of /usr/lib on the local disk), it is wise to at least leave behind the terminfo directory on the local disk's /usr/lib; thus, should one ever need to logon to the system when /usr/lib is not mounted, terminal communication will still operate normally.
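As a brief illustration of the xhost permission mentioned above (host names as used elsewhere in these notes; the environment command assumes a csh-style shell), a user sitting at akira could allow yoda to draw on akira's screen and then run an X application remotely:

  akira% xhost +yoda
  akira% rlogin yoda
  yoda% setenv DISPLAY akira:0
  yoda% xman &

Without the xhost step, the remote xman would simply be refused a connection to the display.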

The lack of a fundamental built-in shell environment in WindowsNT is one of the most common criticisms made by IT managers who use NT. It's also why many high-level companies such as movie studios do not use NT, eg. no genuine remote administration makes it hard to manage clusters of several dozen systems all at once, partly because different systems may be widely dispersed in physical location but mainly because remote administration makes many tasks considerably easier and more convenient.

Regular Expressions and Metacharacters.

Shell commands can employ regular expressions and metacharacters, which act as a means of referencing large numbers of files or directories, among other useful shortcuts. Regular expressions are made up of a combination of alphanumeric characters and a series of punctuation characters that have special meaning to the shell. These punctuation characters are called metacharacters when they are used for their special meanings with shell commands.

The most common metacharacter is the wildcard '*', used to reference multiple files and/or directories, eg.: Dump the contents of all files in the current directory to the display: cat *

Remove all object files in the current directory: rm *.o

Search all files ending in .txt for the word 'Alex': grep Alex *.txt

Print all files beginning with 'March' and ending in '.txt': lp March*.txt

Print all files beginning with 'May': lp May*

Note that it is not necessary to use 'May*.*' - this is because the dot is just another character that can be a valid part of any UNIX file name at any position, ie. a UNIX file name may include multiple dots. For example, the Blender shareware animation program archive file is called: blender1.56_SGI_6.2_ogl.tar.gz

By contrast, DOS has a fixed file name format where the dot is a rigid aspect of any file name. UNIX file names do not have to contain a dot character, and can even contain spaces (though such names can confuse the shell unless one encloses the entire name in quotes "").

Other useful metacharacters relate to executing previously entered commands, perhaps with modification, eg. the '!' is used to recall a previous command, as in:

!!     - repeat the previous command
!grep  - repeat the last command which began with 'grep'

For example, an administrator might send 20 test packets to a remote site to see if the remote system is active: ping -c 20 www.sgi.com

Following a short break, the administrator may wish to run the same command again, which can be done by entering '!!'. Minutes later, after entering other commands, the admin might want to run the last ping test once more, which is easily possible by entering '!ping'. If no other command had since been entered beginning with 'p', then even just '!p' would work.

The '^' character can be used to modify the previous command, eg. suppose I entered: grwp 'some lengthy search string or whatever' *

grep has been spelled incorrectly here, so an error is given ('grwp: Command not found'). Instead of typing the whole line again, I could enter: ^w^e

The shell searches the previous command for the first appearance of 'w', replaces that letter with 'e', displays the newly formed command as a means of confirmation and then executes the command. Note: the '^' operator can only search for the first occurrence of the character or string to be changed, ie. in the above example, the word 'whatever' is not changed to 'ehatever'. The parameter to search for, and the pattern to replace any targets found, can be any standard regular expression, ie. a valid sequence of ASCII characters. In the above example, entering '^grwp^grep^' would have had the same effect, though it is unnecessarily verbose.

Note that characters such as '!' and '^' operate entirely within the shell, ie. they are not 'memorised' as discrete commands. Thus, within a tcsh, using the Up-Arrow key to recall the previous command after the '^w^e' command sequence does not show any trace of the '^w^e' action. Only the corrected, executed command is shown.

Another commonly used character is the '&' symbol, normally employed to control whether a process executed from within a shell is run in the foreground or background. As explained in the UNIX history, UNIX can run many processes at once. Processes employ a parental relationship whereby a process which creates a new process (eg. a shell running a program) is said to be creating a child process. The act of creating a new process is called forking.

When running a program from within a shell, the prompt may not come back after the command is entered - this means the new process is running in the 'foreground', ie. the shell process is suspended until such time as the forked process terminates. In order to run the process in the background, which will allow the shell process to carry on as before and still be used, the '&' symbol must be included at the end of the command.

For example, the 'xman' command normally runs in the foreground: enter 'xman' in a shell and the prompt does not return; close the xman program, or type CTRL-C in the shell window, and the shell prompt returns. This effectively means the xman program is 'tied' to the process which forked it, in this case the shell. If one closes the shell completely (eg. using the top-left GUI button, or a kill command from a different shell) then the xman window vanishes too. However, if one enters:

xman &

then the xman program is run in the 'background', ie. the shell prompt returns immediately (note the space is optional, ie. 'xman&' is also valid). This means the xman session is now independent of the process which forked it (the shell) and will still exist even if the shell is closed.

Many programs run in the background by default, eg. swmgr (install system software).

The 'fg' command can be used to bring a background job into the foreground (jobs are referenced by a small job number, shown by the 'jobs' command; see the shell man page). With no arguments, fg will attempt to bring to the foreground the most recent process which was run in the background. Thus, after entering 'xman&', the 'fg' command on its own will make the shell prompt vanish, as if the '&' symbol had never been used.

A process currently running in the foreground can be deliberately 'suspended' using the CTRL-Z sequence. Try running xman in the foreground within a shell and then typing CTRL-Z - the phrase 'Suspended' is displayed and the prompt returns, showing that the xman process has been temporarily halted. It still exists, but is frozen. Try using the xman program at this point: notice that the menus cannot be accessed and the window overlay/underlay actions are not dealt with anymore. Now go back to the shell and enter 'fg' - the xman program is brought back into the foreground and begins running once more. As a final example, try CTRL-Z once more, but this time enter 'bg'. Now the xman process is pushed fully into the background. Thus, if one intends to run a program in the background but forgets to include the '&' symbol, then one can use CTRL-Z followed by 'bg' to place the process in the background (a short illustrative transcript of this sequence is given after the note below).

Note: it is worth mentioning at this point an example of how I once observed Linux to be operating incorrectly. This example, seen in 1997, probably wouldn't happen today, but at the time I was very surprised. Using a csh shell on a PC running Linux, I ran the xedit editor in the background using:

xedit&

Moments later, I had cause to shut down the relevant shell, but the xedit session terminated as well, which should not have happened since the xedit process was supposed to be running in the background. Exactly why this happened I do not know - presumably there was a bug in the way that version of Linux handled process forking, which I am sure has since been fixed. However, in terms of how UNIX is supposed to work, it's a bug which should not have been present.

Actually, since many shells such as tcsh allow one to recall previous commands using the arrow keys, and to edit such commands using Alt/CTRL key combinations and other keys, the need to use metacharacters such as '!' and '^' is lessened. However, they're useful to know in case one encounters a different type of shell, perhaps as a result of a telnet session to a remote site where one may not have any choice over which type of shell is used.
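For reference, here is a short illustrative transcript of the suspend-and-background sequence described above, roughly as it might appear in tcsh (the exact wording of the job messages varies between shells):

  % xman
  ^Z
  Suspended
  % bg
  [1]    xman &
  % jobs
  [1]  + Running                       xman

The job number in square brackets is what fg and bg act upon; 'fg %1' would pull xman back into the foreground.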

Standard Input (stdin), Standard Output (stdout), Standard Error (stderr).

As stated earlier, everything in UNIX is basically treated as a file. This even applies to the concept of where output from a program goes to, and where the input to a program comes from. The relevant files, or text data streams, are called stdin and stdout (standard 'in', standard 'out'). Thus, whenever a command produces a visible output in a shell, what that command is actually doing is sending its output to the file handle known as stdout. In the case of the user typing commands in a shell, stdout is defined to be the display which the user sees.

Similarly, the input to a command comes from stdin which, by default, is the keyboard. This is why, if you enter some commands on their own, they will appear to do nothing at first, when in fact they are simply waiting for input from the stdin stream, ie. the keyboard. Enter 'cat' on its own and see what happens; nothing at first, but then enter any text sequence - what you enter is echoed back to the screen, just as it would be if cat was dumping the contents of a file to the screen.

This stdin input stream can be temporarily redefined so that a command takes its input from somewhere other than the keyboard. This is known as 'redirection'. Similarly, the stdout stream can be redirected so that the output goes somewhere other than the display. The '<' and '>' symbols are used for data redirection. For example:

ps -ef > file

This runs the ps command, but sends the output into a file. That file could then be examined with cat, more, or loaded into an editor such as nedit or jot. Try: cat > file

You can then enter anything you like until such time as some kind of termination signal is sent, either CTRL-D which acts to end the text stream, or CTRL-C which stops the cat process. Type 'hello', press Enter, then press CTRL-D. Enter 'cat file' to see the file's contents. A slightly different form of output redirection is '>>' which appends a data stream to the end of an existing file, rather than completely overwriting its current contents. Enter: cat >> file

and type 'there!' followed by Enter and then CTRL-D. Now enter 'cat file' and you will see:

% cat file
hello
there!

By contrast, try the above again but with the second operation also using the single '>' operator. This time, the file's contents will only be 'there!'. And note that the following has the same effect as 'cat file' (why?):

cat < file

Anyone familiar with C++ programming will recognise this syntax as being similar to the way C++ programs display output. Input and output redirection is used extensively by system shell scripts. Users and administrators can use these operators as a quick and convenient way for managing program input and output. For example, the output from a find command could be redirected into a file for later examination. I often use 'cat > whatever' as a quick and easy way to create a short file without using an editor.

Error messages from programs and commands are also often sent to a different output stream called stderr - by default, stderr is also the relevant display window, or the Console Window if one exists on-screen. The numeric file handles associated with these three text streams are:

0 - stdin
1 - stdout
2 - stderr

These numbers can be placed before the < and > operators to select a particular stream to deal with. Examples of this are given in the notes on shell script programming (Day 2).
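For instance, using Bourne shell (sh) syntax, where the stream number is placed directly before the '>' operator (csh and tcsh use a slightly different form, '>&', to combine the two streams), one might separate normal output from error messages like this; the search pattern and file names are arbitrary:

  find /usr -name "*.conf" -print > results 2> errors

Afterwards, 'results' holds the matching file names while any 'permission denied' or similar complaints end up in 'errors'.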

The '&&' combination allows one to chain commands together so that each command is only executed if the preceding command was successful, eg.: run_my_prog_which_takes_hours > results && lp results

In this example, some arbitrary program is executed which is expected to take a long time. The program's output is redirected into a file called results. If and only if the program terminates successfully will the results file be sent to the default printer by the lp program. Note: any error messages the program happens to write to its standard output will also end up in the results file.

One common use of the && sequence is for on-the-spot backups:

cd /home && tar cv . && eject

This sequence changes directory to the /home area, archives the contents of /home to DAT and ejects the DAT tape once the archive process has completed. Note that the eject command without any arguments will search for a default removable media device, so this example assumes there is only one such device, a DAT drive, attached to the system. Otherwise, one could use 'eject /dev/tape' to be more specific.

The semicolon can also be used to chain commands together, but in a manner which does not require each command to be successful in order for the next command to be executed, eg. one could run two successive find commands, searching for different types of file, like this (try executing this command in the directory /mapleson/public_html/sgi): find . -name "*.gz" -print; find . -name "*.mpg" -print

The output given is: ./origin/techreport/compcon97_dv.pdf.gz

./origin/techreport/origin_chap7.pdf.gz ./origin/techreport/origin_chap6.pdf.gz ./origin/techreport/origin_chap5.pdf.gz ./origin/techreport/origin_chap4.pdf.gz ./origin/techreport/origin_chap3.pdf.gz ./origin/techreport/origin_chap2.pdf.gz ./origin/techreport/origin_chap1.5.pdf.gz ./origin/techreport/origin_chap1.0.pdf.gz ./origin/techreport/compcon_paper.pdf.gz ./origin/techreport/origin_techrep.pdf.tar.gz ./origin/techreport/origin_chap1-7TOC.pdf.gz ./pchall/pchal.ps.gz ./o2/phase/phase6.mpg ./o2/phase/phase7.mpg ./o2/phase/phase4.mpg ./o2/phase/phase5.mpg ./o2/phase/phase2.mpg ./o2/phase/phase3.mpg ./o2/phase/phase1.mpg ./o2/phase/phase8.mpg ./o2/phase/phase9.mpg

If one changes the first find command so that it will give an error, the second find command still executes anyway:

% find /tmp/gurps -name "*.gz" -print ; find . -name "*.mpg" -print
cannot stat /tmp/gurps
No such file or directory
./o2/phase/phase6.mpg
./o2/phase/phase7.mpg
./o2/phase/phase4.mpg
./o2/phase/phase5.mpg
./o2/phase/phase2.mpg
./o2/phase/phase3.mpg
./o2/phase/phase1.mpg
./o2/phase/phase8.mpg
./o2/phase/phase9.mpg

However, if one changes the ; to && and runs the sequence again, this time the second find command will not execute because the first find command produced an error:

% find /tmp/gurps -name "*.gz" -print && find . -name "*.mpg" -print
cannot stat /tmp/gurps
No such file or directory

As a final example, enter the following: find /usr -name "*.htm*" -print & find /usr -name "*.rgb" -print &

This command runs two separate find processes, both in the background at the same time. Unlike the previous examples, the output from each command is displayed first from one, then from the other, and back again in a non-deterministic manner, as and when matching files are located by each process. This is clear evidence that both processes are running at the same time. To shut down the processes, either use 'killall find' or enter 'fg' followed by the use of CTRL-C twice (or one could use kill with the appropriate process IDs, identifiable using 'ps -ef | grep find').

When writing shell script files, the ; symbol is most useful when one can identify commands which do not depend on each other. This symbol, and the other symbols described here, are heavily used in the numerous shell script files which manage many aspects of any modern UNIX OS.

Note: if non-dependent commands are present in a script file or program, this immediately allows one to imagine the idea of a multi-threaded OS, ie. an OS which can run many processes in parallel across multiple processors. A typical example use of such a feature would be batch processing scripts for image processing of medical data, or scripts that manage database systems, financial accounts, etc.

References:

1. HP-UX/SUN Interoperability Cookbook, Version 1.0, Copyright 1994 Hewlett-Packard Co.:
   http://www.hp-partners.com/ptc_public/techsup/SunInterop/

2. comp.sys.hp.hpux FAQ, Copyright 1995 by Colin Wynd:
   http://hpux.csc.liv.ac.uk/hppd/FAQ/

3. Department of Computer Science and Electrical Engineering, Heriot Watt University, Riccarton Campus, Edinburgh, Scotland:
   http://www.cee.hw.ac.uk/

Day 1: Part 3:

File ownership and access permissions. Online help (man pages, etc.)

UNIX Fundamentals: File Ownership

UNIX has the concept of file 'ownership': every file has a unique owner, specified by a user ID number contained in /etc/passwd. When examining the ownership of a file with the ls command, one always sees the symbolic name for the owner, unless the corresponding ID number does not exist in the local /etc/passwd file and is not available by any system service such as NIS.

Every user belongs to a particular group; in the case of the SGI system I run, every user belongs to either the 'staff' or 'students' group (note that a user can belong to more than one group, eg. my network has an extra group called 'projects'). Group names correspond to unique group IDs and are listed in the /etc/group file. When listing details of a file, usually the symbolic group name is shown, as long as the group ID exists in the /etc/group file, or is available via NIS, etc.

For example, the command:

ls -l /

shows the full details of all files in the root directory. Most of the files and directories are owned by the root user, and belong to the group called 'sys' (for system). An exception is my home account directory /mapleson which is owned by me. Another example command: ls -l /home/staff

shows that every staff member owns their particular home directory. The same applies to students, and to any user which has their own account. The root user owns the root account (ie. the root directory) by default.

The existence of user groups offers greater flexibility in how files are managed and the way in which users can share their files with other users. Groups also offer the administrator a logical way of managing distinct types of user, eg. a large company might have several groups:

accounts
clerical
investors
management
security

The admin decides on the exact names. In reality though, a company might have several internal systems, perhaps in different buildings, each with their own admins and thus possibly different group names.
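For reference, group membership is defined by simple one-line entries in /etc/group; a minimal sketch with invented names and group IDs might look like this (format: group name, password field, numeric group ID, comma-separated list of members):

  staff:*:10:mapleson,alex
  students:*:20:sam,kim
  projects:*:30:mapleson,sam

A user's primary group is set by the group ID field of their /etc/passwd entry; listing them in further /etc/group entries adds the extra group memberships mentioned above.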

UNIX Fundamentals: Access Permissions

Every file also has a set of file 'permissions'; the file's owner can set these permissions to alter who can read, write or execute the file concerned. The permissions for any file can be examined using the ls command with the -l option, eg.:

% ls -l /etc/passwd
-rw-r--r--    1 root     sys         1306 Jan 31 17:07 /etc/passwd
uuugggooo       owner    group       size date mod     name

Each file has three sets of file access permissions (uuu, ggg, ooo), relating to:

- the file's owner, ie. the 'user' field
- the group which the file's owner belongs to
- the 'rest of the world' (useful for systems with more than one group)

This discussion refers to the above three fields as 'user', 'group' and 'others'. In the above example, the three sets of permissions are represented by the fields labelled uuu, ggg and ooo, ie. the main system password file can be read by any user that has access to the relevant host, but can only be modified by the root user. The first access permission field is separate and is shown as a 'd' if the file is a directory, or 'l' if the file is a link to some other file or directory (many examples of this can be found in the root directory and in /etc).

Such a combination of options offers great flexibility, eg. one can have private email (user-only), or one can share documents only amongst one's group (eg. staff could share exam documents, or students could share files concerning a Student Union petition), or one can have files that are accessible by anyone (eg. web pages). The same applies to directories, eg. since a user's home directory is owned by that user, an easy way for a user to prevent anyone else from accessing their home directory is to remove all read and execute permissions for groups and others.

File ownership and file access permissions are a fundamental feature of every UNIX file, whether that file is an ordinary file, a directory, or some kind of special device file. As a result, UNIX as an OS has inherent built-in security for every file. This can lead to problems if the wrong permissions are set for a file by mistake, but assuming the correct permissions are in place, a file is effectively secure.

Note that no non-UNIX operating system for PCs yet offers this fundamental concept of file ownership at the very heart of the OS, a feature that is definitely required for proper security. This is largely why industrial-level companies, military and government institutions do not use NT systems where security is important. In fact, only Cray's Unicos (UNIX) operating system passes all of the US DoD's security requirements.

Relevant Commands:

chown  - change file ownership
chgrp  - change group status of a file
chmod  - change access permissions for one or more files

For a user to alter the ownership and/or access permissions of a file, the user must own that file. Without the correct ownership, an error is given, eg. assuming I'm logged on using my ordinary 'mapleson' account:

% chown mapleson var
var - Operation not permitted
% chmod go+w /var
chmod() failed on /var: Operation not permitted
% chgrp staff /var
/var - Operation not permitted

All of these operations are attempting to access files owned by root, so they all fail. Note: the root user can access any file, no matter what ownership or access permissions have been set (unless a file owned by root has had its read permission removed). As a result, most hacking attempts on UNIX systems revolve around trying to gain root privileges.

Most ordinary users will rarely use the chown or chgrp commands, but administrators may often use them when creating accounts, installing custom software, writing scripts, etc. For example, an admin might download some software for all users to use, installing it somewhere in /usr/local. The final steps might be to change the ownership of every newly installed file to ensure that it is owned by root, with the group set to sys, and then to use chmod to ensure any newly installed executable programs can be run by all users, and perhaps to restrict access to original source code.

Although chown is normally used to change the user ID of a file, and chgrp the group ID, chown can actually do both at once. For example, while acting as root:

yoda 1# echo hello > file
yoda 2# ls -l file
-rw-r--r--    1 root     sys            6 May  2 21:50 file
yoda 3# chgrp staff file
yoda 4# chown mapleson file
yoda 5# ls -l file
-rw-r--r--    1 mapleson staff          6 May  2 21:50 file
yoda 6# /bin/rm file
yoda 7# echo hello > file
yoda 8# ls -l file
-rw-r--r--    1 root     sys            6 May  2 21:51 file
yoda 9# chown mapleson.staff file
yoda 10# ls -l file
-rw-r--r--    1 mapleson staff          6 May  2 21:51 file

Figure 18. Using chown to change both user ID and group ID.

Changing File Permissions: Examples.

The general syntax of the chmod command is:

chmod [-R] <mode> <filename(s)>

where <mode> defines the new set of access permissions. The -R option is optional (denoted by the square brackets []) and can be used to recursively change the permissions for the contents of a directory.

<mode> can be defined in two ways: using Octal (base-8) numbers or by using a sequence of meaningful symbolic letters. This discussion covers the symbolic method since the numeric method (described in the man page for chmod) is less intuitive to use. I wouldn't recommend an admin use Octal notation until greater familiarity with how chmod works is attained.

<mode> can be summarised as containing three parts:

U operator P

where U is one or more characters corresponding to user, group, or other; operator is +, -, or =, signifying assignment of permissions; and P is one or more characters corresponding to the permission mode. Some typical examples would be:

chmod go-r file    - remove read permission for groups and others
chmod ugo+rx file  - add read/execute permission for all
chmod ugo=r file   - set permission to read-only for all users

A useful abbreviation in place of 'ugo' is 'a' (for all), eg.:

chmod a+rx file  - give read and execute permission for all
chmod a=r file   - set to read-only for all

For convenience, if the U part is missing, the command automatically acts for all, eg.:

chmod -x file  - remove executable access from everyone
chmod =r file  - set to read-only for everyone

though if a change in write permission is included, said change only affects user, presumably for better security:

chmod +w file    - add write access only for user
chmod +rwx file  - add read/execute for all, add write only for user
chmod -rw file   - remove read from all, remove write from user

Note the difference between the +/- operators and the = operator: + and - add or take away from existing permissions, while = sets all the permissions to a particular state, eg. consider a file which has the following permissions as shown by ls -l: -rw-------

The command 'chmod +rx' would change the permissions to: -rwxr-xr-x

while the command 'chmod =rx' would change the permissions to: -r-xr-xr-x

ie. the latter command has removed the write permission from the user field because the rx permissions were set for everyone rather than just added to an existing state. Further examples of possible permissions states can be found in the man page for ls.
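The difference can be seen directly in a short session; the file name, owner and sizes here are hypothetical, and the exact effect of '+rx' with no U part depends on the user's umask setting - the output shown corresponds to the default case described above:

  % ls -l report
  -rw-------    1 alex     staff        512 May  2 10:00 report
  % chmod +rx report
  % ls -l report
  -rwxr-xr-x    1 alex     staff        512 May  2 10:00 report
  % chmod =rx report
  % ls -l report
  -r-xr-xr-x    1 alex     staff        512 May  2 10:00 report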

A clever use of file ownership and groups can be employed by anyone to 'hand over' ownership of a file to another user, or even to root. For example, suppose user alex arranges with user sam to leave a new version of a project file (eg. a C program called project.c) in the /var/tmp directory of a particular system at a certain time. User alex not only wants sam to be able to read the file, but also to remove it afterwards, eg. move the file to sam's home directory with mv. Thus, alex could perform the following sequence of commands:

cp project.c /var/tmp   - copy the file
cd /var/tmp             - change directory
chmod go-rwx project.c  - remove all access for everyone else
chown sam project.c     - change ownership to sam

Figure 19. Handing over file ownership using chown.

Fig 19 assumes alex and sam are members of the same group, though an extra chgrp command could be used before the chown if this wasn't the case, or a combinational chown command used to perform both changes at once. After the above commands, alex will not be able to read the project.c file, or remove it. Only sam has any kind of access to the file. I once used this technique to show students how they could 'hand-in' project documents to a lecturer in a way which would not allow students to read each others' submitted work.

Note: it can be easy for a user to 'forget' about the existence of hidden files and their associated permissions. For example, someone doing some confidential movie editing might forget or not even know that temporary hidden files are often created for intermediate processing. Thus, confidential tasks should always be performed by users inside a sub-directory in their home directory, rather than just in their home directory on its own. Experienced users make good use of file access permissions to control exactly who can access their files, and even who can change them. Experienced administrators develop a keen eye and can spot when a file has unusual or perhaps unintended permissions, eg.: -rwxrwxrwx

if a user's home directory has permissions like this, it means anybody can read, write and execute files in that directory: this is insecure and was probably not intended by the user concerned. A typical example of setting appropriate access permissions is shown by my home directory: ls -l /mapleson

Only those directories and files that I wish to be readable by anyone have the group and others permissions set to read and execute. Note: to aid security, in order for a user to access a particular directory, the execute permission must be set on for that directory as well as read permission at the appropriate level (user, group, others). Also, only the owner of a file can change the permissions or ownership state for that file (this is why a chown/chgrp sequence must have the chgrp done first, or both at once via a combinational chown).

The Set-UID Flag.

This special flag appears as an 's' instead of 'x' in either the user or group fields of a file's permissions, eg.:

% ls -l /sbin/su
-rwsr-xr-x    1 root     sys        40180 Apr 10 22:12 /sbin/su*

The online book, "IRIX Admin: Backup, Security, and Accounting", states: "When a user runs an executable file that has either of these permissions, the system gives the user the permissions of the owner of the executable file."

An admin might use su to temporarily become root or another user without logging off. Ordinary users may decide to use it to enable colleagues to access their account, but this should be discouraged since using the normal read/write/execute permissions should be sufficient.

Mandatory File Locking.

If the 'l' flag is set in a file's group permissions field, then the file will be locked while another user from the same group is accessing the file. For example, file locking allows a user to gather data from multiple users in their own group via a group-writable file (eg. petition, questionnaire, etc.), but blocks simultaneous file-write access by multiple users - this prevents data loss which might otherwise occur via two users writing to a file at the same time with different versions of the file.

UNIX Fundamentals: Online Help

From the very early days of UNIX, online help information was available in the form of manual pages, or 'man' pages. These contain an extensive amount of information on system commands, program subroutines, system calls and various general reference pages on topics such as file systems, CPU hardware issues, etc.

The 'man' command allows one to search the man page database using keywords, but this text-based interface is still somewhat restrictive in that it does not allow one to 'browse' through pages at will and does not offer any kind of direct hyperlinked reference system, although each man page always includes a 'SEE ALSO' section so that one will know what other man pages are worth consulting.

Thus, most modern UNIX systems include the 'xman' command: a GUI interface using X Window displays that allows one to browse through man pages at will and search them via keywords. System man pages are actually divided into sections, a fact which is not at all obvious to a novice user of the man command. By contrast, xman reveals immediately the existence of these different sections, making it much easier to browse through commands. Since xman uses the various X Windows fonts to display information, the displayed text can incorporate special font styling such as italics and bold text to aid clarity. A man page shown in a shell can use bright characters and inverted text, but data shown using xman is much easier to read, except where font spacing is important, eg. enter 'man ascii' in a shell and compare it to the output given by xman (use xman's search option to bring up the man page for ascii). xman doesn't include a genuine hypertext system, but the easy-to-access search option makes it much more convenient to move from one page to another based on the contents of a particular 'SEE ALSO' section.

Most UNIX systems also have some form of online book archive. SGIs use the 'Insight' library system which includes a great number of books in electronic form, all written using hypertext techniques. An ordinary user would be expected to begin their learning process by using the online books rather than the man pages, since the key introductory books guide the user through the basics of using the system via the GUI interface rather than the shell interface.

SGIs also have online release notes for each installed software product. These can be accessed via the command 'grelnotes' which gives a GUI interface to the release notes archive, or one can use relnotes in a shell or terminal window. Other UNIX variants probably also have a similar information resource. Many newer software products also install local web pages as a means of providing online information, as do 3rd-party software distributions. Such web pages are usually installed somewhere in /usr/local, eg. /usr/local/doc. The URL format 'file:/file-path' is used to access such pages, though an admin can install file links with the ln command so that online pages outside of the normal file system web area (/var/www/htdocs on SGIs) are still accessible using a normal http format URL.

In recent years, there have been moves to incorporate web technologies into UNIX GUI systems. SGI began their changes in 1996 (a year before anyone else) with the release of the O2 workstation. IRIX 6.3 (used only with O2) included various GUI features to allow easy integration between the existing GUI and various web features, eg. direct iconic links to web sites, and using Netscape browser window interface technologies for system administration, online information access, etc.

Most UNIX variants will likely have similar features; on SGIs with the latest OS version (IRIX 6.5), the relevant system service is called InfoSearch - for the first time, users have a single entry point to the entire online information structure, covering man pages, online books and release notes. Also, extra GUI information tools are available for consulting "Quick Answers" and "Hints and Shortcuts". These changes are all part of a general drive on UNIX systems to make them easier to use.

Unlike the xman resource, viewing man pages using InfoSearch does indeed hyperlink references to other commands and resources throughout each man page. This again enhances the ability of an administrator, user or application developer to locate relevant information.

Summary: UNIX systems have a great deal of online information. As the numerous UNIX variants have developed, vendors have attempted to improve the way in which users can access that information, ultimately resulting in highly evolved GUI-based tools that employ standard windowing technologies such as those offered by Netscape (so that references may include direct links to web sites, ftp sites, etc.), along with hypertext techniques and search mechanisms. Knowing how to make the best use of available documentation tools can often be the key to effective administration, ie. locating answers quickly as and when required.

Detailed Notes for Day 2 (Part 1)

UNIX Fundamentals: System Identity, IP Address, Domain Name, Subdomain.

Every UNIX system has its own unique name, which is the means by which that machine is referenced on local networks and beyond, eg. the Internet. The normal term for this name is the local 'host' name. Systems connected to the Internet employ naming structures that conform to existing structures already used on the Internet; a completely isolated network can use any naming scheme.

Under IRIX, the host name for a system is stored in the /etc/sys_id file. The name may be up to 64 alphanumeric characters in length and can include hyphens and periods. Period characters '.' are not part of the real name but instead are used to separate the sequence into a domain-name style structure (eg. www.futuretech.vuurwerk.nl). The SGI server's host name is yoda, the fully-qualified version of which is written as yoda.comp.uclan.ac.uk. The choice of host names is largely arbitrary, eg. the SGI network host names are drawn from my video library (I have chosen names designed to be short without being too uninteresting).

On bootup, a system's /etc/rc2.d/S20sysetup script reads its /etc/sys_id file to determine the local host name. From then onwards, various system commands and internal function calls will return that system name, eg. the 'hostname' and 'uname' commands (see the respective man pages for details).

Along with a unique identity in the form of a host name, a UNIX system has its own 32bit Internet Protocol (IP) address, split for convenience into four 8bit integers separated by periods, eg. yoda's IP address is 193.61.250.34, an address which is visible to any system anywhere on the Internet. IP is the network-level communications protocol used by Internet systems and services. Various extra options can be used with IP layer communications to create higher-level services such as TCP (Transmission Control Protocol). The entire Internet uses the TCP/IP protocols for communication.

A system which has more than one network interface (eg. multiple Ethernet ports) must have a unique IP address for each port. Special software may permit a system to have extra addresses, eg. 'IP Aliasing', a technique often used by an ISP to provide a more flexible service to its customers. Note: unlike predefined Ethernet addresses (every Ethernet card has its own unique address), a system's IP address is determined by the network design, admin personnel, and external authorities.

Conceptually speaking, an IP address consists of two numbers: one represents the network while the other represents the system. In order to make more efficient use of the numerous possible address 'spaces', four classes of addresses exist, named A, B, C and D. The first few bits of an address determine its class:

Class    Initial Binary     No. of Bits for the    No. of Bits for the
         Bit Field          Network Number         Host Number

A        0                  7                      24
B        10                 14                     16
C        110                21                     8
D        1110               [special 'multicast' addresses for internal network use]

Figure 20. IP Address Classes: bit field and width allocations.
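As a quick illustration of the class boundaries above, the following Bourne shell sketch reports the class of an address from its first 8bit integer; the cut-off values follow directly from the bit fields in Figure 20 (the script itself is a hypothetical helper, not part of any standard UNIX distribution):

#!/bin/sh
# ipclass.sh - report the class of an IP address from its first octet (hypothetical helper).
# Usage example:  ./ipclass.sh 193.61.250.34     (should report Class C)
first=`echo $1 | cut -d. -f1`
if   [ "$first" -lt 128 ]; then echo "Class A"     # leading bit  0    (0-127)
elif [ "$first" -lt 192 ]; then echo "Class B"     # leading bits 10   (128-191)
elif [ "$first" -lt 224 ]; then echo "Class C"     # leading bits 110  (192-223)
elif [ "$first" -lt 240 ]; then echo "Class D"     # leading bits 1110 (224-239, multicast)
else echo "Reserved"
fi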

This system allows the Internet to support a range of different network sizes with differing maximum limits on the number of systems for each type of network:

                         Class A      Class B     Class C      Class D

No. of networks:         128          16384       2097152      [multicast]
No. of systems each:     16777214     65534       254          [multicast]

Figure 21. IP Address Classes: supported network types and sizes.

The host numbers 0 and 255 are never used for any individual host; these values are reserved for special uses. Note that a network which will never be connected to the Internet can theoretically use any IP address and domain/subdomain configuration.

Which class of network an organisation uses depends on how many systems it expects to have within its network. Organisations are allocated IP address spaces by Internet Network Information Centers (InterNICs), or by their local ISP if that is how they are connected to the Internet. An organisation's domain name (eg. uclan.ac.uk) is also obtained from the local InterNIC or ISP. Once a domain name has been allocated, the organisation is free to set up its own network subdomains such as comp.uclan.ac.uk (comp = Computing Department), within which an individual host would be yoda.comp.uclan.ac.uk. A similar example is Heriot Watt University in Edinburgh (where I studied for my BSc), which has the domain hw.ac.uk, with its Department of Computer Science and Electrical Engineering using a subdomain called cee.hw.ac.uk, such that a particular host is www.cee.hw.ac.uk (see Appendix A for an example of what happens when this methodology is not followed correctly).

UCLAN uses Class C addresses, with example address spaces being 193.61.255 and 193.61.250. A small number of machines in the Computing Department use the 250 address space, namely the SGI server's external Ethernet port at 193.61.250.34, and the NT server at 193.61.250.35 which serves the NT network in Ve27. Yoda has two Ethernet ports; the remaining port is used to connect to the SGI Indys via a hub, and has been defined to use a different address space, namely 193.61.252. The machines' IP addresses range from 193.61.252.1 for yoda, to 193.61.252.23 for the admin Indy; .20 to .22 are kept available for two HP systems which are occasionally connected to the network, and for a future plan to include Apple Macs on the network. The IP addresses of the Indys using the 252 address space cannot be directly accessed outside the SGI network or, as the jargon goes, 'on the other side' of the server's Ethernet port which is being used for the internal network. This automatically imposes a degree of security at the physical level.

IP addresses and host names for systems on the local network are brought together in the file /etc/hosts. Each line in this file gives an IP address, an official hostname and then any name aliases which represent the same system, eg. yoda.comp.uclan.ac.uk is also known as www.comp.uclan.ac.uk, or just yoda, or www, etc. When a system is first booted, the ifconfig command uses the /etc/hosts file to assign addresses to the various available Ethernet network interfaces. Enter 'more /etc/hosts' or 'nedit /etc/hosts' to examine the host names file for the particular system you're using.

NB: due to the Internet's incredible expansion in recent years, the world is actually beginning to run out of available IP addresses and domain names; at best, existing top-level domains are being heavily overused (eg. .com, .org, etc.) and the number of allocatable network address spaces is rapidly diminishing, especially if one considers the possible expansion of the Internet into Russia, China, the Far East, Middle East, Africa, Asia and Latin America. Thus, there are moves afoot to change the Internet so that it uses 128bit instead of 32bit IP addresses. When this will happen is unknown, but such a change would solve the problem.
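Returning to the local setup: the commands mentioned above can be used from any shell to confirm a system's identity. A brief sketch (the host name yoda is simply the example used throughout these notes):

hostname                # report the local host name
uname -n                # the same information via uname
cat /etc/sys_id         # the file read at bootup to set the name (IRIX)
netstat -i              # list the available network interfaces
grep yoda /etc/hosts    # check the name/IP mapping for a given host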

Special IP Addresses

Certain reserved IP addresses have special meanings, eg. the address 127.0.0.1 is known as the 'loopback' address (equivalent host name 'localhost') and always refers to the local system which one happens to be using at the time. If one never intends to connect a system to the Internet, there's no reason why this default IP address can't be left as it is, with whatever default name is assigned to it in the /etc/hosts file (SGIs always use the default name, "IRIS"), though most people do change their system's IP address and host name in case, for example, they have to connect their system to the network used at their place of work, or to provide a common naming scheme, group ID setup, etc.

If a system's IP address is changed from the default 127.0.0.1, the procedure is to add a new line to the /etc/hosts file such that the system name corresponds to the information in /etc/sys_id. One must never remove the 127.0.0.1 entry from the /etc/hosts file or the system will not work properly. The important lines of the /etc/hosts file used on the SGI network are shown in Fig 22 below (the appearance of '[etc]' in Fig 22 means some text has been clipped away to aid clarity).

# This entry must be present or the system will not work.
127.0.0.1        localhost

# SGI Server. Challenge S.
193.61.252.1     yoda.comp.uclan.ac.uk yoda www.comp.uclan.ac.uk www
[etc]

# Computing Services router box link.
193.61.250.34    gate-yoda.comp.uclan.ac.uk gate-yoda

# SGI Indys in Ve24, except milamber which is in Ve47.
193.61.252.2     akira.comp.uclan.ac.uk akira
193.61.252.3     ash.comp.uclan.ac.uk ash
193.61.252.4     cameron.comp.uclan.ac.uk cameron
193.61.252.5     chan.comp.uclan.ac.uk chan
193.61.252.6     conan.comp.uclan.ac.uk conan
193.61.252.7     gibson.comp.uclan.ac.uk gibson
193.61.252.8     indiana.comp.uclan.ac.uk indiana
193.61.252.9     leon.comp.uclan.ac.uk leon
193.61.252.10    merlin.comp.uclan.ac.uk merlin
193.61.252.11    nikita.comp.uclan.ac.uk nikita
193.61.252.12    ridley.comp.uclan.ac.uk ridley
193.61.252.13    sevrin.comp.uclan.ac.uk sevrin
193.61.252.14    solo.comp.uclan.ac.uk solo
193.61.252.15    spock.comp.uclan.ac.uk spock
193.61.252.16    stanley.comp.uclan.ac.uk stanley
193.61.252.17    warlock.comp.uclan.ac.uk warlock
193.61.252.18    wolfen.comp.uclan.ac.uk wolfen
193.61.252.19    woo.comp.uclan.ac.uk woo
193.61.252.23    milamber.comp.uclan.ac.uk milamber
[etc]

Figure 22. The contents of the /etc/hosts file used on the SGI network.

One example use of the localhost address is when a user accesses a system's local web page structure at: http://localhost/

On SGIs, such an address brings up a page about the machine the user is using. For the SGI network, the above URL always brings up a page for yoda since /var/www is NFS-mounted from yoda. The concept of a local web page structure for each machine is more relevant in company Intranet environments where each employee probably has her or his own machine, or where different machines have different locally stored web page information structures due to, for example, differences in available applications, etc.

The BIND Name Server (DNS)

If a site is to be connected to the Internet, then it should use a name server such as BIND (Berkeley Internet Name Domain) to provide a Domain Name Service (DNS). DNS is an Internet-standard name service for translating hostnames into IP addresses and vice-versa. A client machine wishing to access a remote host executes a query which is answered by the DNS daemon, called 'named'.

Yoda runs a DNS server and also a Proxy server, allowing the machines in Ve24 to access the Internet via Netscape (telnet, ftp, http, gopher and other services can be used). Most of the relevant database configuration files for a DNS setup reside in /var/named. A set of example configuration files is provided in /var/named/Examples - these should be used as templates and modified to reflect the desired configuration. Setting up a DNS database can be a little confusing at first, hence the provision of the Examples directory. The files which must be configured to provide a functional DNS are:

/etc/named.boot
/var/named/root.cache
/var/named/named.hosts
/var/named/named.rev
/var/named/localhost.rev

If an admin wishes to use a configuration file other than /etc/named.boot, then its location should be specified by creating a file called /etc/config/named.options containing the following line (or by adding the line to named.options if it already exists):

-b some-other-boot-file

After the files in /var/named have been correctly configured, the chkconfig command is used to set the appropriate variable file in /etc/config:

chkconfig named on
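chkconfig can also be run with no arguments to list the current state of all such configuration flags, which is a quick way of confirming the change before rebooting (a brief sketch; the grep filter is only for convenience):

chkconfig | grep named     # the 'named' flag should now be listed as 'on'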

The next reboot will activate the DNS service. Once started, named reads initial configuration information from the file /etc/named.boot, such as what kind of server it should be, where the DNS database files are located, etc. Yoda's named.boot file looks like this:

;
; Named boot file for yoda.comp.uclan.ac.uk.
;
directory    /var/named

cache        .                          root.cache
primary      comp.uclan.ac.uk           named.hosts
primary      0.0.127.IN-ADDR.ARPA       localhost.rev
primary      252.61.193.IN-ADDR.ARPA    named.rev
primary      250.61.193.IN-ADDR.ARPA    250.rev
forwarders   193.61.255.3 193.61.255.4

Figure 23. Yoda's /etc/named.boot file.
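Once named is running, standard tools can be used to confirm that the name server is answering queries. A brief sketch (nslookup is the traditional query tool supplied with most UNIX variants):

ps -ef | grep named                # check that the named daemon is running
nslookup yoda.comp.uclan.ac.uk     # forward lookup: host name to IP address
nslookup 193.61.252.1              # reverse lookup: IP address to host name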

Looking at the contents of the example named.boot file in /var/named/Examples, the differences are not that great:

;
; boot file for authoritative master name server for Berkeley.EDU
; Note that there should be one primary entry for each SOA record.
;
; sortlist 10.0.0.0

directory    /var/named

; type       domain                  source host/file    backup
cache        .                       root.cache
primary      Berkeley.EDU            named.hosts
primary      32.128.IN-ADDR.ARPA     named.rev
primary      0.0.127.IN-ADDR.ARPA    localhost.rev

Figure 24. The example named.boot file in /var/named/Examples.

Yoda's file has an extra line for the /var/named/250.rev file; this was an experimental attempt to make Yoda's subdomain accessible outside UCLAN, which failed because of the particular configuration of a router box elsewhere in the communications chain (the intention was to enable students and staff to access the SGI network using telnet from a remote host).

For full details on how to configure a typical DNS, see Chapter 6 of the online book, "IRIX Admin: Networking and Mail". A copy of this chapter has been provided for reference. As an example of how similar DNS configuration is across UNIX systems, see the issue of Network Week [10] which has an article on configuring a typical DNS. Also, a copy of each of Yoda's DNS files which I had to configure is included for reference. Together, these references should serve as an adequate guide to configuring a DNS; as with many aspects of managing a UNIX system, learning how someone else solved a problem and then modifying copies of what they did can be very effective.

Note: it is not always wise to use a GUI tool for configuring a service such as BIND [11]. It's too easy for ill-tested, grandiose software management tools to make poor assumptions about how an admin wishes to configure a service/network/system. Services such as BIND come with their own example configuration files anyway; following these files as a guide may be considerably easier than using a GUI tool, which can introduce problems of its own, created by whoever wrote the tool rather than by the service itself (in this case BIND).

Proxy Servers

A Proxy server acts as a go-between to the outside world, answering client requests for data from the Internet, calling the DNS system to obtain IP addresses based on domain names, opening connections to the Internet perhaps via yet another Proxy server elsewhere (the Ve24 system uses Pipex as the next link in the communications chain), and retrieving data from remote hosts for transmission back to clients.

Proxy servers are a useful way of providing Internet access to client systems at the same time as imposing a level of security against the outside world, ie. the internal structure of a network is hidden from the outside world due to the operational methods employed by a Proxy server, rather like the way in which a representative at an auction can act for an anonymous client via a mobile phone during the bidding. Although there are more than a dozen systems in Ve24, no matter which machine a user decides to access the Internet from, the access will always appear to a remote host to be coming from the IP address of the closest proxy server, eg. the University web server would see Yoda as the accessing client. Similarly, I have noticed that when I access my own web site in Holland, the site concerned sees my access as if it had come from the proxy server at Pipex, ie. the Dutch system cannot see 'past' the Pipex Proxy server.

There are various proxy server software solutions available. A typical package which is easy to install and configure is the Netscape Proxy Server; Yoda uses this particular system.
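How client programs are pointed at the proxy varies: Netscape is normally configured through its preferences panels, while many command-line tools honour a proxy environment variable. A purely hypothetical sketch for a user on the Ve24 network (the port number shown is an assumption for illustration, not Yoda's actual configuration):

# csh/tcsh syntax - point command-line web tools at the local proxy server.
setenv http_proxy http://yoda.comp.uclan.ac.uk:8080/

# Bourne/Korn shell equivalent:
#   http_proxy=http://yoda.comp.uclan.ac.uk:8080/ ; export http_proxy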

Network Information Service (NIS)

It is reasonably easy to ensure that all systems on a small network have consistent /etc/hosts files using commands such as rcp. However, medium-sized networks consisting of dozens to hundreds of machines may present problems for administrators, especially if the overall setup consists of several distinct networks, perhaps in different buildings and run by different people. For such environments, a Network Information Service (NIS) can be useful.

NIS uses a single system on the network to act as the sole trusted source of name service information - this system is known as the NIS master. Slave servers may be used, to which copies of the database on the NIS master are periodically sent, providing backup services should the NIS master system fail.

Client systems locate a name server when required, requesting data based on a domain name and other relevant information.
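On a client, a few standard NIS commands show whether the service is in use and which server is answering. A brief sketch (availability and output format may vary slightly between UNIX variants):

domainname     # print the NIS domain name the client belongs to
ypwhich        # report which NIS server the client is currently bound to
ypcat hosts    # dump the NIS hosts map (the network-wide equivalent of /etc/hosts)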

Unified Name Service Daemon (UNS, or more commonly NSD)

Very recently, the DNS and NIS systems have been superseded by a new system called the Unified Name Service Daemon, or NSD for short. NSD handles requests for domain information in a considerably more efficient manner, involving fewer system calls, replacing multiple files for older services with a single file (eg. many of the DNS files in /var/named are replaced by a single database file under NSD), and allowing for much larger numbers of entries in data files, etc. However, NSD is so new that even I have not yet had an opportunity to examine properly how it works, or the way in which it correlates to the older DNS and NIS services.

As a result, this course does not describe DNS, NIS or NSD in any great detail. This is because, given the rapid advance of modern UNIX OSs, explaining the workings of DNS or NIS in depth would be of limited value: any admin beginning her or his career now is more likely to encounter the newer NSD system, which I am not yet comfortable with. Nevertheless, administrators should be aware of the older style services as they may have to deal with them, especially on legacy systems. Thus, though not discussed in these lectures, some notes on a typical DNS setup are provided for further reading [10]. Feel free to login to the SGI server yourself with:

rlogin yoda

and examine the DNS and NIS configuration files at your leisure; these may be found in the /var/named and /var/yp directories. Consult the online administration books for further details.

UNIX Fundamentals: UNIX Software Features

Software found on UNIX systems can be classified into several types:

 - System software: items provided by the vendor as standard.
 - Commercial software: items purchased either from the same vendor which supplied the OS, or from some other commercial 3rd-party.
 - Shareware software: items either supplied with the OS, or downloaded from the Internet, or obtained from some other source such as a cover magazine CD.
 - Freeware software: items supplied in the same manner as Shareware, but using a more open 'conditions of use'.
 - User software: items created by users of a system, whether that user is an admin or an ordinary user.

System Software

Any OS for any system today is normally supplied on a set of CDs. As the amount of data for an OS installation increases, perhaps the day is not far away when vendors will begin using DVDs instead. Whether or not an original copy of OS CDs can be installed on a system depends very much on the particular vendor, OS and system concerned.

Any version of IRIX can be installed on an SGI system which supports that particular version of IRIX - this ability to install the OS whether or not one has a legal right to use the software is simply a practice SGI has adopted over the years. SGI could have chosen to make OS installation more difficult by requiring license codes and other details at installation time, but instead SGI chose a different route. What is described here applies only to SGI's IRIX OS. SGI decided some time ago to adopt a strategy of official software and hardware management which makes it extremely difficult to make use of 'pirated' software. The means by which this is achieved is explained in the System Hardware section below, but the end result is a policy where any version of IRIX older than the 'current' version is free by default. Thus, since the current release of IRIX is 6.5, one could install IRIX 6.4, 6.3, 6.2 (or any older version) on any appropriate SGI system (eg. installing IRIX 6.2 on a 2nd-hand Indy) without having to worry about legal issues. There's nothing to stop one physically installing 6.5 if one had the appropriate CDs (ie. the software installation tools and CDs do not include any form of installation protection or copy protection), but other factors might make for trouble later on if the user concerned did not apply for a license at a later date, eg. attempting to purchase commercial software and licenses for the latest OS release. It is highly likely that in future years, UNIX vendors will also make their current OSs completely free, probably as a means of combating WindowsNT and other rivals.

As an educational site operating under an educational license agreement, UCLAN's Computing Department is entitled to install IRIX 6.5 on any of the SGI systems owned by the Computing Department, though at present most systems use the older IRIX 6.2 release for reasons connected with system resources on each machine (RAM, disk space, CPU power). Thus, the idea of a license can have two meanings for SGIs:



 - A theoretical 'legal' license requirement which applies, for example, to the current release of IRIX, namely IRIX 6.5 - this is a legal matter and doesn't physically affect the use of IRIX 6.5 OS CDs.
 - A real license requirement for particular items of software using license codes, obtainable either from SGI or from whatever 3rd-party the software in question was purchased from.

Another example of the first type is the GNU licensing system, explained in the 'Freeware Software' section below (what the GNU license is and how it works is fascinatingly unique).

Due to a very early top-down approach to managing system software, IRIX employs a high-level software installation structure which ensures that:



 - It is extremely easy to add, remove, or update software, especially using the GUI software tool called Software Manager (swmgr is the text command name which can be entered in a shell).
 - Changes to system software are handled correctly with very few, if any, errors most of the time; 'most' could be defined as 'rarely, if ever, but not never'. A real world example might be to state that I have installed SGI software elements thousands of times and rarely if ever encountered problems, though I have had to deal with some issues on occasion.
 - Software 'patches' (modificational updates to existing software already installed) are handled in such a way as to allow the later removal of said patches if desired, leaving the system in exactly its original state as if the patch had never been installed.

As an example of software installation reliability, my own 2nd-hand Indigo2 at home has been in use since March 1998, was originally installed with IRIX 6.2, updated with patches several times, added to with extra software over the first few months of ownership (mid-1998), then upgraded to IRIX 6.5, added to with large amounts of freeware software, then upgraded to IRIX 6.5.1, then 6.5.2, then 6.5.3, and all without a single software installation error of any kind. In fact, my Indigo2 hasn't crashed or given a single error since I first purchased it. As is typical of any UNIX system which is/was widely used in various industries, most if not all of the problems ever encountered on the Indigo2 system have been resolved by now, producing an incredibly stable platform. In general, the newer the system and/or the newer the software, then the greater number of problems there will be to deal with, at least initially.

Thankfully, OS revisions largely build upon existing code and knowledge. Plus, since so many UNIX vendors have military, government and other important customers, there is incredible pressure to be very careful when planning changes to system or application software. Intensive testing is done before any new version is released into the marketplace (this contrasts completely with Microsoft, which deliberately allows the public to test Beta versions of its OS revisions as a means of locating bugs before final release - a very lazy way to handle system testing by any measure).

Because patches often deal with release versions of software subsystems, and many software subsystems may have dependencies on other subsystems, the issue of patch installation is the most common area which can cause problems, usually due to unforeseen conflicts between individual versions of specific files. However, rigorous testing and a top-down approach to tracking release versions minimises such problems, especially since all UNIX systems come supplied with source code version/revision tracking tools as standard, eg. SCCS. The latest 'patch CD' can usually be installed automatically without causing any problems, though it is wise for an administrator to check what changes are going to be made before commencing any such installation, just in case.

The key to such a high-level software management system is the concept of a software 'subsystem'. SGI has developed a standard means by which a software suite and related files (manual pages, release notes, data, help documents, etc.) are packaged together in a form suitable for installation by the usual software installation tools such as inst and swmgr. Once this mechanism was carefully defined many years ago, insisting that all subsequent official software releases comply with the same standard ensures that the opportunity for error is greatly minimised, if not eliminated. Sometimes, certain 3rd-party applications such as Netscape can display apparent errors upon installation or update, but these errors are usually explained in accompanying documentation and can always be ignored.

Each software subsystem is usually split into several sub-units so that only relevant components need be installed as desired. The sub-units can then be examined to see the individual files which would be installed, and where. When making updates to software subsystems, selecting a newer version of a subsystem automatically selects only the relevant sub-units based on which sub-units have already been installed, ie. new items will not automatically be selected. For ease of use, an admin can always choose to execute an automatic installation or removal (as desired), though I often select a custom installation just so that I can see what's going on and learn more about the system as a result. In practice, I rarely need to alter the default behaviour anyway.

The software installation tools automatically take care not to overwrite existing configuration files when, for example, installing new versions (ie. upgrades) of software subsystems which have already been installed (eg. Netscape). In such cases, both the old and new configuration files are kept and the user (or admin) informed that there may be a need to decide which of the two files to keep, or perhaps to copy key data from the old file to the new file, deleting the old file afterwards.
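In day-to-day use, the commands for working with these software subsystems are 'versions' (list what is installed), 'swmgr' (the GUI Software Manager) and 'inst' (the text-based installer). A brief sketch from memory - treat the exact option usage as indicative rather than definitive, and the CD path as an example only:

versions | more          # list the software subsystems currently installed
versions netscape        # show just the subsystems whose names match 'netscape'
swmgr &                  # launch the GUI Software Manager (requires a running X session)
inst -f /CDROM/dist      # run the text-based installer against a CD's distribution directory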

Commercial Software

A 3rd-party commercial software package may or may not come supplied in a form which complies with any standards normally used by the hardware system vendor. UNIX has a long history of providing a generic means of packaging software and files in an archive which can be downloaded, uncompressed, dearchived, compiled and installed automatically, namely the 'tar.gz' archive format (see the man pages for tar and gzip). Many commercial software suppliers may decide to sell software in this format. This is ok, but it does mean one may not be able to use the usual software management tools (inst/swmgr in the case of SGIs) to later remove the software if desired. One would have to rely on the supplier being kind enough to either provide a script which can be used to remove the software, or at the very least a list of which files get installed where.

Thankfully, it is likely that most 3rd-parties will at least try to use the appropriate distribution format for a particular vendor's OS. However, unlike the source vendor, one cannot be sure that the 3rd-party has taken the same degree of care and attention to ensure they have used the distribution format correctly, eg. checking for conflicts with other software subsystems, providing product release notes, etc.

Commercial software for SGIs may or may not use the particular hardware feature of SGIs which SGI uses to prevent piracy, perhaps because exactly how it works is probably itself a licensed product from SGI. Details of this mechanism are given in the System Hardware section below.
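Regarding the generic 'tar.gz' format mentioned above: unpacking such an archive by hand involves nothing more than gzip and tar. A typical sequence might look like this (the package name is purely illustrative):

gzip -dc somepackage.tar.gz | tar xvf -    # uncompress and dearchive in one step
cd somepackage
more README                                # most packages include build/install notes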

Shareware Software

The concept of shareware is simple: release a product containing many useful features, but which has more advanced features and perhaps essential features limited, restricted, or locked out entirely, eg. being able to save files, or working on files over a particular size. A user can download the shareware version of the software for free. They can test out the software and, if they like it, 'register' the software in order to obtain either the 'full' (ie. complete) version, or some kind of encrypted key or license code that will unlock the remaining features not accessible or present in the shareware version. Registration usually involves sending a small fee, eg. $30, to the author or company which created the software. Commonly, registration results in the author(s) sending the user proper printed and bound documentation, plus regular updates to the registered version, news releases on new features, access to dedicated mailing lists, etc.

The concept of shareware has changed over the years, partly due to the influence of the computer game 'Doom' which, although released as shareware in name, actually effectively gave away an entire third of the complete game for free. This was a ground-breaking move which proved to be an enormous success, earning the company which made the game (id Software, Dallas, Texas, USA) over eight million $US and a great deal of respect and loyalty from gaming fans. Never before had a company released shareware software in a form which did not involve deliberately 'restricting' key aspects of the shareware version. As stated above, shareware software is often altered so that, for example, one could load files, work on them, make changes, test out a range of features, but (crucially) not save the results. Such shareware software is effectively not of any practical use on its own, ie. it serves only as a kind of hands-on advertisement for the full version. Doom was not like this at all: one could play an entire third of the game, including over a network against other players.

Today, other creative software designers have adopted a similar approach, perhaps the most famous recent example of which is 'Blender' [1], a free 3D rendering and animation program for UNIX and (as of very soon) WindowsNT systems. In its as-supplied form, Blender can be used to do a great deal of work, creating 3D scenes, renderings and animations easily on a par with 3D Studio Max, even though some features in Blender are indeed locked out in the shareware version. However, unlike traditional shareware, Blender does allow one to save files and so can be used for useful work. It has spread very rapidly in the last few months amongst students in educational sites worldwide, proving to be of particular interest to artists and animators who almost certainly could not normally afford a commercial package which might cost hundreds or perhaps thousands of pounds. Even small companies have begun using Blender.

However, supplied documentation for Blender is limited. As a 'professional level' system, it is unrealistic to expect to be able to get the best out of it without much more information on how it works and how to use it. Thus, the creators of Blender, a company called NaN based in Holland, make most of their revenue by offering a very detailed 350 page printed and bound manual for about $50 US, plus a sequence of software keys which make available the advanced features in Blender.

Software distribution concepts such as the above methods used by NaN didn't exist just a few years ago, eg. before 1990. The rise of the Internet, certain games such as Doom, the birth of Linux, and changes in the way various UNIX vendors manage their business have caused a quantum leap in what people think of as shareware. Note that the same caveat stated earlier with respect to software quality also applies to shareware, and to freeware too, ie. such software may or may not use the normal distribution method associated with a particular UNIX platform - in the case of SGIs, the 'inst' format.

Another famous example of shareware is the XV [2] image-viewer program, which offers a variety of functions for image editing and image processing (even though its author insists it's really just an image viewer). XV does not have restricted features, but it is an official shareware product which one is supposed to register if one intends to use the program for commercial purposes. However, as is typical with many modern shareware programs, the author stipulates that there is no charge for personal (non-commercial) or educational use.

Freeware Software

Unlike shareware software, freeware software is exactly that: completely free. There is no concept of registration, restricted features, etc. at all. Until recently, even I was not aware of the vast amount of free software available for SGIs and UNIX systems in general.

There has always been free software for UNIX systems but, in keeping with other changes by UNIX vendors over the past few years, SGI altered its application development support policy in 1997 to make it much easier for users to make use of freeware on SGI systems. Prior to that time, SGI did not make the system 'header' files (normally kept in /usr/include) publicly available. Without these header files, one could not compile any new programs even if one had a free compiler. So, SGI adopted a new stance whereby the header files, libraries, example source code and other resources are provided free, but its own advanced compiler technologies (the MIPS Pro Compilers) remain commercial products. Immediately, anyone could then write their own applications for SGI systems using the supplied CDs (copies of which are available from SGI's ftp site) in conjunction with free compilation tools such as the GNU compilers. As a result, the 2nd-hand market for SGI systems in the USA has skyrocketed, with extremely good systems available at very low cost (systems which cost 37500 pounds new can now be bought for as little as 500 pounds, even though they can still be better than modern PCs in many respects).

It is highly likely that other vendors have adopted similar strategies in recent years (most of my knowledge concerns SGIs). Sun Microsystems made its SunOS free for students some years ago (perhaps Solaris too); my guess is that a similar compiler/development situation applies to systems using SunOS and Solaris as well - one can write applications using free software and tools. This concept probably also applies to HP systems, Digital UNIX systems, and other flavours of UNIX.

Linux is a perfect example of how the ideas of freeware development can determine an OS's future direction. Linux was meant to be a free OS from its very inception - Linus Torvalds, its creator, loathes the idea of an OS supplier charging for the very platform upon which essential software is executed. Although Linux is receiving considerable industry support these days, Linus is wary of the possibility of Linux becoming more commercial, especially as vendors such as Red Hat and Caldera offer versions of Linux with added features which must be paid for. Whether or not the Linux development community can counter these commercial pressures in order to retain some degree of freeware status and control remains to be seen.

Note: I'm not sure of the degree to which completely free development environments on a quality-par with GNU are available for MS Windows-based systems (whether that involves Win95, Win98, WinNT or even older versions such as Win3.1).
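To illustrate the point about the free headers and GNU compilers: once gcc is installed (eg. from the Freeware CD described below), building and running a trivial test program takes only a minute. A minimal sketch:

cat > hello.c << 'EOF'
/* hello.c - trivial test that the free headers and compiler are usable. */
#include <stdio.h>
int main(void)
{
    printf("hello, world\n");
    return 0;
}
EOF
gcc -o hello hello.c    # compile with the free GNU compiler
./hello                 # run the result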

The GNU Licensing System

The GNU system is, without doubt, thoroughly unique in the modern era of copyright, trademarks, law suits and court battles. It can be easily summarised as a vast collection of free software tools, but the detail reveals a much deeper philosophy of software development, best explained by the following extract from the main GNU license file that accompanies any GNU-based program [3]:

"The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all."

Reading the above extract, it is clear that those responsible for the GNU licensing system had to spend a considerable amount of time actually working out how to make something free - free in a legal sense, that is. So many standard legal matters are designed to restrict activities that the work put into the GNU Free Software Foundation makes the license document read like some kind of software engineer's nirvana. It's a serious issue though, and the existence of GNU is very important in terms of the unimaginable amount of creative work going on around the world which would not otherwise exist (without GNU, Linux would probably not exist).

SGI, and other UNIX vendors I expect, ships its latest OS (IRIX 6.5) with a CD entitled 'Freeware', which not only contains a vast number of freeware programs in general (everything from spreadsheets and data plotting to games, audio/midi programming and molecular modeling), but also a complete, pre-compiled inst-format distribution of the entire GNU archive: compilers, debugging tools, GNU versions of shells and associated utilities, calculators, enhanced versions of UNIX commands and tools, even higher-level tools such as a GUI-based file manager and shell tool, and an absolutely superb Photoshop-style image editing tool called GIMP [4] (GNU Image Manipulation Program) which is extendable by the user. The individual software subsystems from the Freeware CD can also be downloaded in precompiled form from SGI's web site [5]. The February 1999 edition of SGI's Freeware CD contains 173 different software subsystems, 29 of which are based on the GNU licensing system (many others are likely available from elsewhere on the Internet, along with further freeware items). A printed copy of the contents of the Feb99 Freeware CD is included with the course notes for further reading.

Other important freeware programs which are supplied separately from such freeware CD distributions (an author may wish to distribute just from a web site) include the Blue Moon Rendering Tools (BMRT) [6], a suite of advanced 3D ray-tracing and radiosity tools written by one of the chief architects at Pixar animation studios - the company which created "Toy Story", "Small Soldiers" and "A Bug's Life". Blender can output files in Inventor format, which can then be converted to RIB format for use by BMRT.

So why is shareware and freeware important? Well, these types of software matter because, today, it is perfectly possible for a business to operate using only shareware and/or freeware software. An increasingly common situation one comes across is an entrepreneurial multimedia firm using Blender, XV, GIMP, BMRT and various GNU tools to manage its entire business, often running on 2nd-hand equipment using free versions of UNIX such as Linux, SunOS or IRIX 6.2! I know of one such company in the USA which uses decade-old 8-CPU SGI servers and old SGI workstations such as Crimson RealityEngine and IRIS Indigo. The hardware was acquired 2nd-hand in less than a year.

Whether or not a company decides to use shareware or freeware software depends on many factors, especially the degree to which a company feels it must have proper, official support. Some sectors such as government, medical and military have no choice: they must have proper, fully guaranteeable hardware and software support because of the nature of the work they do, so using shareware or freeware software is almost certainly out of the question. However, for medium-sized or smaller companies, and especially home users or students, the existence of shareware and freeware software, combined with the modern approaches to these forms of software by today's UNIX vendors, offers whole new avenues of application development and business ideas which have never existed before as commercially viable possibilities.

System Hardware

The hardware platforms supplied by the various UNIX vendors are, like UNIX itself today, also designed and managed with a top-down approach. The world of PCs has always been a bottom-up process of putting together a mish-mash of different components from a wide variety of sources. Motherboards, video cards, graphics cards and other components are available in a plethora of types of varying degrees of quality. This bottom-up approach to systems design means it's perfectly possible to have a PC with a good CPU, good graphics card, good video card, but an awful motherboard. If the hardware is suspect, problems faced by the user may appear to be OS-related when in fact they could be down to poor quality hardware. It's often difficult or impossible to ascertain the real cause of a problem - sometimes system components just don't work even though they should, or a system suddenly stops recognising the presence of a device; these problems are most common with peripherals such as CDROM, DVD, ZIP, sound cards, etc.

Dealing only with hardware systems designed specifically to run a particular vendor's UNIX variant, the situation is very different. The vendor maintains a high degree of control over the design of the hardware platform. Hence, there is opportunity to focus on the unique requirements of target markets, quality, reliability, etc. rather than always focusing on absolute minimum cost, which inevitably means cutting corners and making tradeoffs. This is one reason why even very old UNIX systems, eg. multi-processor systems from 1991 with (say) eight 33MHz CPUs, are still often found in perfect working order. The initial focus on quality results in a much lower risk of component failure. Combined with generous hardware and software support policies, hardware platforms for traditional UNIX systems are far more reliable than PCs.

My personal experience is with hardware systems designed by SGI, about which I know a great deal. Their philosophy of design is typical of most UNIX hardware vendors (others would be Sun, HP, IBM, DEC, etc.) and can be contrasted very easily with the way PCs are designed and constructed:

  UNIX low-end:    "What can we give the customer for 5000?"
       mid-range:  "What can we give the customer for 15000?"
       high-end:   "What can we give the customer for 65000+?"

  PC:              "How cheap can we make a machine which offers a particular
                    feature set and level of ability?"

Since the real driving force behind PC development is the home market, especially games, the philosophy has always been to decide what features a typical 'home' or 'office' PC ought to have and then try and design the cheapest possible system to offer those features. This approach has eventually led to incredibly cut-throat competition, creating new concepts such as the 'sub-$1000' PC, and even today's distinctly dubious 'free PC', but in reality the price paid by consumers is the use of poor quality components which do not integrate well, especially components from different suppliers. Hardware problems in PCs are common, and now unavoidable. In Edinburgh, I know of a high-street PC store which always has a long queue of customers waiting to have their particular problem dealt with.

By contrast, most traditional UNIX vendors design their own systems with a top-down approach which focuses on quality. Since the vendor usually has complete control, they can ensure a much greater coherence of design and degree of integration. System components work well with each other because all parts of the system were designed with all the other parts in mind. Another important factor is that a top-down approach allows vendors to innovate and develop new architectural designs, creating fundamentally new hardware techniques such as SMP and S2MP processing, highly scalable systems, advanced graphics architectures, and perhaps most importantly of all from a customer's point of view: much more advanced CPU designs (Alpha, MIPS, SPARC, PA-RISC, POWER series, etc.) Such innovations and changes in design concept are impossible in the mainstream PC market: there is too much to lose by shifting from the status-quo. Everything follows the lowest common denominator.

The most obvious indication of these two different approaches is that UNIX hardware platforms have always been more expensive than PCs, but that is something which should be expected given that most UNIX platforms are deliberately designed to offer a much greater feature set, better quality components, better integration, etc. A good example is the SGI Indy. With respect to absolute cost, the Indy was very expensive when it was first released in 1993, but because of what it offered in terms of hardware and software features it was actually a very cheap system compared to trying to put together a PC with a similar feature set. In fact, Indy offered features such as hardware-accelerated 3D graphics at high resolution (1280x1024) and 24bit colour at a time when such features did not exist at all for PCs. PCW magazine said in its original review [7] that to give a PC the same standard features and abilities, such as ISDN, 4-channel 16bit stereo sound with multiple stereo I/O sockets, S-Video/Composite/Digital video inputs, NTSC-resolution CCD digital camera, integrated SCSI, etc. would have cost twice as much as an Indy. SGI set out to design a system which would include all these features as-standard, so the end result was bound to cost several thousand pounds, but that was still half the cost of trying to cobble together a collection of mis-matched components from a dozen different companies to produce something which still would not have been anywhere near as good. As PCW put it, the Indy - for its time - was a great machine offering superb value if one was the kind of customer which needed its features and would be able to make good use of them.

Sun Microsystems adopted a similar approach to its recent Ultra5, Ultra10 and other systems: provide the user with an integrated design with a specific feature set that Sun knew its customers wanted. SGI did it again with their O2 system, released in October 1996. O2 has such a vast range of features (highly advanced for its time) that few ordinary customers would find themselves using most or all of them. However, for the intended target markets (ranging from CAD, design, animation, film/video special effects and video editing to medical imaging, etc.) the O2 was an excellent system. Like most UNIX hardware systems, O2 today is not competitive in certain areas such as basic 3D graphics performance (there are exceptions to this), but certain advanced and unique architectural features mean it's still purchased by customers who require those features.

This, then, is the key: UNIX hardware platforms which offer a great many features and high-quality components are only a good choice if one:

 - is the kind of customer which definitely needs those features;
 - values the ramifications of using a better quality system that has been designed top-down: reliability, quality, long-term value, ease of maintenance, etc.

One often observes people used to PCs asking why systems like O2, HP's Visualize series, SGI's Octane, Sun's Ultra60, etc. cost so much compared to PCs. The reason for the confusion is that the world of PCs focuses heavily on the abilities of the main CPU, whereas all UNIX vendors have, for many years, made systems which include as much dedicated acceleration hardware as possible, easing the burden on the main CPU. For the home market, systems like the Amiga pioneered this approach; unfortunately, the company responsible for the Amiga doomed itself to failure as a result of various marketing blunders.

From an admin's point of view, the practical side effect of having to administer and run a UNIX hardware platform is that there is far, far less effort needed in terms of configuring systems at the hardware level, or having to worry about different system hardware components operating correctly with one another. Combined with the way most UNIX variants deal with hardware devices (ie. automatically and transparently most of the time), a UNIX admin can swap hardware components between different systems from the same vendor without any need to alter system software, ie. any changes in system hardware configuration are dealt with automatically. Further, many UNIX vendors use certain system components that are identical (usually memory, disks and backup devices), so admins can often swap generic items such as disks between different vendor platforms without having to reconfigure those components (in the case of disks) or worry about damaging either system.

SCSI disks are a good example: they are supplied preformatted, so an admin should never have to reformat a SCSI disk. Swapping a SCSI disk between different vendor platforms may require repartitioning of the disk, but never a reformat. In the 6 years I've been using SGIs, I've never had to format a SCSI disk.

Examining a typical UNIX hardware system such as Indy, one notices several very obvious differences compared to PCs:

 - There are far fewer cables in view.
 - Components are positioned in such a way as to greatly ease access to all parts of the system.
 - The overall design is highly integrated so that system maintenance and repairs/replacements are much easier to carry out.

Thus, problems that are solvable by the admin can be dealt with quickly, while problems requiring vendor hardware support assistance can be fixed in a short space of time by a visiting technician, which obviously reduces costs for the vendor responsible by enabling their engineers to deal with a larger number of queries in the same amount of time.
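On the subject of hardware configuration being handled automatically: on SGIs the 'hinv' command prints the hardware inventory the system detected at boot, which is a quick way to confirm that a swapped or newly fitted component has been recognised. A brief sketch (other UNIX variants have their own equivalents):

hinv                    # full hardware inventory: CPU, memory, disks, graphics, etc.
hinv | grep -i disk     # filter the inventory for disk devices only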

Just as with the approaches taken to hardware and software design, the way in which support contracts for UNIX systems operate also follows a top-down approach. Support costs can be high, but the ethos is similar: you get what you pay for - fast, no-nonsense support when it's needed. I can only speak from experience of dealing with SGIs, but I'm sure the same is true of other UNIX vendors. Essentially, if I encounter a hardware problem of some kind, the support service always errs on the side of caution in dealing with the problem, ie. I don't have to jump through hoops in order to convince them that there is a problem - they accept what I say and organise a visiting technician to help straight away (one can usually choose between a range of response times from 1 hour to 5 days). Typically, unless the technician can fix the problem on-site in a matter of minutes, then some, most, or even all of the system components will be replaced if necessary to get the system in working order once more.

For example, when I was once encountering SCSI bus errors, the visiting engineer was almost at the point of replacing the motherboard, video card and even the main CPU (several thousand pounds worth of hardware in terms of new-component replacement value at the time) before some further tests revealed that it was in fact my own personal disk which was causing the problem (I had an important jumper clip missing from the jumper block). In other words, UNIX vendor hardware support contracts tend to place much less emphasis on the customer having to prove they have a genuine problem.

I should imagine this approach exists because many UNIX vendors have to deal with extremely important clients such as government, military, medical, industrial and other sectors (eg. safety critical systems). These are customers with big budgets who don't want to waste time messing around with details while their faulty system is losing them money - they expect the vendor to help them get their system working again as soon as possible.

Note: assuming a component is replaced (eg. motherboard), even if the vendor's later tests show the component to be working correctly, it is not returned to the customer, ie. the customer keeps the new component. Instead, most vendors have their own dedicated testing laboratories which pull apart every faulty component returned to them, looking for causes of problems so that the vendor can take corrective action if necessary at the production stage, and learn any lessons to aid in future designs.

To summarise the above:

 - A top-down approach to hardware design means a better feature set, better quality, reliability, ease of use and maintenance, etc. As a result, UNIX hardware systems can be costly.
 - One should only purchase such a system if one can make good use of the supplied features, and if one values the implications of better quality, etc., despite the extra cost.

However, a blurred middle-ground between the top-down approach to UNIX hardware platforms and the bottom-up approach to the supply of PCs is the so-called 'vendor-badged' NT workstation market. In general, this is where UNIX vendors create PC-style hardware systems that are still based on off-the-shelf components, but occasionally include certain modifications to improve performance, etc. beyond what one normally sees in a typical PC. The most common example is where vendors such as Compaq supply systems which have two 64bit PCI busses to increase available system bandwidth.

All these systems are targeted at the 'NT workstation' market. Cynics say that such systems are just a clever means of placing a 'quality' brand name on ordinary PC hardware. However, such systems do tend to offer a better level of quality and integration than ordinary PCs (even expensive ordinary PCs), but an inevitable ironic side effect is that these vendor-badged systems do cost more. Just as with traditional UNIX hardware systems, whether or not that cost is worth it depends on customers' priorities. Companies such as movie studios regard stability and reliability as absolutely critical, which is why most studios do not use NT [8]. Those that do, especially smaller studios (perhaps because of limited budgets), will always go for vendor-badged NT workstations rather than purchasing systems from PC magazines and attempting to cobble together a reliable platform. The extra cost is worth it.

There is an important caveat to the UNIX hardware design approach: purchasing what can be a very good UNIX hardware system is a step that can easily be ruined by not equipping that system in the first instance with sufficient essential system resources such as memory capacity, disk space, CPU power and (if relevant) graphics/image/video processing power. Sometimes, situations like this occur because of budget constraints, but the end result may be a system which cannot handle the tasks for which it was purchased. If such mis-matched purchases are made, it's usually a good sign that the company concerned is using a bottom-up approach to making decisions about whether or not to buy a hardware platform that has been built using a top-down approach. The irony is plain to see. Since admins often have to advise on hardware purchases or upgrades, a familiarity with these issues is essential.

Conclusion: decide what is needed to solve the problem. Evaluate which systems offer appropriate solutions. If no suitable system is affordable, do not compromise on essentials such as memory or disk as a means of lowering cost - choose a different platform instead, such as a good quality NT system, or a system with lower costs such as an Intel machine running Linux, etc. Similarly, it makes no sense to have a good quality UNIX system, only to then adopt a strategy of buying future peripherals (eg. extra disks, memory, printers, etc.) that are of poor quality. In fact, some UNIX vendors may not offer or permit hardware support contracts unless the customer sticks to using approved 3rd-party hardware sources.

Summary: UNIX hardware platforms are designed top-down, offer better quality components, etc., but tend to be more expensive as a result. Today, in an era when even SGI has started to sell systems that support WindowsNT, the philosophy is still the same: design top-down to give quality hardware, etc. Thus, SGI's WindowsNT systems start at around 2500 pounds - a lot by the standards of any home user, but cheap when considering the market in general. The same caveat applies though: such a system with a slow CPU is wasting the capabilities of the machine.

UNIX Characteristics.

Integration:

A top-down approach results in an integrated design. Systems tend to be supplied 'complete', ie. everything one requires is usually supplied as-standard. Components work well together since the designers are familiar with all aspects of the system.

Stability and Reliability:

The use of quality components, driven by the demands of the markets which most UNIX vendors aim for, results in systems that experience far fewer component failures compared to PCs. As a result of a top-down and integrated approach, the chances of a system experiencing hardware-level conflicts are much lower compared to PCs.

Security: It is easy for system designers to incorporate hardware security features such as metal hoops that are part of the main moulded chassis, for attaching to security cables. On the software side, and as an aid to preventing crime (as well as making it easier to solve crime in terms of tracing components, etc.) systems such as SGIs often incorporate unique hardware features. The following applies to SGIs but is also probably true of hardware from other UNIX vendors in some equivalent form.

Every SGI has a PROM chip on the motherboard, without which the system will not boot. This PROM chip is responsible for initiating the system bootup sequence at the very lowest hardware level. However, the chip also contains an ID number which is unique to that particular machine. One can display this ID number with the following command: sysinfo -s

Alternatively, the number can be displayed in hexadecimal format by using the sysinfo command on its own (note the first 4 groups of two hex digits). A typical output might look like this:

% sysinfo -s
1762299020

% sysinfo
System ID:
69 0a 8c 8c 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The important part of the output from the second command is the beginning sequence consisting of '690A8C8C'. The ID number is not only used by SGI when dealing with system hardware and software support contracts, it is also the means by which license codes are supplied for SGI's commercial software packages. If one wishes to use a particular commercial package, eg. the VRML editor called CosmoWorlds, SGI uses the ID number of the machine to create a license code which will be recognised by the program concerned as being valid only for that particular machine. The 20-digit hexadecimal license code is created using a special form of encryption, presumably combining the ID number with some kind of internal database of codes for SGI's various applications to which only SGI has access. In the case of the O2 I use at home, the license code for CosmoWorlds is 4CD4FB82A67B0CEB26B7 (ie. different software packages on the same system need different license codes). This code will not work for any other software package on any other SGI anywhere in the world. There are two different license management systems in use by SGIs: the NetLS environment on older platforms, and the FlexLM environment on newer platforms. FlexLM is being widely adopted by many UNIX vendors. NetLS licenses are stored in the /var/netls directory, while FlexLM licenses are kept in /var/flexlm. To the best of my knowledge, SGI's latest version of IRIX (6.5) doesn't use NetLS licenses anymore, though it's possible that 3rd-party software suppliers still do. As stated in the software section, the use of the ID number system at the hardware level means it is effectively impossible to pirate commercial software. More accurately, anyone can copy any SGI software CD, and indeed install the software, but that software will not run without the license code which is unique to each system, so there's no point in copying commercial software CDs or installing copied commercial software in the first place.
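As a quick practical check (a minimal sketch - the two directories are those named above, but the exact license file names will vary from machine to machine), an admin can see which licensing scheme(s) a system appears to be using simply by listing the relevant directories:

# Which license scheme(s) does this machine use?
# (output will differ from system to system)
ls -l /var/netls
ls -l /var/flexlm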

Of course, one could always try to reverse-engineer the object code of a commercial package to try and get round the section which makes the application require the correct license code, but this would be very difficult. The important point is that, to the best of my knowledge, SGI's license code scheme has never been broken at the hardware level. Note: from the point of view of an admin maintaining an SGI system, if a machine fails completely, eg. through damage by fire and water, the admin should always retain the PROM chip if possible, ie. a completely new system could be obtained, but only the installation of the original PROM chip will make the new system effectively the same as the old one. For PCs, the most important system component in terms of system identity is the system disk (more accurately, its contents); but for machines such as SGIs, the PROM chip is just as important as - if not more important than - the contents of the system disk when it comes to a system having a unique identity.

Scalability. Because a top-down hardware design approach has been used by all UNIX hardware vendors over the years, most UNIX vendors offer hardware solutions that scale to a large number of processors. Sun, IBM, SGI, HP and other vendors all offer systems that scale to 64 CPUs. Currently, one cannot obtain a reliable PC/NT platform that scales to even 8 CPUs (Intel won't begin shipping 8-way chip sets until Q3 1999). Along with the basic support for a larger number of processors, UNIX vendors have spent a great deal of time researching advanced ways of properly supporting many CPUs. There are complex issues concerning how such systems handle shared memory, the movement of data, communications links, efficient use of other hardware such as graphics and video subsystems, maximised use of storage systems (eg. RAID), and so on. The result is that most UNIX vendors offer large system solutions which can tackle extremely complex problems. Since these systems are obviously designed to the very highest quality standards with a top-down approach to integration, etc., they are widely used by companies and institutions which need such systems for solving the toughest of tasks, from processing massive databases to dealing with huge seismic data sets, large satellite images, complex medical data and intensive numerical processing (eg. weather modeling). One very beneficial side-effect of this kind of development is that the technology which comes out of such high-quality designs slowly filters down to the desktop systems, enabling customers to eventually utilise extremely advanced and powerful computing systems. A particularly good example of this is SGI's Octane system [9] - it uses the same components and basic technology as SGI's high-end Origin server system. As a result, the user benefits from many advanced features, eg. 

- Octane has no inherent maximum memory limit. Memory is situated on a 'node board' along with the 1 or 2 main CPUs, rather than housed on a backplane. As CPU designs improve, so memory capacity on the node board can be increased by using a different node board design, ie. without changing the base system at all. For example, Octane systems using the R10000 CPU can have up to 2GB RAM, while Octane systems using the R12000 CPU can have up to 4GB RAM. Future CPUs (R14K, R16K, etc.) will change this limit again to 8GB, 16GB, etc.

- The speed at which all internal links operate is directly synchronised to the clock speed of the main CPU. As a result, internal data pathways can always supply data to both main CPUs faster than they can theoretically cope with, ie. one can get the absolute maximum performance out of a CPU (this is fundamentally not possible with any PC design). As CPU clock speeds increase, so does the rate at which the system can move data around internally. An Octane using 195MHz R10000s offers three separate internal data pathways each operating at 1560MB/sec (10X faster than a typical PCI bus). An Octane using 300MHz R12000s runs the same pathways at the faster rate of 2400MB/sec per link, ie. system bandwidth and memory bandwidth increase to match CPU speed.

The above is not a complete list of advanced features.

SGI's high-end servers are currently the most scalable in the world, offering up to 256 CPUs for a commercially available system, though some sites with advance copies of future OS changes have systems with 512 and 720 CPUs. As stated elsewhere, one system has 6144 CPUs. The quality of design required to create technologies like this, along with the software and OS concepts that run them properly, is quite incredible. These features are passed on down to desktop systems and eventually into consumer markets. But it means that, at any one time, midrange systems based on such advanced technologies can be quite expensive (Octanes generally start at around 7000 pounds). Since much of the push behind these developments comes from military and government clients, again there is great emphasis on quality, reliability, security, etc. Cray Research, which is owned by SGI, holds the world record for the most stable and reliable system: a supercomputer with 2048 CPUs which ran for 2.5 years without any of the processors exhibiting a single system-critical error. Sun, HP, IBM, DEC, etc. all operate similar design approaches, though SGI/Cray happens to have the most advanced and scalable server and graphics system designs at the present time, mainly because they have traditionally targeted high-end markets, especially US government contracts. The history of UNIX vendor CPU design follows a similar legacy: typical customers have always been willing to pay 3X as much as an Intel CPU in order to gain access to 2X the performance. Ironically, as a result, Intel have always produced the world's slowest CPUs, even though they are the cheapest. CPUs at much lower clock speeds from other vendors (HP, IBM, Sun, SGI, etc.) can easily be 2X to 5X faster than Intel's current best. As stated above though, these CPUs are much more expensive - even so, it's an extra cost which the relevant clients say they will always bear in order to obtain the fastest available performance. The exception today is the NT workstation market, where systems from UNIX vendors utilise Intel CPUs and WindowsNT (and/or Linux), offering a means of gaining access to better quality graphics and video hardware while sacrificing the use of more powerful CPUs and the more sophisticated UNIX OSs, resulting in lower cost. Even so, typical high-end NT systems still cost around 3000 to 15000 pounds.

So far, no UNIX vendor makes any product that is targeted at the home market, though some vendors create technologies that are used in the mass consumer market (eg. the R3000 CPU which runs the Sony PlayStation is designed by SGI and was used in their older workstations in the late 1980s and early 1990s; all of the Nintendo64's custom processors were designed by SGI). In terms of computer systems, it is unlikely this situation will ever change because to do so would mean a vendor would have to adopt a bottom-up design approach in order to minimise cost above all else - such a change wouldn't be acceptable to customers and would contradict the way in which the high-end systems are developed. Vendors which do have a presence in the consumer market normally use subsidiaries as a means of avoiding internal conflicts in design ethos, eg. SGI's MIPS subsidiary (soon to be sold off).

References:

1. Blender Animation and Rendering Program: http://www.blender.nl/

2. XV Image Viewer: http://www.trilon.com/xv/xv.html

3. Extract taken from GNU GENERAL PUBLIC LICENSE, Version 2, June 1991, Copyright (C) 1989, 1991 Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

4. GIMP (GNU Image Manipulation Program): http://www.gimp.org/

5. SGI Freeware Sites (identical): http://freeware.sgi.com/ and http://toolbox.sgi.com/TasteOfDT/public/freeware/

6. Pixar's Blue Moon Rendering Tools (BMRT): http://www.bmrt.org/

7. Silicon Graphics Indy, PCW, September 1993: http://www.futuretech.vuurwerk.nl/pcw9-93indy.html

15. "LA conferential", CGI Magazine, Vol4, Issue 1, Jan/Feb 1999, pp. 21, by Richard Spohrer.

Interview from the 'Digital Content and Creation' conference and exhibition: '"No major production facilities rely on commercial software, everyone has to customise applications in order to get the most out of them," said Hughes. "We run Unix on SGI as we need a stable environment which allows fast networking. NT is not a professional solution and was never designed to handle high-end network environments," he added. "Windows NT is the antithesis of what the entertainment industry needs. If we were to move from Irix, we would use Linux over NT."'

- John Hughes, president/CEO of Rhythm & Hues, and Scott Squires, visual effects supervisor at ILM and CEO of Puffin Design.

9. Octane Information Index: http://www.futuretech.vuurwerk.nl/octane/

17. "How to set up the BIND domain name server", Network Week, Vol4 No. 29, 14th April 1999, pp. 17, by David Cartwright. 18. A letter from a reader in response to [10]: "Out of a BIND", Network Week, Vol4 No. 31, 28th April 1999, pp. 6:

"A couple of weeks ago, I had a problem. I was attempting to configure NT4's DNS Server for use on a completely private network, but it just wasn't working properly. The WindowsNT 'help' - and I use that term loosely - assumed my network was connected to the Internet, so the examples it gave were largely useless. Then I noticed David Cartwright's article about setting up DNS servers. (Network Week, 14th April). The light began to dawn. Even better, the article used BIND's configuration files as examples. This meant that I could dump NT's obtuse GUI DNS Manager application and hand-hack the configuration files myself. A few minor problems later (most of which were caused by Microsoft's example DNS config files being a bit... um... optimistic) and the DNS server finally lurched into life. Thank you Network Week. The more Q&A and how-to type information you print, the better." - Matthew Bell, Fluke UK.

General References:

Anonymous SGI FTP Site List:   http://reality.sgi.com/billh/anonftp/
Origin2000 Information Index:  http://www.futuretech.vuurwerk.nl/origin/
Onyx2 Information Index:       http://www.futuretech.vuurwerk.nl/onyx2/
SGI:                           http://www.sgi.com/
Hewlett Packard:               http://www.hp.com/
Sun Microsystems:              http://www.sun.com/
IBM:                           http://www.ibm.com/
Compaq/Digital:                http://www.digital.com/
SCO:                           http://www.sco.com/
Linux:                         http://www.linux.org/

Appendix A: Case Study.

For unknown and unchangeable reasons, UCLAN's central admin system has a DNS setup which, incorrectly, does not recognise comp.uclan.ac.uk as a subdomain. Instead, the central DNS lists comp as a host name, ie. comp.uclan.ac.uk is listed as a direct reference to Yoda's external IP address, 193.61.250.34; in terms of the intended use of the word 'comp', this is rather like referring to a house on a street by using just the street name. As a result, the SGI network's fully qualified host names, such as yoda.comp.uclan.ac.uk, are not recognised outside UCLAN, and neither is comp.uclan.ac.uk, since all the machines on the SGI network treat comp as a subdomain. Thus, external users can access Yoda's IP address directly by referring to 193.61.250.34 (so ftp is possible), but they cannot access Yoda as a web server, or access individual systems in Ve24 such as sevrin.comp.uclan.ac.uk, or send email to the SGI network. Also, services such as USENET cannot be set up, so internal users must use web sites to access newsgroups.

This example serves as a warning: organisations should thoroughly clarify what their individual departments' network structures are going to be, through a proper consultation and discussion process, before allowing departments to set up internal networks. Otherwise, confusion and disagreement can occur. In the case of the SGI network, its internal structure is completely correct (as confirmed by SGI themselves), but the way it is connected to the Internet is incorrect. Only the use of a proxy server allows clients to access the Internet, but some strange side-effects remain; for example, email can be sent from the SGI network to anywhere on the Internet (from Yoda to Yahoo in less than 10 seconds!), but not vice-versa, because incoming data is blocked by the incorrectly configured central DNS. Email from the SGI network can reach the outside world because of the way the email system works: the default settings installed along with the standard Berkeley Sendmail software (/usr/lib/sendmail) are sufficient to forward email from the SGI network to the Internet via routers further along the communications chain, which then send the data to JANET at Manchester, and from there to the final destination (which could include a UCLAN student or staff member). The situation is rather like posting a letter without a sender's address, or including an address which gives everything as far as the street name but not the house number - the letter will be correctly delivered, but the recipient will not be able to reply to the sender.
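To make the distinction concrete, the following is a purely illustrative sketch (not UCLAN's actual configuration) of how the difference would appear in a BIND-style zone file for uclan.ac.uk, the file format discussed in reference [10]; the record layout here is an assumption for the sake of the example:

; 'comp' listed as a host - the current (incorrect) arrangement:
comp        IN  A   193.61.250.34

; 'comp' delegated as a subdomain - what the SGI network expects;
; yoda would then act as the name server for comp.uclan.ac.uk:
comp        IN  NS  yoda.comp.uclan.ac.uk.
yoda.comp   IN  A   193.61.250.34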

Detailed Notes for Day 2 (Part 2) UNIX Fundamentals: Shell scripts.

It is an inevitable consequence of using a command interface such as shells that one would wish to be able to run a whole sequence of commands to perform more complex tasks, or perhaps the same task many times on multiple systems. Shells allow one to do this by creating files containing sequences of commands. The file, referred to as a shell script, can be executed just like any other program, though one must ensure the execute permissions on the file are set appropriately in order for the script to be executable. Large parts of all modern UNIX variants use shell scripts to organise system management and behaviour.

Programming in shell script can include more complicated structures such as if/then statements, case statements, for loops, while loops, functions, etc. Combined with other features such as metacharacters and the various text-processing utilities (perl, awk, sed, grep, etc.), one can create extremely sophisticated shell scripts to perform practically any system administration task, ie. one is able to write programs which can use any available application or existing command as part of the code in the script. Since the C shell's syntax closely resembles C, shell programming effectively combines the flexibility of C-style programming with the ability to utilise other programs and resources within the shell script code.

Looking at typical system shell script files, eg. the bootup scripts contained in /etc/init.d, one can see that most system scripts make extensive use of if/then expressions and case statements. However, a typical admin will find it mostly unnecessary to use even these features. In fact, many administration tasks one might choose to do can be performed by a single command or sequence of commands on a single line (made possible via the various metacharacters). An admin might put such mini-scripts into a file and execute that file when required; even though the file's contents may not appear to be particularly complex, one can perform a wide range of tasks using just a few commands. A hash symbol '#' at the beginning of a line in a script file denotes a comment.

One of the most commonly used commands in UNIX is 'find', which allows one to search for files, directories, files belonging to a particular user or group, files of a special type (eg. a link to another file), files modified before or after a certain time, and so on (there are many options). Most admins tend to use the find command to select certain files upon which to perform some other operation, to locate files for information gathering purposes, etc. The find command uses a Boolean expression which defines the type of file the command is to search for. The name of any file matching the Boolean expression is returned.
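By way of illustration, the following is a minimal sketch of those constructs in a Bourne shell script; the host names and paths are arbitrary examples, not part of any real system script:

#!/bin/sh
# Illustrative only: if/then, a for loop and a case statement.

LOGFILE=/var/tmp/check.log

if [ ! -f $LOGFILE ]; then
    touch $LOGFILE
fi

for host in akira ash sevrin
do
    case $host in
        akira) echo "$host: skipped" >> $LOGFILE ;;
        *)     echo "$host: checked on `date`" >> $LOGFILE ;;
    esac
done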

For example (see the 'find' man page for full details):

find /home/students -name "capture.mv" -print

Figure 25. A typical find command.

This command searches all students' directories, looking for any file called 'capture.mv'. On Indy systems, users often capture movie files when first using the digital camera, but usually never delete them, wasting disk space. Thus, an admin might have a site policy that, at regular intervals, all files called capture.mv are erased - users would be notified that if they captured a video sequence which they wished to keep, they should either set the name to use as something else, or rename the file afterwards.

One could place the above command into an executable file called 'loc', running that file when one so desired. This can be done easily by the following sequence of actions (only one line is entered in this example, but one could easily enter many more):

% cat > loc
find /home/students -name "capture.mv" -print
[press CTRL-D]
% chmod u+x loc
% ls -lF loc
-rwxr--r--    1 mapleson staff      46 May  3 13:20 loc*

Figure 26. Using cat to quickly create a simple shell script.

Using ls -lF to examine the file, one would see that the file has the execute permission set for the user, and a '*' has been appended after the file name, both indicating the file is now executable. Thus, one could run that file just as if it were a program. One might imagine this is similar to .BAT files in DOS, but the features and functionality of shell scripts are very different (much more flexible and powerful, eg. the use of pipes). There's no reason why one couldn't use an editor to create the file, but experienced admins know that it's faster to use shortcuts such as employing cat in the above way, especially compared to using GUI actions which require one to take hold of the mouse, move it, double-click on an icon, etc. Novice users of UNIX systems don't realise until later that very simple actions can take longer to accomplish with GUI methods.

Creating a file by redirecting the input from cat to a file is a technique I often use for typing out files with little content. cat receives its input from stdin (the keyboard by default), so using 'cat > filename' means anything one types is redirected to the named file instead of stdout; one must press CTRL-D to end the input stream and close the file. An even lazier way of creating the file, if just one line is required, is to use echo:

% echo 'find /home/students -name "capture.mv" -print' > loc
% chmod u+x loc
% ls -lF loc
-rwxr--r--    1 mapleson staff      46 May  3 13:36 loc*
% cat loc
find /home/students -name "capture.mv" -print

Figure 27. Using echo to create a simple one-line shell script.

This time, there is no need to press CTRL-D, ie. the prompt returns immediately and the file has been created. This happens because, unlike cat which requires an 'end of file' action to terminate the input, echo's input terminates when it receives an end-of-line character (the trailing newline in echo's output can be suppressed with the '-n' option). The man page for echo says, "echo is useful for producing diagnostics in command files and for sending known data into a pipe."

For the example shown in Fig 27, single quote marks surrounding the find command were required. This is because, without the quotes, the double quotes enclosing capture.mv would not be included in the output stream which is redirected into the file. When contained in a shell script file, find doesn't need double quotes around the file name to search for, but it's wise to include them because other characters such as * have special meaning to a shell. For example, even without the single quote marks, the script file created with echo works just fine (this example searches for any file beginning with the word 'capture' in my own account):

% echo find /mapleson -name "capture.*" -print > loc
% chmod u+x loc
% ls -lF loc
-rwxr--r--    1 mapleson staff      38 May  3 14:05 loc*
% cat loc
find /mapleson -name capture.* -print
% loc
/mapleson/work/capture.rgb

Figure 28. An echo sequence without quote marks.

Notice that the loc file has no double quotes. But if the contents of loc are entered directly at the prompt:

% find /mapleson -name capture.* -print
find: No match.

Figure 29. The command fails due to * being treated as a metacharacter by the shell.

Even though the command looks the same as the contents of the loc file, entering it directly at the prompt produces an error. This happens because the * character is interpreted by the shell before the find command runs, ie. the shell tries to evaluate the capture.* expression for the current directory, instead of leaving the * to be part of the find command. Thus, when entering commands at the shell prompt, it's wise to either use quotes where appropriate, or use the backslash \ character to tell the shell not to treat the character as a metacharacter, eg.:

% find /mapleson -name capture.\* -print

/mapleson/work/capture.rgb

Figure 30. Using a backslash to avoid confusing the shell.

A -exec option can be used with the find command to enable further actions to be taken on each result found, eg. the example in Fig 25 could be enhanced by making the find operation execute a further command to remove each capture.mv file as it is found:

find /home/students -name "capture.mv" -print -exec /bin/rm {} \;

Figure 31. Using find with the -exec option to execute rm.

Any name returned by the search is passed on to the rm command: find substitutes the {} symbols with each file name as it is found. The \; grouping at the end serves to terminate the command given to -exec (the ; character is normally used to terminate a command, but a backslash is needed to prevent it being interpreted by the shell as a metacharacter). Alternatively, one could use this type of command sequence to perform other tasks, eg. suppose I just wanted to know how large each movie file was:

find /home/students -name "capture.mv" -print -exec /bin/ls -l {} \;

Figure 32. Using find with the -exec option to execute ls.

This works, but two entries will be printed for each file found: one from the -print option, the other being the output from the ls command. To see just the ls output, one can omit the -print option. Consider this version:

find /home/students -name "*.mov" -exec /bin/ls -l {} \; > results

Figure 33. Redirecting the output from find to a file.

This searches for any .mov movie file (usually QuickTime movies), with the output redirected into a file. One can then perform further operations on the results file, eg. one could search the data for any movie that contains the word 'star' in its name: grep star results

A final change might be to send the results of the grep operation to the printer for later reading: grep star results | lp

Thus, the completed script looks like this:

find /home/students -name "*.mv" -exec /bin/ls -l {} \; > results
grep star results | lp

Figure 34. A simple script with two lines.

Only two lines, but this is now a handy script for locating any movies on the file system that are likely to be related to the Star Wars or Star Trek sagas and thus probably wasting valuable disk space! For the network I run, I could then use the results to send each user a message saying the Star Wars trailer is already available in /home/pub/movies/misc, so they've no need to download extra copies to their home directory. It's a trivial example, but in terms of the content of the commands and the way extra commands are added, it's typical of the level of complexity of most scripts which admins have to create. Further examples of the use of 'find' are in the relevant man page; an example file which contains several different variations is: /var/spool/cron/crontabs/root

This file lists the various administration tasks which are executed by the system automatically on a regular basis. The cron system itself is discussed in a later lecture.
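As a taste of what such entries look like (a hypothetical example - the script path here is invented, and cron itself is covered properly later), a root crontab line scheduling the capture.mv cleanup script for 3am every Sunday might read:

# minute hour day-of-month month day-of-week command
0 3 * * 0 /usr/local/adm/loc > /dev/null 2>&1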

WARNING. The Dangers of the Find Command and Wildcards.

Although UNIX is an advanced OS with powerful features, sometimes one encounters an aspect of its operation which catches one completely off-guard, though this is much less the case after just a little experience. A long time ago (January 1996), I realised that many students who used the Capture program to record movies from the digital camera were not aware that using this program or other movie-related programs could leave unwanted hidden directories containing temporary movie files in their home directory, created during capture, editing or conversion operations (I think it happens when an application is killed off suddenly, eg. with CTRL-C, which doesn't give it an opportunity to erase temporary files). These directories, which are always located in a user's home directory, are named '.capture.mv.tmpXXXXX' where XXXXX is some 5-digit string such as '000Hb', and can easily take up many megabytes of space each.

So, I decided to write a script to automatically remove such directories on a regular basis. Note that I was logged on as root at this point, on my office Indy. In order to test that a find command would work on hidden files (I'd never used the find command to look for hidden files before), I created some test directories in the /tmp directory, whose contents would be given by 'ls -AR' as something like this:

% ls -AR
.b/  .c/  a/  d/

./.b:

./.c:
.b  a

./a:

./d:
a

ie. a simple range of hidden and non-hidden directories with or without any content:

- ordinary directories with or without hidden/non-hidden files inside,
- hidden directories with or without hidden/non-hidden files inside,
- directories with ordinary files, etc.

The actual files such as .c/a and .c/.b didn't contain anything. Only the names were important for the test.

So, to test that find would work ok, I executed the following command from within the /tmp directory:

find . -name ".*" -exec /bin/rm -r {} \;

(NB: the -r option for rm means do a recursive removal, and note that there was no -i option used with the rm here) What do you think this find command would do? Would it remove the hidden directories .b and .c and their contents? If not, why not? Might it do anything else as well?

Nothing happened at first, but the command did seem to be taking far too long to return the shell prompt. So, after a few seconds, I decided something must have gone wrong; I typed CTRL-C to stop the find process (NB: it was fortunate I was not distracted by a phone call or something at this point). Using the ls command showed the test files I'd created still existed, which seemed odd. Trying some further commands, eg. changing directories, using the 'ps' command to see if there was something causing system slowdown, etc., produced strange errors which I didn't understand at the time (this was after only 1 or 2 months' admin experience), so I decided to reboot the system. The result was disaster: the system refused to boot properly, complaining about swap file errors and things relating to device files. Why did this happen? Consider the following command sequence by way of demonstration:

cd /tmp
mkdir xyz
cd xyz
/bin/ls -al

The output given will look something like this:

drwxr-xr-x    2 root     sys            9 Apr 21 13:28 ./
drwxrwxrwt    6 sys      sys          512 Apr 21 13:28 ../

Surely the directory xyz should be empty? What are these two entries? Well, not quite empty. In UNIX, as stated in a previous lecture, virtually everything is treated as a file. Thus, for example, the command so commonly performed even on the DOS operating system: cd ..

is actually doing something rather special on UNIX systems. 'cd ..' is not an entire command in itself. Instead, every directory on a UNIX file system contains two hidden directories which are in reality special types of file:

./    - this refers to the current directory.
../   - this is effectively a link to the directory above in the file system.

So typing 'cd ..' actually means 'change directory to ..' (logical, since cd does mean 'change directory to') and, since '..' is treated as a link to the directory above, the shell changes the current working directory to the next level up. [By contrast, 'cd ..' in DOS is treated as a distinct command in its own right - DOS recognises the presence of '..' and if possible changes directory accordingly; this is why DOS users can type 'cd..' instead if desired.]

But this can have an unfortunate side effect if one isn't careful, as is probably becoming clear by now. The ".*" search pattern in the find command will also find these special './' and '../' entries in the /tmp directory, ie.:

- The first thing the find command locates is './'.
- './' matches the ".*" search string, and find works its way upwards: it changes directory to / (the root directory). Uh oh...
- find locates the './' entry in / and matches it against ".*" as well. Since the current directory cannot be any higher, the search continues in the current directory; '../' is found next and is treated the same way.
- The -exec option with 'rm' causes find to begin erasing hidden files and directories such as .Sgiresources, eventually moving onto non-hidden files: first the /bin link to /usr/bin, then the /debug link, then all of /dev, /dumpster, /etc and so on.

By the time I realised something was wrong, the find command had gone as far as deleting most of /etc. Although important files in /etc were erased which I could have replaced from a backup tape or by a reinstall, the real damage was the erasure of the /dev directory. Without important entries such as /dev/dsk, /dev/rdsk, /dev/swap and /dev/tty*, the system cannot mount disks, configure the swap partition on bootup, connect to keyboard input devices (tty terminals), or accomplish other important tasks.

In other words, disaster. And I'd made it worse by rebooting the system. Almost a complete repair could have been done simply by copying the /dev and /etc directories from another machine as a temporary fix, but the reboot made everything go haywire. I was partly fooled by the fact that the files in /tmp were still present after I'd stopped the command with CTRL-C, which led me to think at first that nothing had gone awry. Consulting an SGI software support engineer for help, it was decided the only sensible solution was to reinstall the OS, a procedure which was a lot simpler than trying to repair the damage I'd done. So, the lessons learned:

Always read up about a command before using it. If I'd searched the online books with the expression 'find command', I would have discovered the following paragraph in Chapter 2 ("Making the Most of IRIX") of the 'IRIX Admin: System Configuration and Operation' manual: "Note that using recursive options to commands can be very dangerous in that the command automatically makes changes to your files and file system without prompting you in each case. The chgrp command can also recursively operate up the file system tree as well as down. Unless you are sure that each and every case where the recursive command will perform an action is desired, it is better to perform the actions individually. Similarly, it is good practice to avoid the use of metacharacters (described in "Using Regular Expressions and Metacharacters") in combination with recursive commands."

I had certainly broken the rule suggested by the last sentence in the above paragraph. I also did not know what the command would do before I ran it.

Never run programs or scripts with as-yet unknown effects as root.

ie. when testing something like removing hidden directories, I should have logged on as some ordinary user, eg. a 'testuser' account, so that if the command went wrong it would not have been able to change or remove any files owned by root, or files owned by anyone else for that matter, including my own in /mapleson. If I had done this, the command I used would have given an immediate error and halted when the find string tried to remove the very first file found in the root directory (probably some minor hidden file such as .Sgiresources).

Worrying thought: if I hadn't CTRL-C'd the find command when I did, then after enough time the command would have erased the entire file system (including /home), or at least tried to. I seem to recall that, in reality (tested once on a standalone system, deliberately), one can get about as far as most of /lib before the system actually goes wrong and stops the current command anyway, ie. the find command sequence eventually ends up failing to locate key libraries needed for the execution of 'rm' (or perhaps of 'find' itself).
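A safer way to carry out this kind of test (a suggested approach rather than anything from the original incident; the /tmp/testdir path is just an example) is to run the find on a scratch directory as an ordinary user, first with -print alone to see exactly what would be matched, and then with -ok, which prompts for confirmation before running the command on each file:

% find /tmp/testdir -name ".*" -print
% find /tmp/testdir -name ".*" -ok /bin/rm -r {} \;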

The only positive aspects of the experience were that, a) I learned a lot about the subtleties of the find command and the nature of files very quickly; b) I discovered after searching the Net that I was not alone in making this kind of mistake - there was an entire web site dedicated to the comical mess-ups possible on various operating systems that can so easily be caused by even experienced admins, though more usually as a result of inexperience or simple errors. Eg. I've had at least one user so far who erased his home directory by mistake with 'rm -r *' (he'd thought his current working directory was /tmp when in fact it wasn't); a backup tape restored his files.

Most UNIX courses explain how to use the various available commands, but it's also important to show how not to use certain commands, mainly because of what can go wrong when the root user makes a mistake. Hence, I've described my own experience of making an error in some detail, especially since 'find' is such a commonly used command. As stated in an earlier lecture, to a large part UNIX systems run themselves automatically. Thus, if an admin finds that she/he has some spare time, I recommend using that time to simply read up on random parts of the various administration manuals - look for hints & tips sections, short-cuts, sections covering daily advice, guidance notes for beginners, etc. Also read man pages: follow them from page to page using xman, rather like the way one can become engrossed in an encyclopedia, looking up reference after reference to learn more.

A Simple Example Shell Script.

I have a script file called 'rebootlab' which contains the following:

rsh akira init 6&
rsh ash init 6&
rsh cameron init 6&
rsh chan init 6&
rsh conan init 6&
rsh gibson init 6&
rsh indiana init 6&
rsh leon init 6&
rsh merlin init 6&
rsh nikita init 6&
rsh ridley init 6&
rsh sevrin init 6&
rsh solo init 6&
#rsh spock init 6&
rsh stanley init 6&
rsh warlock init 6&
rsh wolfen init 6&
rsh woo init 6&

Figure 35. The simple rebootlab script.

The rsh command means 'remote shell'. rsh allows one to execute commands on a remote system by establishing a connection, creating a shell on that system using one's own user ID information, and then executing the supplied command sequence. The init program is used for process control initialisation (see the man page for details). A typical use for init is to shut down the system or reboot the system into a particular state, defined by a number from 0 to 6 (0 = full shutdown, 6 = full reboot) or certain other special possibilities. As explained in a previous lecture, the '&' runs a process in the background. Thus, each line in the file executes a remote shell on a system, instructing that system to reboot. The init command in each case is run in the background so that the rsh command can immediately return control to the rebootlab script in order to execute the next rsh command.

The end result? With a single command, I can reboot the entire SGI lab without ever leaving the office. Note: the line for the machine 'spock' is commented out. This is because the Indy called spock is currently in the technician's office, ie. not in service.

This is a good example of a script which could be made more efficient by using a for loop, something along the lines of: for each name in this list of names, do <the appropriate rsh command>. As should be obvious, the rebootlab script makes no attempt to check if anybody is logged into the systems, so in practice I use the rusers command to make sure nobody is logged on before executing the script. This is where the script could definitely be improved: the command sent by rsh to each system could be modified with some extra commands so that each system is only rebooted if nobody is logged in at the time (the 'who' command could probably be used for this, eg. 'who | grep -v root' would give no output if nobody was logged on).
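A possible rewrite along those lines might look like the following - purely a sketch, assuming (as suggested above) that 'who | grep -v root' producing no output is an acceptable test for "nobody logged in":

#!/bin/sh
# Sketch: reboot each lab machine only if no one (other than root) is logged in.
for host in akira ash cameron chan conan gibson indiana leon merlin \
            nikita ridley sevrin solo stanley warlock wolfen woo
do
    if [ -z "`rsh $host who | grep -v root`" ]; then
        rsh $host init 6 &
    else
        echo "$host is in use - skipped"
    fi
done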

The following script, called 'remountmapleson', is one I use when I go home in the evening, or perhaps at lunchtime, to do some work on the SGI I use at home:

rsh yoda umount /mapleson && mount /mapleson &
rsh akira umount /mapleson && mount /mapleson &
rsh ash umount /mapleson && mount /mapleson &
rsh cameron umount /mapleson && mount /mapleson &
rsh chan umount /mapleson && mount /mapleson &
rsh conan umount /mapleson && mount /mapleson &
rsh gibson umount /mapleson && mount /mapleson &
rsh indiana umount /mapleson && mount /mapleson &
rsh leon umount /mapleson && mount /mapleson &
rsh merlin umount /mapleson && mount /mapleson &
rsh nikita umount /mapleson && mount /mapleson &
rsh ridley umount /mapleson && mount /mapleson &
rsh sevrin umount /mapleson && mount /mapleson &
rsh solo umount /mapleson && mount /mapleson &
#rsh spock umount /mapleson && mount /mapleson &
rsh stanley umount /mapleson && mount /mapleson &
rsh warlock umount /mapleson && mount /mapleson &
rsh wolfen umount /mapleson && mount /mapleson &
rsh woo umount /mapleson && mount /mapleson &

Figure 36. The simple remountmapleson script.

When I leave for home each day, my own external disk (where my own personal user files reside) goes with me, but this means the mount status of the /mapleson directory for every SGI in Ve24 is now out-of-date, ie. each system still has the directory mounted even though the file system which was physically mounted from the remote system (called milamber) is no longer present. As a result, any attempt to access the /mapleson directory would give an error: "Stale NFS file handle." Even listing the contents of the root directory would show the usual files but also the error as well. To solve this problem, the script makes every system unmount the /mapleson directory and, if that was successfully done, remount the directory once more. Without my disk present on milamber, its /mapleson directory simply contains a file called 'README' whose contents state: Sorry, /mapleson data not available - my external disk has been temporarily removed. I've probably gone home to work for a while. If you need to contact me, please call .

As soon as my disk is connected again and the script run once more, milamber's local /mapleson contents are hidden by my own files, so users can access my home directory once again. Thus, I'm able to add or remove my own personal disk and alter what users can see and access at a global level without users ever noticing the change. Note: the server still regards my home directory as /mapleson on milamber, so in order to ensure that I can always logon to milamber as mapleson even if my disk is not present, milamber's /mapleson directory also contains basic .cshrc, .login and .profile files. Yet again, a simple script is created to solve a particular problem.

Command Arguments.

When a command or program is executed, the name of the command and any parameters are passed to the program as arguments. In shell scripts, these arguments can be referenced via the '$' symbol. Argument 0 is always the name of the command, then argument 1 is the first parameter, argument 2 is the second parameter, etc. Thus, the following script, called (say) 'go':

echo $0
echo $1
echo $2

would give this output upon execution:

% go somewhere nice
go
somewhere
nice

Including extra echo commands such as 'echo $3' merely produces blank lines after the supplied parameters are displayed. If one examines any typical system shell script, this technique of passing parameters and referencing arguments is used frequently. As an example, I once used the technique to aid in the processing of a large number of image files for a movie editing task. The script I wrote is also typical of the general complexity of code which most admins have to deal with; called 'go', it contained:

subimg $1 a.rgb 6 633 6 209
gammawarp a.rgb m.rgb 0.01
mult a.rgb a.rgb n.rgb
mult n.rgb m.rgb f.rgb
addborder f.rgb b.rgb x.rgb
subimg x.rgb ../tmp2/$1 0 767 300 875

(the commands used in this script are various image processing commands that are supplied as part of the Graphics Library Image Tools software subsystem. Consult the relevant man pages for details) The important feature is the use of the $1 symbol in the first line. The script expects a single parameter, ie. the name of the file to be processed. By eventually using this same argument at the end of an alternative directory reference, a processed image file with the same name is saved elsewhere after all the intermediate processing steps have finished. Each step uses temporary files created by previous steps. When I used the script, I had a directory containing 449 image files, each with a different name: i000.rgb i001.rgb i002.rgb . . . i448.rgb

To process all the frames in one go, I simply entered this command: find . -name "i*.rgb" -print -exec go {} \;

As each file is located by the find command, its name is passed as a parameter to the go script. The use of the -print option displays the name of each file before the go script begins processing the file's contents. It's a simple way to execute multiple operations on a large number of files.
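Since all the frames lived in a single directory, a simple loop in a Bourne shell script would have worked equally well - a sketch only, assuming the go script is in the current directory and is executable:

for f in i*.rgb
do
    echo $f
    ./go $f
done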

Secure/Restricted Shell Scripts.

It is common practice to include the following line at the start of a shell script:

#!/bin/sh

This tells any shell what to use to interpret the script if the script is simply executed, as opposed to sourcing the script within the shell. The 'sh' shell is a lower level shell than csh or tcsh, ie. it's more restricted in what it can do and does not have all the added features of csh and tcsh. However, this means a better level of security, so many scripts (especially as-standard system scripts) include the above line in order to make sure that security is maximised. Also, by starting a new shell to run the script in, one ensures that the commands are always performed in the same way, ie. a script without the above line may work slightly differently when executed from within different shells (csh, tcsh, etc.), perhaps because of any aliases present in the current shell environment, or a customised path definition, etc.
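Putting the above ideas together, a minimal template for a new script might look like this - an illustrative sketch, not a standard system file:

#!/bin/sh
# usage: scriptname filename
if [ $# -lt 1 ]; then
    echo "Usage: `basename $0` filename"
    exit 1
fi
echo "Processing $1 ..."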

Detailed Notes for Day 2 (Part 3) UNIX Fundamentals: System Monitoring Tools.

Running a UNIX system always involves monitoring how a system is behaving on a daily basis. Admins must keep an eye on such things as:

- disk space usage
- system performance and statistics, eg. CPU usage, disk I/O, memory, etc.
- network performance and statistics
- system status, user status
- service availability, eg. Internet access
- system hardware failures and related maintenance
- suspicious/illegal activity

Figure 37. The daily tasks of an admin.

This section explains the various system monitoring tools, commands and techniques which an admin can use to monitor the areas listed above. Typical example administration tasks are discussed in a later lecture. The focus here is on available tools and what they offer, not on how to use them as part of an admin strategy.

Disk Space Usage.

The df command reports current disk space usage. Run on its own, the output is expressed in terms of numbers of blocks used/free, eg.:

yoda # df
Filesystem             Type   blocks      use     avail  %use  Mounted on
/dev/root              xfs   8615368  6116384   2498984    71  /
/dev/dsk/dks4d5s7      xfs   8874746  4435093   4439653    50  /home
milamber:/mapleson     nfs   4225568  3906624    318944    93  /mapleson

Figure 38. Using df without options.

A block is 512 bytes. But most people tend to think in terms of kilobytes, megabytes and gigabytes, not multiples of 512 bytes. Thus, the -k option can be used to show the output in K:

yoda # df -k
Filesystem             Type   kbytes      use     avail  %use  Mounted on
/dev/root              xfs   4307684  3058192   1249492    71  /
/dev/dsk/dks4d5s7      xfs   4437373  2217547   2219826    50  /home
milamber:/mapleson     nfs   2112784  1953312    159472    93  /mapleson

Figure 39. The -k option with df to show data in K.

The df command can be forced to report data only for the file system housing the current directory by adding a period:

yoda # cd /home && df -k .
Filesystem             Type   kbytes      use     avail  %use  Mounted on
/dev/dsk/dks4d5s7      xfs   4437373  2217547   2219826    50  /home

Figure 40. Using df to report usage for the file system holding the current directory.

The du command can be used to show the amount of space used by a particular directory or file, or series of directories and files. The -k option can be used to show usage in K instead of 512-byte blocks, just as with df. du's default behaviour is to report a usage amount recursively for every sub-directory, giving a total at the end, eg.:

yoda # du -k /usr/share/data/models
436     /usr/share/data/models/sgi
160     /usr/share/data/models/food
340     /usr/share/data/models/toys
336     /usr/share/data/models/buildings
412     /usr/share/data/models/household
864     /usr/share/data/models/scenes
132     /usr/share/data/models/chess
1044    /usr/share/data/models/geography
352     /usr/share/data/models/CyberHeads
256     /usr/share/data/models/machines
1532    /usr/share/data/models/vehicles
88      /usr/share/data/models/simple
428     /usr/share/data/models/furniture
688     /usr/share/data/models/robots
7760    /usr/share/data/models

Figure 41. Using du to report usage for several directories/files.

The -s option can be used to restrict the output to just an overall total for the specified directory:

yoda # du -k -s /usr/share/data/models
7760    /usr/share/data/models

Figure 42. Restricting du to a single directory.

By default, du does not follow symbolic links, though the -L option can be used to force links to be followed if desired. However, du does examine NFS-mounted file systems by default. The -l and -m options can be used to restrict this behaviour, eg.:

ASH # cd /
ASH # du -k -s -l *
0       CDROM
0       bin
0       debug
68      dev
0       disk2
2       diskcopy
0       dumpster
299     etc
0       home
2421    lib
2579    lib32
0       opt
0       proc
1       root.home
4391    sbin
565     stand
65      tmp
3927    unix
397570  usr
6346    var

Figure 43. Forcing du to ignore symbolic links.

The output in Fig 43 shows that the /home directory has been ignored. Another example: a user can find out how much disk space their account currently uses by entering:

du -k -s ~/

Swap space (ie. virtual memory on disk) can be monitored using the swap command with the -l option. For full details on these commands, see the relevant man pages. Commands relating to file system quotas are dealt with in a later lecture.
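Commands like these lend themselves to simple reporting scripts. The following is a sketch only - the report recipient, the file system paths and the use of a BSD-style Mail command are all assumptions for the sake of the example:

#!/bin/sh
# Sketch: mail a daily disk usage summary to the admin.
(
  echo "Disk usage report for `hostname` - `date`"
  df -k
  echo
  echo "Largest home directories:"
  du -k -s /home/students/* | sort -rn | head -20
) | Mail -s "disk report" admin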

System Performance.

This includes processor loading, disk loading, etc. The most common command used by admins/users to observe CPU usage is ps, which displays a list of currently running processes along with associated information, including the percentage of CPU time currently being consumed by each process, eg.:

ASH 6# ps -ef
     UID   PID  PPID  C    STIME TTY     TIME CMD
    root     0     0  0 08:00:41 ?       0:01 sched
    root     1     0  0 08:00:41 ?       0:01 /etc/init
    root     2     0  0 08:00:41 ?       0:00 vhand
    root     3     0  0 08:00:41 ?       0:03 bdflush
    root     4     0  0 08:00:41 ?       0:00 munldd
    root     5     0  0 08:00:41 ?       0:02 vfs_sync
    root   900   895  0 08:03:27 ?       1:25 /usr/bin/X11/Xsgi -bs
    [etc]
    root     7     0  0 08:00:41 ?       0:00 shaked
    root     8     0  0 08:00:41 ?       0:00 xfsd
    root     9     0  0 08:00:41 ?       0:00 xfsd
    root    10     0  0 08:00:41 ?       0:00 xfsd
    root    11     0  0 08:00:41 ?       0:00 pdflush
    root   909   892  0 08:03:31 ?       0:02 /usr/etc/videod
    root  1512  1509  0 15:37:17 ?       0:00 sh -c /var/X11/xdm/Xlogin
    root   158     1  0 08:01:01 ?       0:01 /usr/etc/ypbind -ypsetme
    root    70     1  0 08:00:50 ?       0:00 /usr/etc/syslogd
    root  1536   211  0 16:06:04 pts/0   0:00 rlogind
    root   148     1  0 08:01:00 ?       0:01 /usr/etc/routed -h -[etc]
    root   146     1  0 08:01:00 ?       0:00 /usr/etc/portmap
    root   173   172  0 08:01:03 ?       0:01 /usr/etc/nfsd 4
    root   172     1  0 08:01:03 ?       0:01 /usr/etc/nfsd 4
    root   174   172  0 08:01:03 ?       0:01 /usr/etc/nfsd 4
    root   175   172  0 08:01:03 ?       0:01 /usr/etc/nfsd 4
    root   178     1  0 08:01:03 ?       0:00 /usr/etc/biod 4
    root   179     1  0 08:01:03 ?       0:00 /usr/etc/biod 4
    root   180     1  0 08:01:03 ?       0:00 /usr/etc/biod 4
    root   181     1  0 08:01:03 ?       0:00 /usr/etc/biod 4
    root   189     1  0 08:01:04 ?       0:00 bio3d
    root   190     1  0 08:01:04 ?       0:00 bio3d
    root   191     1  0 08:01:04 ?       0:00 bio3d
    root   202     1  0 08:01:05 ?       0:00 /usr/etc/rpc.statd
    root   192     1  0 08:01:04 ?       0:00 bio3d
    root   188     1  0 08:01:03 ?       0:00 bio3d
    root   311     1  0 08:01:08 ?       0:00 /usr/etc/timed -M -F yoda
    root   211     1  0 08:01:05 ?       0:02 /usr/etc/inetd
    root   823     1  0 08:01:33 ?       0:13 /usr/lib/sendmail -bd -q15m
    root  1557  1537  9 16:10:58 pts/0   0:00 ps -ef
    root   892     1  0 08:03:25 ?       0:00 /usr/etc/videod
    root  1513  1512  0 15:37:17 ?       0:07 /usr/Cadmin/bin/clogin -f
    root  1546   872  0 16:07:55 ?       0:00 /usr/Cadmin/bin/directoryserver
    root  1537  1536  1 16:06:04 pts/0   0:01 -tcsh
    root   903     1  0 08:03:27 tablet  0:00 /sbin/getty ttyd1 co_9600
      lp   460     1  0 08:01:17 ?       0:00 /usr/lib/lpsched
    root  1509   895  0 15:37:13 ?       0:00 /usr/bin/X11/xdm
    root   488     1  0 08:01:19 ?       0:01 /sbin/cron
    root  1556  1537 28 16:10:56 pts/0   0:01 find /usr -name *.txt -print
    root   895     1  0 08:03:27 ?       0:00 /usr/bin/X11/xdm
    root   872     1  0 08:02:32 ?       0:06 /usr/Cadmin/bin/directoryserver

Figure 44. Typical output from the ps command.

Before obtaining the output shown in Fig 44, I ran a find command in the background. The output shows that the find command was utilising 28% of available CPU resources; tasks such as find are often limited by the speed and bandwidth capacity of the disk, not the speed of the main CPU. The ps command has a variety of options to show or not show various information; most of the time though, 'ps -ef' is adequate to display the kind of information required. Note that other UNIX variants use slightly different options, eg. the equivalent command on SunOS would be 'ps -aux'. One can use grep to report data only for a particular process, eg.:

ASH 5# ps -ef | grep lp

      lp   460     1  0 08:01:17 ?       0:00 /usr/lib/lpsched

Figure 45. Filtering ps output with grep.

This only reports data for the lp printer scheduler.

However, ps only gives a snapshot of the current system state. Often of more interest is a system's dynamic behaviour. A more suitable command for monitoring system performance over time is 'top', a typical output of which looks like this:

IRIX ASH 6.2 03131015 IP22 Load[0.22,0.12,0.01] 16:17:47 166 procs
  user   pid  pgrp  %cpu proc pri size  rss time command
  root  1576  1576 24.44    *  20  386   84 0:02 find
  root  1577  1577  0.98    0  65  432  100 0:00 top
  root  1513  1509  0.18    *  60 4322 1756 0:07 clogin
  root   900   900  0.12    *  60 2858  884 1:25 Xsgi
  root   146   146  0.05    *  60  351   77 0:00 portmap
  root   158     0  0.05    *  60  350   81 0:00 ypbind
  root  1567  1567  0.02    *  60  349   49 0:00 rlogind
  root     3     0  0.01    * +39    0    0 0:03 bdflush
  root   172     0  0.00    *  61    0    0 0:00 nfsd
  root   173     0  0.00    *  61    0    0 0:00 nfsd
  root   174     0  0.00    *  61    0    0 0:00 nfsd
  root   175     0  0.00    *  61    0    0 0:00 nfsd

Figure 46. top shows a continuously updated output.

From the man page for top: "Two header lines are displayed. The first gives the machine name, the release and build date information, the processor type, the 1, 5, and 15 minute load average, the current time and the number of active processes. The next line is a header containing the name of each field highlighted."

The display is constantly updated at regular intervals, the duration of which can be altered with the -i option (default duration is 5 seconds). top shows the following data for each process:

"user name, process ID, process group ID, CPU usage, processor currently executing the process (if process not currently running), process priority, process size (in pages), resident set size (in pages), amount of CPU time used by the process, and the process name."

Just as with the ps command, top shows the ID number for each process. These IDs can be used with the kill command (and others) to control running processes, eg. shut them down, suspend them, etc. There is a GUI version of top called gr_top. Note that IRIX 6.5 contains a newer version of top which gives even more information, eg.:

IRIX WOLFEN 6.5 IP22          load averages: 0.06 0.01 0.00    17:29:44
58 processes:  56 sleeping, 1 zombie, 1 running
CPU: 93.5% idle,  0.5% usr,  5.6% ker,  0.0% wait,  0.0% xbrk,  0.5% intr
Memory: 128M max, 116M avail, 88M free, 128M swap, 128M free swap

  PID  PGRP USERNAME PRI  SIZE   RES STATE    TIME WCPU%  CPU% COMMAND
 1372  1372 root      20 2204K 1008K run/0    0:00   0.2  3.22 top
  153   153 root      20 2516K 1516K sleep    0:05   0.1  1.42 nsd
 1364  1364 root      20 1740K  580K sleep    0:00   0.0  0.24 rlogind






0.2 0.1 0.0

3.22 top 1.42 nsd 0.24

Figure 47. The IRIX 6.5 version of top, giving extra information.

A program which offers much greater detail than top is osview. Like top, osview constantly updates a whole range of system performance statistics. Unlike top though, so much information is available from osview that it offers several different 'pages' of data. The number keys are used to switch between pages. Here is a typical output for each of the five pages: Page 1 (system information): Osview 2.1 : One Second Average int=5s Load Average fs ctl 1 Min 0.000 fs data 5 Min 0.000 delwri 15 Min 0.000 free CPU Usage data %user 0.20 empty %sys 0.00 userdata %intr 0.00 reserved %gfxc 0.00 pgallocs %gfxf 0.00 Scheduler %sxbrk 0.00 runq %idle 99.80 swapq System Activity switch syscall 19 kswitch read 1 preempt

WOLFEN 17:32:13 04/21/99 #5 2.0M 7.7M 0 87.5M 26.0M 61.4M 20.7M 0 2 0 0 4 95 1

write 0 fork 0 exec 0 readch 19 writech 38 iget 0 System Memory Phys 128.0M kernel 10.1M heap 3.9M mbufs 96.0K stream 40.0K ptbl 1.2M

Wait Ratio %IO %Swap %Physio

1.2 0.0 0.0

Figure 48. System information from osview.

Page 2 (CPU information):

Osview 2.1 : One Second Average        WOLFEN    17:36:27 04/21/99  #1
int=5s
CPU Usage
 %user       0.00
 %sys      100.00
 %intr       0.00
 %gfxc       0.00
 %gfxf       0.00
 %sxbrk      0.00
 %idle       0.00

Figure 49. CPU information from osview.

Page 3 (memory information):

Osview 2.1 : One Second Average        WOLFEN    17:36:56 04/21/99  #1
int=5s
System Memory              iclean            0
 Phys         128.0M      *Swap
 kernel        10.5M      *System VM
 heap           4.2M      *Heap
 mbufs        100.0K      *TLB Actions
 stream        48.0K      *Large page stats
 ptbl           1.3M
 fs ctl         1.5M
 fs data        8.2M
 delwri            0
 free          77.1M
 data          28.8M
 empty         48.3M
 userdata      30.7M
 reserved          0
 pgallocs        450
Memory Faults
 vfault         1.7K
 protection      225
 demand          375
 cw               25
 steal           375
 onswap            0
 oncache         1.4K
 onfile            0
 freed             0
 unmodswap         0
 unmodfile         0

Figure 50. Memory information from osview.

Page 4 (network information):

Osview 2.1 : One Second Average        WOLFEN    17:38:15 04/21/99  #1
int=5s
TCP
 acc. conns         0
 sndtotal          33
 rcvtotal           0
 sndbyte          366
 rexmtbyte          0
 rcvbyte            0
UDP
 ipackets           0
 opackets           0
 dropped            0
 errors             0
IP
 ipackets           0
 opackets          33
 forward            0
 dropped            0
 errors             0
NetIF[ec0]
 Ipackets           0
 Opackets          33
 Ierrors            0
 Oerrors            0
 collisions         0
NetIF[lo0]

Figure 51. Network information from osview.

Page 5 (miscellaneous):

Osview 2.1 : One Second Average        WOLFEN    17:38:43 04/21/99  #1
int=5s
Block Devices
 lread           37.5K
 bread               0
 %rcache         100.0
 lwrite              0
 bwrite              0
 wcancel             0
 %wcache           0.0
 phread              0
 phwrite             0
Graphics
 griioctl            0
 gintr              75
 swapbuf             0
 switch              0
 fifowait            0
 fifonwait           0
Video
 vidioctl            0
 vidintr             0
 drop_add            0
*Interrupts
*PathName Cache
*EfsAct
*XfsAct
*Getblk
*Vnodes

Figure 51. Miscellaneous information from osview.

osview clearly offers a vast amount of information for monitoring system and network activity. There is a GUI version of osview called gr_osview. Various options exist to determine which parameters are displayed with gr_osview, the most commonly used being -a to display as much data as possible.

Programs such as top and osview may be SGI-specific (I'm not sure). If they are, other versions of UNIX are bound to have equivalent programs to these.

Example use: although I do virtually all the administration of the server remotely using the office Indy (either by command line or GUI tools), there is also a VT-style terminal in my office connected to the server's serial port via a lengthy cable (the Challenge S server itself is in a small ante room). The VT display offers a simple text-only interface to the server; thus, most of the time, I leave osview running on the VT display so that I can observe system activity whenever I need to. The VT also offers an extra communications link for remote administration should the network go down, ie. if the network links fail (eg. broken hub) the admin Indy cannot be used to communicate with the server, but the VT still can.

Another tool for monitoring memory usage is gmemusage, a GUI program which displays a graphical split-bar chart view of current memory consumption. gmemusage can also display a breakdown of the regions within a program's memory space, eg. text, data, shared memory, etc.

Much lower-level tools exist too, such as sar (system activity reporter). In fact, osview works by using sar. Experienced admins may use tools like sar, but most admins will prefer to use higher-level tools such as top, osview and gmemusage. However, since sar gives a text output, one can use it in script files for automated system information gathering, eg. a system activity report produced by a script, executed every hour by the cron job-scheduling system (sar-based information gathering scripts are included in the cron job schedule as standard). sar can be given options to report only on selected items, eg. the number of processes in memory waiting for CPU resource time. sar can be told to monitor some system feature for a certain period, saving the data gathered during that period to a file. sar is a very flexible program.
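As an illustration of this kind of automation, here is a minimal sketch of an hourly report script. It assumes only the standard System V sar options -u (CPU utilisation) and -q (run queue); the script name, log location and sampling intervals are arbitrary choices, not anything mandated by the system:

#!/bin/sh
# hourly_sar (hypothetical name): append a short activity summary to a log.
LOG=/var/adm/sar-report.`date +%d`

echo "==== `date` ====" >> $LOG
sar -u 5 6 >> $LOG     # CPU usage: six samples, five seconds apart
sar -q 5 6 >> $LOG     # run queue occupancy over the same period

A crontab entry along the lines of '0 * * * * /usr/local/adm/hourly_sar' (again, the path is only an example) would then run the script on the hour, building up a day-by-day record which can be inspected or post-processed later.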

Network Performance and Statistics. osview can be used to monitor certain network statistics, but another useful program is ttcp. The online book, "IRIX Admin: Networking and Mail", says: "The ttcp tool measures network throughput. It provides a realistic measurement of network performance between two stations because it allows measurements to be taken at both the local and remote ends of the transmission."

To run a test with ttcp, enter the following on one system, eg. sevrin: ttcp -r -s

Then enter the following on another system, eg. akira: ttcp -t -s sevrin

After a delay of roughly 20 seconds for a 10Mbit network, results are reported by both systems, which will look something like this:

SEVRIN # ttcp -r -s
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp
ttcp-r: socket
ttcp-r: accept from 193.61.252.2
ttcp-r: 16777216 bytes in 18.84 real seconds = 869.70 KB/sec +++
ttcp-r: 3191 I/O calls, msec/call = 6.05, calls/sec = 169.39
ttcp-r: 0.1user 3.0sys 0:18real 16% 118maxrss 0+0pf 1170+1csw

AKIRA # ttcp -t -s sevrin
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp  -> sevrin
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 18.74 real seconds = 874.19 KB/sec +++
ttcp-t: 2048 I/O calls, msec/call = 9.37, calls/sec = 109.27
ttcp-t: 0.0user 2.3sys 0:18real 12% 408maxrss 0+0pf 426+4csw

Figure 52. Results from ttcp between two hosts on a 10Mbit network.

Full details of the output are in the ttcp man page, but one can immediately see that the observed network throughput (around 870KB/sec) is at a healthy level.

Another program for gathering network performance information is netstat. The online book, "IRIX Admin: Networking and Mail", says: "The netstat tool displays various network-related data structures that are useful for monitoring and troubleshooting a network. Detailed statistics about network collisions can be captured with the netstat tool."

netstat is commonly used with the -i option to list basic local network information, eg.:

yoda # netstat -i
Name  Mtu   Network     Address          Ipkts    Ierrs  Opkts    Oerrs  Coll
ec0   1500  193.61.252  yoda.comp.uclan  3906956  3      2945002  0      553847
ec3   1500  193.61.250  gate-yoda.comp.  560206   2      329366   0      16460
lo0   8304  loopback    localhost        476884   0      476884   0      0

Figure 53. The output from netstat.

Here, the packet collision rate has averaged 18.8%, which is within acceptable limits [1].

Another useful command is 'ping'. This program sends packets of data to a remote system, requesting an acknowledgement response for each packet sent. Options can be used to send a specific number of packets, send packets as fast as they are returned, send a packet every so often (user-definable interval), etc. For example:

MILAMBER # ping yoda
PING yoda.comp.uclan.ac.uk (193.61.252.1): 56 data bytes
64 bytes from 193.61.252.1: icmp_seq=0 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=1 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=2 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=3 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=4 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=5 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=6 ttl=255 time=1 ms

----yoda.comp.uclan.ac.uk PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip min/avg/max = 1/1/1 ms

Figure 54. Example use of the ping command.

I pressed CTRL-C after the 7th packet was sent. ping is a quick and easy way to see if a host is active and, if so, how responsive the connection is. If a ping test produces significant packet loss on a local network, then it is highly likely that a problem of some kind exists. Normally, one would rarely see non-zero packet loss on a local network from a direct machine-to-machine ping test.

A fascinating use of ping I once observed was at The Moving Picture Company (MPC) [2]. The admin had written a script which made every host on the network send a ping test to every other host. The results were displayed as a table with host names shown down the left-hand side as well as along the top. By looking for horizontal or diagonal lines of unusually large ping times, the admin could immediately see if there was a problem with a single host, or with a larger part of the network. Because of the need for a high system availability rate, the script allows the admin to spot problems almost as soon as they occur, eg. by running the script once every ten seconds. When the admin showed me the script in use, one column had rather high ping times (around 20ms). Logging into the host with rlogin, ps showed everything was OK: a complex process was merely consuming a lot of CPU time, giving a slower network response.
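The MPC script itself isn't reproduced here, but a much simplified sketch of the same idea - each host pinging every other host and reporting the round-trip summary - might look like the following. It assumes the 'live' host list file used elsewhere in these notes, that rsh access is permitted between the hosts, and that ping accepts -c to limit the packet count (true on IRIX and most BSD-derived systems):

#!/bin/sh
# pingmatrix (hypothetical): ask every host to ping every other host once
# and print the round-trip summary line for each pair.
HOSTS=`cat /mapleson/Admin/Machines/live`

for src in $HOSTS
do
    echo "From $src:"
    for dst in $HOSTS
    do
        echo "  to $dst:"
        rsh $src "ping -c 1 $dst" | grep round-trip
    done
done

A real version would format the results into the kind of table described above and flag unusually large times, but even this crude form shows how little is needed to get a network-wide view.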

System Status and User Status.

The rup command offers an immediate overview of current system states, eg.:

yoda # rup
yoda.comp.uclan.ac.u    up  6 days,  8:25,  load average: 0.33, 0.36, 0.35
gate-yoda.comp.uclan    up  6 days,  8:25,  load average: 0.33, 0.36, 0.35
wolfen.comp.uclan.ac    up          11:28,  load average: 0.00, 0.00, 0.00
conan.comp.uclan.ac.    up          11:28,  load average: 0.06, 0.01, 0.00
akira.comp.uclan.ac.    up          11:28,  load average: 0.01, 0.00, 0.00
nikita.comp.uclan.ac    up          11:28,  load average: 0.03, 0.00, 0.00
gibson.comp.uclan.ac    up          11:28,  load average: 0.00, 0.00, 0.00
woo.comp.uclan.ac.uk    up          11:28,  load average: 0.01, 0.00, 0.00
solo.comp.uclan.ac.u    up          11:28,  load average: 0.00, 0.00, 0.00
cameron.comp.uclan.a    up          11:28,  load average: 0.02, 0.00, 0.00
sevrin.comp.uclan.ac    up          11:28,  load average: 0.69, 0.46, 0.50
ash.comp.uclan.ac.uk    up          11:28,  load average: 0.00, 0.00, 0.00
ridley.comp.uclan.ac    up          11:28,  load average: 0.00, 0.00, 0.00
leon.comp.uclan.ac.u    up          11:28,  load average: 0.00, 0.00, 0.00
warlock.comp.uclan.a    up           1:57,  load average: 0.08, 0.13, 0.11
milamber.comp.uclan.    up           9:52,  load average: 0.11, 0.07, 0.00
merlin.comp.uclan.ac    up          11:28,  load average: 0.01, 0.00, 0.00
indiana.comp.uclan.a    up          11:28,  load average: 0.00, 0.00, 0.02
stanley.comp.uclan.a    up           1:56,  load average: 0.00, 0.00, 0.00

Figure 55. The output from rup.

The load averages for a single machine can be ascertained by running 'uptime' on that machine, eg.:

MILAMBER 84# uptime
  8:05pm  up 10:28,  6 users,  load average: 0.07, 0.06, 0.25

MILAMBER 85# rsh yoda uptime
  8:05pm  up 6 days, 9:02,  2 users,  load average: 0.47, 0.49, 0.42

Figure 56. The output from uptime.

The w command displays current system activity, including what each user is doing. The man page says, "The heading line shows the current time of day, how long the system has been up, the number of users logged into the system, and the load averages." For example:

yoda # w
  8:10pm  up 6 days,  9:07,  2 users,  load average: 0.51, 0.50, 0.41
User     tty       from            login@  idle  JCPU  PCPU  what
root     q0        milamber.comp.  7:02pm     8              w
cmprj    ftp       UNKNOWN@ns5ip.  7:29pm                    -

Figure 57. The output from w showing current user activity.

With the -W option, w shows the 'from' information on a separate line, allowing one to see the full domain address of ftp connections, etc.:

yoda # w -W
  8:11pm  up 6 days,  9:08,  2 users,  load average: 0.43, 0.48, 0.40
User     tty        login@  idle  JCPU  PCPU  what
root     ttyq0      7:02pm     8              w -W
    milamber.comp.uclan.ac.uk
cmprj    ftp22918   7:29pm
    UNKNOWN@ns5ip.

Figure 58. Obtaining full domain addresses from w with the -W option.

The rusers command broadcasts to all machines on the local network, gathering data about who is logged on and where, eg.:

yoda # rusers
yoda.comp.uclan.ac.uk        root guest guest root root
wolfen.comp.uclan.ac.uk      root
gate-yoda.comp.uclan.ac.uk   root root
milamber.comp.uclan.ac.uk    mapleson mapleson
warlock.comp.uclan.ac.uk     sensjv sensjv

Figure 59. The output from rusers, showing who is logged on where.

The multiple entries for certain users indicate that more than one shell is active for that user. As usual, my login data shows I'm doing several things at once.

rusers can be modified with options to:

  - report for all machines, whether users are logged in or not (-a),
  - probe a specific machine (supply host name(s) as arguments),
  - display the information sorted alphabetically by:
      o host name (-h),
      o idle time (-i),
      o number of users (-u),
  - give a more detailed output in the same style as the who command (-l).

Service Availability.

The most obvious way to check if a service is available for use by users is to try and use the service, eg. ftp or telnet to a test location, start a Netscape session and enter a familiar URL, send an email to a local or remote account, etc. The ps command can be used to make sure the relevant background process is running for a service too, eg. 'nfsd' for the NFS system.

However, if a service is experiencing problems, simply attempting to use the service will not reveal what may be wrong. For example, if one cannot ftp, it could be because of anything from a loose cable connection to some remote server that's gone down. The ping command is useful for an immediate check of network-related services such as ftp, telnet, WWW, etc. One pings each host in the communication chain to see if the hosts respond. If a host somewhere in the chain does not respond, then that host may be preventing any data from getting through (eg. a remote proxy server is down).

A useful command one can use to aid in such detective work is traceroute. This command sends test packets in a similar way to ping, but it also reports how the test packets reached the target site at each stage of the communication chain, showing response times in milliseconds for each step, eg.:

yoda # traceroute www.cee.hw.ac.uk
traceroute to osiris.cee.hw.ac.uk (137.195.52.12), 30 hops max, 40 byte packets
 1  193.61.250.33 (193.61.250.33)  6 ms (ttl=30!)  3 ms (ttl=30!)  4 ms (ttl=30!)
 2  193.61.250.65 (193.61.250.65)  5 ms (ttl=29!)  5 ms (ttl=29!)  5 ms (ttl=29!)
 3  gw-mcc.netnw.net.uk (194.66.24.1)  9 ms (ttl=28!)  8 ms (ttl=28!)  10 ms (ttl=28!)
 4  manchester-core.ja.net (146.97.253.133)  12 ms  11 ms  9 ms
 5  scot-pop.ja.net (146.97.253.42)  15 ms  13 ms  14 ms
 6  146.97.253.34 (146.97.253.34)  20 ms  15 ms  17 ms
 7  gw1.hw.eastman.net.uk (194.81.56.110)  20 ms (ttl=248!)  18 ms  14 ms
 8  cee-gw.hw.ac.uk (137.195.166.101)  17 ms (ttl=23!)  31 ms (ttl=23!)  18 ms (ttl=23!)
 9  osiris.cee.hw.ac.uk (137.195.52.12)  14 ms (ttl=56!)  26 ms (ttl=56!)  30 ms (ttl=56!)

If a particular step shows a sudden jump in response time, then there may be a communications problem at that step, eg. the host in question may be overloaded with requests, suffering from lack of communications bandwidth, CPU processing power, etc.

At a lower level, system services often depend on background system processes, or daemons. If these daemons are not running, or have shut down for some reason, then the service will not be available. On the SGI Indys, one example is the GUI service which handles the use of on-screen icons. The daemon responsible is called objectserver. Older versions of this particular daemon can occasionally shut down if an illegal iconic operation is performed, or if the file manager daemon experiences an error. With no objectserver running, the on-screen icons disappear.

Thus, a typical task might be to periodically check to make sure the objectserver daemon is running on all relevant machines. If it isn't, then the command sequence:

/etc/init.d/cadmin stop
/etc/init.d/cadmin start

restarts the objectserver. Once running, the on-screen icons return. A common cause of objectserver shutting down is when a user's desktop layout configuration files (contained in .desktop- directories) become corrupted in some way, eg. edited by hand in an incorrect manner, or mangled by some other operation (eg. a faulty Java script from a home made web page). One solution is to erase the user's desktop layout configuration directory, then login as the user and create a fresh .desktop- directory. objectserver is another example of UNIX GUI evolution. In 1996 SGI decided to replace the objectserver system entirely in IRIX 6.3 (and later) with a new service that was much more

reliable, less likely to be affected by errors made in other applications, and fully capable of supporting new 'webified' iconic services such as on-screen icons that are direct links to ftp, telnet or WWW sites.

In general, checking the availability of a service requires one to check that the relevant daemons are running, that the appropriate configuration files are in place, accessible and have the correct settings, that the relevant daemon is aware of any changes which may have been made (perhaps the service needs to be stopped and restarted?) and to investigate via online information what may have caused services to fail as and when incidents occur. For every service one can use, the online information explains how to setup, admin and troubleshoot the service. The key is to know where to find that information when it is needed.

A useful source of constantly updated status information is the /var/adm/SYSLOG file. This file is where any important system events are logged. One can configure all the various services and daemons to log different degrees of detailed information in the SYSLOG file. Note: logging too much detail can cause the log file to grow very quickly, in which case one would also have to ensure that it did not consume valuable disk space. The SYSLOG file records user logins, connections via ftp, telnet, etc., messages logged at system bootup/shutdown time, and many other things.
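As a concrete illustration of the 'check the daemon, then restart it' idea, a minimal sketch such as the following could be run by hand or from cron. The ps/grep pattern is the same one used in Figure 45; the restart commands are the cadmin stop/start sequence shown above, and the script name is of course arbitrary:

#!/bin/sh
# check_objectserver (hypothetical): restart the desktop services if
# the objectserver daemon has disappeared.
if ps -e | grep objectserver > /dev/null; then
    echo objectserver is running.
else
    echo objectserver not found - restarting desktop services...
    /etc/init.d/cadmin stop
    /etc/init.d/cadmin start
fi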

Vendor Information Updates.

Most UNIX vendors send out periodic information booklets containing in-depth articles on various system administration issues. SGI's bulletin is called Pipeline. Such information guides are usually supplied as part of a software support contract, though the vendor will often choose to include copies on the company web site. An admin should read any relevant articles from these guides - they can often be unexpectedly enlightening.

System Hardware Failures. When problems occur on a system, what might at first appear to be a software problem may in fact be a hardware fault. Has a disk failed? The fx program can be used to check disk status (block read tests, disk label checks, etc.) Has a network cable failed? Are all the cable connections firmly in place in the hub? Has a plug come loose? In late 1998, the Ve24 network stopped operating quite unexpectedly one morning. The errors made it appear that there was a problem with the NFS service or perhaps the main user files disk connected to the server; in fact, the fault lay with the Ve24 hub. The online guides have a great deal of advice on how to spot possible hardware failures. My advice is to check basic things first and move onto the more complex possibilities later. In the above example, I wasted a great deal of time investigating whether the NFS service was

responsible, or the external user files disk, when in fact I should have checked the hub connections first. As it happens, the loose connection was such that the hub indicator light was on even though the connection was not fully working (thus, a visual check would not have revealed the problem) - perhaps the fault was caused by a single loose wire out of the 8 running through the cable, or even an internal fault in the hub (more likely). Either way, the hub was eventually replaced. Other things that can go wrong include memory faults. Most memory errors are not hardware errors though, eg. applications with bugs can cause errors by trying to access some protected area of memory. Hardware memory errors will show up in the system log file /var/adm/SYSLOG as messages saying something like 'Hardware ECC Memory Error in SIMM slot 4'. By swapping the memory SIMMs around between the slots, one can identify which SIMM is definitely at fault (assuming there is only one causing the problem). The most common hardware component to go wrong on a system, even a non-PC system, is the disk drive. When configuring systems, or carrying out upgrades/expansions, it is wise to stick with models recommended by the source vendor concerned, eg. SGI always uses high-quality Seagate, IBM or Quantum disk drives for their systems; thus, using (for example) a Seagate drive is a good way to ensure a high degree of reliability and compatibility with the system concerned. Sometimes an admin can be the cause of the problem. For example, when swapping disks around or performing disk tasks such as disk cloning, it is possible to incorrectly set the SCSI ID of the disk. SGI systems expect the system disk to be on SCSI ID 1 (though this is a configurable setting); if the internal disk is on the wrong SCSI ID, then under certain circumstances it can appear to the system as if there are multiple disks present, one on each possible ID. If hardware errors are observed on bootup (the system diagnostics checks), then the first thing to do is to reboot and enter the low-level 'Command Monitor' (an equivalent access method will exist for all UNIX systems): the Command Monitor has a small set of commands available, some of which can be used to perform system status checks, eg. the hinv command. For the problem described above, hinv would show multiple instances of the same disk on all SCSI IDs from 1 to 7 - the solution is to power down and check the SCSI jumpers carefully. Other problems can occasionally be internal, eg. a buildup of dust blocking air vents (leading to overheating), or a fan failure, followed by overheating and eventually an automatic system shutdown (most UNIX systems' power supplies include circuitry to monitor system temperature, automatically shutting down if the system gets too hot). This leads on to questions of system maintenance which will be dealt with on Day 3. After disk failures, the other most common failure is the power supply. It can sometimes be difficult to spot because a failure overnight or when one isn't around can mean the system shuts down, cools off and is thus rebootable again the next morning. All the admin sees is a system that's off for no readily apparent reason the next morning. The solution is to, for example, move the system somewhere close at hand so that it can be monitored, or write a script which tests

whether the system is active every few seconds, logging the time of each successful test - if the system goes down, the admin is notified in some way (eg. audio sound file played) and the admin can then quickly check the machine - if the power supply area feels overly hot, then that is the likely suspect, especially if an off/on mains switch toggle doesn't turn the system back on (power supplies often have circuitry which will not allow power-on if the unit is still too hot). If the admin wasn't available at the time, then the logged results can show when the system failed.

All SGIs (and UNIX systems in general) include a suite of hardware and software diagnostics tests as part of the OS. IRIX contains a set of tests for checking the mouse, keyboard, monitor, audio ports, digital camera and other basic hardware features.

Thankfully, for just about any hardware failure, hardware support contracts cover repairs and/or replacements very effectively for UNIX systems. It's worth noting that although the Computing Department has a 5-day support contract with SGI, all problems I've encountered so far have been dealt with either on the same day or early next morning by a visiting support engineer (ie. they arrived much earlier than they legally had to).

Since November 1995 when I took charge of the Ve24 network, the hardware problems I've encountered have been:

  - 2 failed disks
  - 1 replaced power supply
  - 2 failed memory SIMMs (1 failed SIMM from two different machines)
  - 1 replaced keyboard (user damage)
  - 1 failed monitor
  - 1 suspect motherboard (replaced just in case)
  - 1 suspect video card (replaced just in case)
  - 1 problematic 3rd-party disk (incorrect firmware, returned to supplier and corrected with up-to-date firmware; now operating OK)
  - 1 suspect hub (unknown problem; replaced just in case)

Given that the atmosphere in Ve24 is unfiltered, often humid air, and the fact that many of the components in the Indys in Ve24 have been repeatedly swapped around to create different configurations at different times, such a small number of failures is an excellent record after nearly 4 years of use.

It is likely that dirty air (dust, humidity, corrosive gases) was largely responsible for the disk, power supply and memory failures - perhaps some of the others too. A build up of dust can combine with airborne moisture to produce corrosive chemicals which can short-circuit delicate components. To put the above list another way: 14 out of the 18 Indys have been running non-stop for 3.5 years without a single hardware failure of any kind, despite being housed in an area without filtered air or temperature control. This is very impressive and is quite typical of UNIX hardware platforms.

Installing systems with proper air filtering and temperature control can be costly, but the benefit may be a much reduced chance of hardware failure - this could be important for sites with many more systems and a greater level of overall system usage (eg. 9 to 5 for most machines). Some companies go to great lengths to minimise the possibility of hardware failure. For example, MPC [2] has an Origin200 render farm for rendering movie animation frames. The render farm consists of 50 Origin200 servers, each with 2 R10000 CPUs, ie. 100 CPUs in total. The system is housed in a dedicated room with properly filtered air and temperature control. Almost certainly as a result of this high-quality setup, MPC has never had a single hardware failure of any kind in nearly 3 years of operation. Further, MPC has not experienced a single OS failure over the same period either, even though the system operates 24hours/day. This kind of setup is common amongst companies which have time-critical tasks to perform, eg. oil companies with computational models that can take six months to complete - such organisations cannot afford to have failures (the problem would likely have to be restarted from scratch, or at least delayed), so it's worth spending money on air filters, etc. If one does not have filtered air, then the very least one should do is keep the systems clean inside and out, performing system cleaning on a regular basis. At present, my current policy is to thoroughly clean the Indys twice a year: every machine is stripped right down to the bare chassis; every component is individually cleaned with appropriate cleaning solutions, cloths, air-dusters, etc. (this includes removing every single key from all the keyboards and mass-cleaning them with a bucket of hot water and detergent! And cleaning the keyboard bases inside and out too). Aside from these major bi-annual cleanings, simple regular cleaning is performed on a weekly or monthly basis: removing dirt from the mice (inside especially), screen, chassis/monitor surface, cables and so on; cleaning the desks; opening each system and blowing away internal dust using a can of compressed filtered air, etc. Without a doubt, this process greatly lengthens the life-span of the systems' hardware components, and users benefit too from a cleaner working environment - many new students each autumn often think the machines must be new because they look so clean. Hardware failures do and will occur on any system whether it's a UNIX platform or not. An admin can use information from online sources, combined with a knowledge of relevant system test tools such as fx and ping, to determine the nature of hardware failures and take corrective action (contacting vendor support if necessary); such a strategy may include setting up automated hardware tests using regularly-executed scripts. Another obvious source of extensive information about any UNIX platform is the Internet. Hundreds of existing users, including company employees, write web pages [3] or USENET posts describing their admin experiences and how to deal with typical problems.
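As an example of the kind of home-made availability script mentioned earlier (logging when a machine stops responding, so the failure time can be read from the log afterwards), a minimal sketch might be the following; the host name, log file and interval are illustrative, and ping is assumed to accept -c to limit the packet count:

#!/bin/sh
# watch_host (hypothetical): record whether a host answers ping,
# so the log shows roughly when it stopped responding.
HOST=sevrin
LOG=/var/tmp/$HOST.uplog

while true
do
    if ping -c 1 $HOST > /dev/null 2>&1; then
        echo "$HOST alive at `date`" >> $LOG
    else
        echo "$HOST NOT RESPONDING at `date`" >> $LOG
        # an audio alert or mail message could be triggered here
    fi
    sleep 10
done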

Suspicious/Illegal Activity.

Users inevitably get up to mischief on occasion, or external agencies may attempt to hack the system. Types of activity could include:

  - users downloading illegal or prohibited material, either with respect to national/local laws or internal company policy,
  - accessing of prohibited sites, eg. warez software piracy sites,
  - mail spamming and other abuses of Internet services,
  - attacks by hackers,
  - misuse/abuse of system services internally.

There are other possibilities, but these are the main areas. This lecture is an introduction to security and monitoring issues. A more in-depth discussion is given in the last lecture.

As an admin who is given the task of supposedly preventing and/or detecting illegal activities, the first thing which comes to mind is the use of various file-searching methods to locate suspect material, eg. searching every user's netscape bookmarks file for particular keywords. However, this approach can pose legal problems. Some countries have data protection and/or privacy laws [4] which may prohibit one from arbitrarily searching users' files. Searches of this type are the equivalent of a police force tapping all the phones in an entire street and recording every single conversation just on the off-chance that they might record something interesting; such methods are sometimes referred to as 'fishing' and could be against the law. So, for example, the following set of commands might be illegal:

find /home/students -name "*" -print > list
grep sex list    >  suspected
grep warez list  >> suspected
grep xxx list    >> suspected
grep pics list   >> suspected
grep mpg list    >> suspected
grep jpg list    >> suspected
grep gif list    >> suspected
grep sites list  >> suspected

As a means of finding possible policy violations, the above script would be very effective, but it's definitely a form of fishing (even the very first line). Now consider the following: find /home/students -name "bookmarks.html" -print -exec grep playboy {} \;

This command will effectively locate any Netscape bookmarks file which contains a possible link to the PlayBoy web site. Such a command is clearly looking for fairly specific content in a very specific file in each user's .netscape directory; further, it is probably accessing a user's

account space without her or his permission (this opens the debate on whether 'root' even needs a user's permission since root actually owns all files anyway - more on this below). The whole topic of computer file access is a grey area. For example, might the following command also be illegal? find . -name "*.jpg" -print > results && grep sex results

A user's lawyer could argue that it's clearly looking for any JPEG image file that is likely to be of an explicit nature. On the other hand, an admin's lawyer could claim the search was actually looking for any images relating to tourism in Sussex county, or musical sextets, or adverts for local unisex hair salons, and just accidentally happened to be in a directory above /home/students when the command was executed (the find would eventually reach /home/students). Obviously a setting for a messy court-room battle. But even ignoring actions taken by an admin using commands like find, what about data backups? An extremely common practice on any kind of computer system is to backup user data to a media such as DAT on a regular basis - but isn't this accessing user files without permission? But hang on, on UNIX systems, the root user is effectively the absolute owner of any file, eg. suppose a file called 'database' in /tmp, owned by an ordinary user, contained some confidential data; if the the admin (logged in as root) then did this: cat /tmp/database

the contents of the database file would indeed be displayed. Thus, since root basically owns all files anyway by default, surely a backup procedure is just the root user archiving files it already owns? If so, does one instead have to create some abstract concept of ownership in order to offer users a concrete concept of what data privacy actually is? Who decides? Nations which run their legal systems using case-law will find these issues very difficult to clarify, eg. the UK's Data Protection Act is known to be 'weak'.

Until such arguments are settled and better laws created, it is best for an admin to err on the side of caution. For example, if an admin wishes to have some kind of regular search conducted, the existence of the search should be included as part of stated company policy, and enshrined into any legal documents which users must sign before they begin using the system, ie. if a user signs the policy document, then the user has agreed to the actions described in that document. Even then, such clauses may not be legally binding. An admin could also set up some form of login script which would require users to agree to a system usage policy before they were fully logged in.

However, these problems won't go away, partly because of the specifics of how some modern Internet services such as the web are implemented. For example, a user could access a site which automatically forces the pop-up of a Netscape window which is directed to access a prohibited site; inline images from the new site will then be present in the user's Netscape cache directory in their home account area even though they haven't specifically tried to download anything. Are they legally liable? Do such files even count as personal data? And if the site has its own proxy

server, then the images will also be in the server's proxy cache - are those responsible for the server also liable? Nobody knows. Legal arguments on the nature of cache directories and other file system details have not yet been resolved.

Clearly, there is a limit to how far one can go in terms of prevention simply because of the way computing technologies work. Thus, the best thing to do is to focus efforts on information that does not reside inside user accounts. The most obvious place is the system log file, /var/adm/SYSLOG. This file will show all the ftp and telnet sites which users have been accessing; if one of these sites is a prohibited place, then that is sufficient evidence to take action.

The next most useful data resource to keep an eye on is the web server log(s). The web logs record every single access by all users to the WWW. Users have their own record of their accesses in the form of a history file, hidden in their home directory inside the .netscape directory (or other browser); but the web logs are outside their accounts and so can probably be freely examined, searched, processed, etc. by an admin without having to worry about legal issues. Even here though, there may be legal issues, eg. log data often includes user IDs which can be used to identify specific individuals and their actions - does a user have a legal right to have such data kept private? Only a professional lawyer in the field would know the correct answer.

Note: the amount of detail placed into a log file can be changed to suit the type of logging required. If a service offers different levels of logging, then the appropriate online documentation will explain how to alter the settings.
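Returning to the SYSLOG idea: a quick scan of the system log for ftp and telnet activity might be done along the following lines. The exact wording of the logged messages varies between daemons and OS versions, so the search patterns are illustrative only, not a definitive recipe:

#!/bin/sh
# logscan (hypothetical): pull ftp/telnet related entries out of the
# system log into a working file for closer inspection.
egrep -i 'ftpd|telnetd' /var/adm/SYSLOG > /var/tmp/net-activity
wc -l /var/tmp/net-activity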

Blocking Sites. If an admin does not want users to be able to access a particular site, then that site can be added to a list of 'blocked' sites by using the appropriate option in the web server software concerned, eg. Netscape Enterprise Server, CERN web server, Apache web server, etc. Even this may pose legal problems if a country has any form of freedom-of-speech legislation though (non-existent in the UK at present, so blocking sites should be legally OK in the UK). However, blocking sites can become somewhat cumbersome because there are thousands of web sites which an admin could theoretically have to deal with - once the list becomes quite large, web server performance decreases as every access has to have its target URL checked against the list of banned sites. So, if an admin does choose to use such a policy, it is best to only add sites when necessary, and to construct some kind of checking system so that if no attempt is made to access a blocked site after a duration of, say, two weeks (whatever), then that site is removed from the list of blocked sites. In the long term, such a policy should help to keep the list to a reasonably manageable size. Even so, just the act of checking the web logs and adding sites to the list could become a costly time-consuming process (time = money = wages). One can also use packet filtering systems such as hardware routers or software daemons like ipfilterd which can accept, reject, or reject-and-log incoming packets based on source/destination IP address, host name, network interface, port number, or any combination of these. Note that

daemons such as ipfilterd may require the presence of a fast CPU if the overhead from a busy site is to be properly supported. The ipfilterd system is discussed in detail on Day 3.

System Temporary Directories. An admin should keep a regular eye on the contents of temporary directories on all systems, ie. /tmp and /var/tmp. Users may download material and leave the material lying around for anyone to see. Thus, a suspicious file can theoretically be traced to its owner via the user ID and group ID of the file. I say theoretically because, as explained elsewhere, it is possible for a user X to download a file (eg. by ftp so as to avoid the web logs, or by telnet using a shell on a remote system) and then 'hand over' ownership of the file to someone else (say user Y) using the chgrp and chown commands, making it look as though a different user is responsible for the file. In that sense, files found outside a user's home directory could not normally be used as evidence, though they would at least alert the admin to the fact that suspect activities may be occurring, permitting a refocusing of monitoring efforts, etc. However, one way in which it could be possible to reinforce such evidence is by being able to show that user Y was not logged onto the system at the time when the file in question was created (this information can be gleaned from the system's own local /var/adm/SYSLOG file, and the file's creation time and date). Unfortunately, both users could have been logged onto the same system at the time of the file's creation. Thus, though a possibility, the extra information may not help. Except in one case: video evidence. If one can show by security camera recordings that user X did indeed login 'on console' (ie. at the actual physical keyboard) then that can be tied in with SYSLOG data plus file creation times, irrespective of what user Y was doing at the time. Certainly, if someone wished to frame a user, it would not be difficult to cause a considerable amount of trouble for that user with just a little thought on how to access files, where to put them, changing ownership, etc. In reality, many admins probably just do what they like in terms of searching for files, examining users' areas, etc. This is because there is no way to prove someone has attempted to search a particular part of the file system - UNIX doesn't keep any permanent record of executed commands. Ironically, the IRIX GUI environment does keep a record of any file-related actions taken with the GUI system (icons, file manager windows, directory views, etc.) but the log file with this information is kept inside the user's .desktop- directory and thus may be legally out of bounds.

File Access Permissions. Recall the concept of file access permissions for files. If a user has a directory or file with its permissions set so that another ordinary user can read it (ie. not just root, who can access

anything by default anyway), does the fact that the file is globally readable mean the user has by default given permission for anyone else to read the file? If one says no, then that would mean it is illegal to read any user's own public_html web area! If one says yes, and a legal body confirmed this for the admin, then that would at least enable the admin to examine any directory or file that had the groups and others permissions set to a minimum of read-only (read and executable for directories). The find command has an option called -perm which allows one to search for files with permissions matching a given mode. If nothing else, such an ability would catch out careless users since most users are not aware that their account has hidden directories such as .netscape. An admin ought to make users aware of security issues beforehand though.
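As a hedged example of such a search, using octal permission masks (-004 meaning 'others have read permission', -005 adding execute, which for directories means they can be entered):

# Files under /home which others can read:
find /home -type f -perm -004 -print

# Directories under /home which others can both read and enter:
find /home -type d -perm -005 -print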

Backup Media. Can an admin search the data residing on backup media? (DAT, CDR, ZIP, DLT, etc.) After all, the data is no longer inside the normal home account area. In my opinion yes, since root owns all files anyway (though I've never done such a search), but others might disagree. For that matter, consider the tar commands commonly used to perform backups: a full backup accesses every file on the file system by default (ie. including all users' files, whatever the permissions may be), so are backups a problem area? Yet again, one can easily see how legal grey areas emerge concerning the use of computing technologies.

Conclusion. Until the law is made clearer and brought up-to-date (unlikely) the best an admin can do is consult any available internal legal team, deciding policy based on any advice given.

References:

1. "Ethernet Collisions on Silicon Graphics Systems", SGI Pipeline magazine (support info bulletin), July/August 1998 (NB: URL not accessible to those without a software support contract):
   http://support.sgi.com/surfzone/content/pipeline/html/19980404EthernetCollisions.html

2. The Moving Picture Company, Soho Square, London. Responsible for some or all of the special effects in Daylight, The English Patient, Goldeneye, The Borrowers, and many other feature films, music videos, adverts, etc. Hardware used: several dozen Octane workstations, many Onyx2 graphics supercomputers, a 6.4TB Ampex disk rack with real-time Philips cinescan film-to-digital-video converter (cinema resolution 70mm uncompressed video converter; 250K's worth), Challenge L / Discrete Logic video server, a number of O2s, various older SGI models such as Onyx RealityEngine2, Indigo2, Personal IRIS, etc., some high-end Apple Macs and a great deal of dedicated video editing systems and VCRs, supported by a multi-gigabit network. I saw one NT system which the admin said nobody used.

3. The SGI Tech/Advice Centre: Holland #1: http://www.futuretech.vuurwerk.nl/
   Worldwide Mirror Sites:
   Holland #2: http://sgi.webguide.nl/
   Holland #3: http://sgi.webcity.nl/
   USA: http://sgi.cartsys.net/
   Canada: http://sgi-tech.unixology.com/

4. The Data Protection Act 1984, 1998. Her Majesty's Stationery Office (HMSO):
   http://www.hmso.gov.uk/acts/acts1984/1984035.htm

Detailed Notes for Day 2 (Part 4)

UNIX Fundamentals: Further Shell scripts.

for/do Loops.

The rebootlab script shown earlier could be rewritten using a for/do loop, a control structure which allows one to execute a series of commands many times. Rewriting the rebootlab script using a for/do loop doesn't make much difference to the complexity of this particular script, but using more sophisticated shell code is worthwhile when one is dealing with a large number of systems. Other benefits arise too; a suitable summary is given at the end of this discussion. The new version could be rewritten like this:

#!/bin/sh
for machine in akira ash cameron chan conan gibson indiana leon merlin \
    nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo $machine
  rsh $machine init 6&
done

The '\' symbol is used to continue a line onto the next line. The 'echo' line displays a comment as each machine is dealt with. This version is certainly shorter, but whether or not it's easier to use in terms of having to modify the list of host names is open to argument, as opposed to merely commenting out the relevant lines in the original version. Even so, if one happened to be writing a script that was fairly lengthy, eg. 20 commands to run on every system, then the above format is obviously much more efficient.

Similarly, the remountmapleson script could be rewritten as follows:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo $machine
  rsh $machine "umount /mapleson && mount /mapleson"
done

Note that in this particular case, the command to be executed must be enclosed within quotes in order for it to be correctly sent by rsh to the remote system. Quotes like this are normally not needed; it's only because rsh is being used in this example that quotes are required.

Also note that the '&' symbol is not used this time. This is because the rebootlab procedure is asynchronous, whereas I want the remountdir script to output its messages just one action at a time. In other words, for the rebootlab script, I don't care in what order the machines reboot, so each rsh call is executed as a background process on the remote system, thus the rebootlab script doesn't wait for each rsh call to return before progressing. By contrast, the lack of a '&' symbol in remountdir's rsh command means the rsh call must finish before the script can continue. As a result, if an unexpected problem occurs, any error message will be easily noticed just by watching the output as it appears.

Sometimes a little forward thinking can be beneficial; suppose one might have reason to want to do exactly the same action on some other NFS-mounted area, eg. /home, or /var/mail, then the script could be modified to include the target directory as a single argument supplied on the command line. The new script looks like this:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo $machine
  rsh $machine "umount $1 && mount $1"
done

The script would probably be renamed to remountdir (whatever) and run with:

remountdir /mapleson

or perhaps:

remountdir /home

if/then/else constructs.

But wait a minute, couldn't one use the whole concept of arguments to solve the problem of communicating to the script exactly which hosts to deal with? Well, a rather useful feature of any program is that it will always return a result of some kind. Whatever the output actually is, a command always returns a result which is defined to be true or false in some way. Consider the following command:

grep target database

If grep doesn't find 'target' in the file 'database', then no output is given. However, as a program that has been called, grep has also passed back a value of 'FALSE' - the fact that grep does this is simply invisible during normal usage of the command. One can exploit this behaviour to create a much more elegant script for the remountdir command.

Firstly, imagine that I as an admin keep a list of currently active hosts in a file called 'live' (in my case, I'd probably keep this file in /mapleson/Admin/Machines). So, at the present time, the file would contain the following:

  yoda akira ash cameron chan conan gibson indiana leon merlin
  nikita ridley sevrin solo stanley warlock wolfen woo

ie. the host called spock is not listed. The remountdir script can now be rewritten using an if/then construct:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    spock nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo Checking $machine...
  if grep $machine /mapleson/Admin/Machines/live; then
    echo Remounting $1 on $machine...
    rsh $machine "umount $1 && mount $1"
  fi
done

This time, the complete list of hosts is always used in the script, ie. once the script is rewritten, it doesn't need to be altered again. For each machine, the grep command searches the 'live' file for the target name; if it finds the name, then the result is some output to the screen from grep, but also a 'TRUE' condition, so the echo and rsh commands are executed. If grep doesn't find the target host name in the live file then that host is ignored.

The result is a much more elegant and powerful script. For example, suppose some generous agency decided to give the department a large amount of money for an extra 20 systems: the only changes required are to add the names of the new hosts to remountdir's initial list, and to add the names of any extra active hosts to the live file. Along similar lines, when spock is finally returned to the lab, its name would be added to the live file, causing remountdir to deal with it in the future.

Even better, each system could be set up so that, as long as it is active, the system tells the server every so often that all is well (a simple script could achieve this). The server brings the results together on a regular basis, constantly keeping the live file up-to-date. Of course, the server includes its own name in the live file. A typical interval would be to update the live file every minute. If an extra program was written which used the contents of the live file to create some kind of visual display, then an admin would know in less than a minute when a system had gone down.

Naturally, commercial companies write professional packages which offer these kinds of services and more, with full GUI-based monitoring, but at least it is possible for an admin to create home-made scripts which would do the job just as well.
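A minimal sketch of that reporting mechanism might use two tiny cron-driven scripts, one on every client and one on the server. The spool file name and location are assumptions chosen to match the /mapleson area used elsewhere in these notes; a robust version would need to deal with simultaneous writes, but the idea is the point here:

#!/bin/sh
# Client side, run from cron every minute on each host:
# record that this host is alive.
uname -n >> /mapleson/Admin/Machines/heartbeats

#!/bin/sh
# Server side, also run every minute: hosts heard from since the last
# run become the new 'live' file, then the spool is cleared.
sort -u /mapleson/Admin/Machines/heartbeats > /mapleson/Admin/Machines/live
rm -f /mapleson/Admin/Machines/heartbeats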

/dev/null.

There is still an annoying feature of the script though: if grep finds a target name in the live file, the output from grep is visible on the screen which we don't really want to see. Plus, the umount command will return a message if /mapleson wasn't mounted anyway. These messages clutter up the main 'trace' messages.

To hide the messages, one of UNIX's special device files can be used. Amongst the various device files in the /dev directory, one particularly interesting file is called /dev/null. This device is known as a 'special' file; any data sent to the device is discarded, and the device always returns zero bytes. Conceptually, /dev/null can be regarded as an infinite sponge - anything sent to it is just ignored. Thus, for dealing with the unwanted grep output, one can simply redirect grep's output to /dev/null. The vast majority of system script files use this technique, often many times even in a single script.

Note: descriptions of all the special device files in /dev are given in Appendix C of the online book, "IRIX Admin: System Configuration and Operation".

Since grep returns nothing if a host name is not in the live file, a further enhancement is to include an 'else' clause as part of the if construct so that a separate message is given for hosts that are currently not active. Now the final version of the script looks like this:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    spock nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo Checking $machine...
  if grep $machine /mapleson/Admin/Machines/live > /dev/null; then
    echo Remounting $1 on $machine...
    rsh $machine "umount $1 && mount $1"
  else
    echo $machine is not active.
  fi
done

Running the above script with 'remountdir /mapleson' gives the following output:

Checking yoda...
Remounting /mapleson on yoda...
Checking akira...
Remounting /mapleson on akira...
Checking ash...
Remounting /mapleson on ash...
Checking cameron...
Remounting /mapleson on cameron...
Checking chan...
Remounting /mapleson on chan...
Checking conan...
Remounting /mapleson on conan...
Checking gibson...
Remounting /mapleson on gibson...
Checking indiana...
Remounting /mapleson on indiana...
Checking leon...
Remounting /mapleson on leon...
Checking merlin...
Remounting /mapleson on merlin...
Checking spock...
spock is not active.
Checking nikita...
Remounting /mapleson on nikita...
Checking ridley...
Remounting /mapleson on ridley...
Checking sevrin...
Remounting /mapleson on sevrin...
Checking solo...
Remounting /mapleson on solo...
Checking stanley...
Remounting /mapleson on stanley...
Checking warlock...
Remounting /mapleson on warlock...
Checking wolfen...
Remounting /mapleson on wolfen...
Checking woo...
Remounting /mapleson on woo...

Notice the output from grep is not shown, and the different response given when the script deals with the host called spock. Scripts such as this typically take around a minute or so to execute, depending on how quickly each host responds.

The rebootlab script can also be rewritten along similar lines to take advantage of the new 'live' file mechanism, but with an extra if/then structure to exclude yoda (the rebootlab script is only meant to reboot the lab machines, not the server). The extra if/then construct uses the 'test' command to compare the current target host name with the word 'yoda' - the rsh command is only executed if the names do not match; otherwise, a message is given stating that yoda has been excluded. Here is the new rebootlab script:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    spock nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo Checking $machine...
  if grep $machine /mapleson/Admin/Machines/live > /dev/null; then
    if test $machine != yoda; then
      echo Rebooting $machine...
      rsh $machine init 6&
    else
      echo Yoda excluded.
    fi
  else
    echo $machine is not active.
  fi
done

Of course, an alternative way would be to simply exclude 'yoda' from the opening 'for' line. However, one might prefer to always use the same host name list in order to minimise the amount of customisation between scripts, ie. to create a new script just copy an existing one and modify the content after the for/do structure.

Notes:

  - All standard shell commands and other system commands, programs, etc. can be used in shell scripts, eg. one could use 'cd' to change the current working directory between commands.

  - An easy way to ensure that a particular command is used with the default or specifically desired behaviour is to reference the command using an absolute path description, eg. /bin/rm instead of just rm. This method is frequently found in system shell scripts. It also ensures that the scripts are not confused by any aliases which may be present in the executing shell.

  - Instead of including a raw list of hosts in the script at the beginning, one could use other commands such as grep, awk, sed, perl and cut to obtain relevant host names from the /etc/hosts file, one at a time (a sketch of this idea is given below). There are many possibilities.
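One possible sketch of that last idea, assuming each relevant /etc/hosts entry starts with an IP address and has the primary host name as its second field (comment lines and the loopback entry are skipped); local file layouts differ, so treat this as a starting point only:

#!/bin/sh
# Build the machine list from /etc/hosts instead of hard-coding it.
for machine in `awk '/^[0-9]/ && $2 != "localhost" {print $2}' /etc/hosts`
do
  echo $machine
done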

Typically, as an admin learns the existence of new commands, better ways of performing tasks are thought of. This is perhaps one reason why UNIX is such a well-understood OS: the process of improving on what has been done before has been going on for 30 years, largely because much of the way UNIX works can be examined by the user (system script files, configuration files, etc.) One can imagine the hive of activity at BTL and Berkeley in the early days, with suggestions for improvements, additions, etc. pouring in from enthusiastic testers and volunteers. Today, after so much evolution, most basic system scripts and other files are probably as good as they're going to be, so efforts now focus on other aspects such as system service improvements, new technology (eg. Internet developments, NSD), security enhancements, etc. Linux evolved in a very similar way. I learned shell programming techniques mostly by looking at existing system scripts and reading the relevant manual pages. An admin's shell programming experience usually begins with simple sequential scripts that do not include if/then structures, for loops, etc. Later on, a desire to be more efficient gives one cause to learn new techniques, rewriting earlier work as better ideas are formed.

Simple scripts can be used to perform a wide variety of tasks, and one doesn't have to make them sophisticated or clever to get the job done - but with some insightful design, and a little knowledge of how the more useful aspects of UNIX work, one can create extremely flexible scripts that can include error checking, control constructs, progress messages, etc. written in a way which does not require them to be modified, ie. external ideas, such as system data files, can be used to control script behaviour; other programs and scripts can be used to extract information from other parts of the system, eg. standard configuration files. A knowledge of the C programming language is clearly helpful in writing shell scripts since the syntax for shell programming is so similar. An excellent book for this is "C Programming in a UNIX Environment", by Judy Kay & Bob Kummerfeld (Addison Wesley Publishing, 1989. ISBN: 0 201 12912 4).

Other Useful Commands. A command found in many of the numerous scripts used by any UNIX OS is 'test'; typically used to evaluate logical expressions within 'if' clauses, test can determine the existence of files, status of access permissions, type of file (eg. ordinary file, directory, symbolic link, pipe, etc.), whether or not a file is empty (zero size), compare strings and integers, and other possibilities. See the test man page for full details. For example, the test command could be used to include an error check in the rebootlab script, to ascertain whether the live file is accessible:

#!/bin/sh
if test -r /mapleson/Admin/Machines/live; then
  for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
      spock nikita ridley sevrin solo stanley warlock wolfen woo
  do
    echo Checking $machine...
    if grep $machine /mapleson/Admin/Machines/live > /dev/null; then
      if test $machine != yoda; then
        echo Rebooting $machine...
        rsh $machine init 6&
      else
        echo Yoda excluded.
      fi
    else
      echo $machine is not active.
    fi
  done
else
  echo Error: could not access live file, or file is not readable.
fi

NOTE: Given that 'test' is a system command...

  % which test
  /sbin/test

...any user who creates a program called test, or an admin who writes a script called test, will be unable to execute the file unless one of the following is done:

o  Use a complete pathname for the file, eg. /home/students/cmpdw/test
o  Insert './' before the file name
o  Alter the path definition ($PATH) so that the current directory is searched before /sbin (dangerous! The root user should definitely not do this).

In my early days of learning C, I once worked on a C program whose source file I'd called simply test.c - it took me an hour to realise why nothing happened when I ran the program (obviously, I was actually running the system command 'test', which does nothing when given no arguments except return an invisible 'false' exit status).

Problem Question 1. Write a script which will locate all .capture.mv.* directories under /home and remove them safely. You will not be expected to test this for real, but feel free to create 'mini' test directories if required by using mkdir. Modify the script so that it searches a directory supplied as a single argument ($1).

Relevant commands: find, rm

Tips:

o  Research the other possible options for rm which might be useful.
o  Don't use your home directory to test out ideas. Use /tmp or /var/tmp.

Problem Question 2. This is quite a complicated question. Don't feel you ought to be able to come up with an answer after just one hour. I want to be able to keep an eye on the amount of free disk space on all the lab machines. How could this be done? If a machine is running out of space, I want to be able to remove particular files which I know can be erased without fear of adverse side effects, including:

o  Unwanted user files left in /tmp and /var/tmp, ie. files such as movie files, image files, sound files, but in general any file that isn't owned by root.
o  System crash report files left in /var/adm/crash, in the form of unix.K and vmcore.K.comp, where K is some digit.
o  Unwanted old system log information in the file /var/adm/SYSLOG. Normally, the file is moved to oSYSLOG minus the last 10 or 20 lines, and a new empty SYSLOG created containing the aforementioned most recent 10 or 20 lines.

a. Write a script which will probe each system for information, showing disk space usage.
b. Modify the script (if necessary) so that it only reports data for the local system disk.
c. Add a means for saving the output to some sort of results file or files.
d. Add extra features to perform space-saving operations such as those described above.

Advanced:

e. Modify the script so that files not owned by root are only removed if the relevant user is not logged onto the target system.

Relevant commands: grep, df, find, rm, tail, cd, etc.

UNIX Fundamentals: Application Development Tools. A wide variety of commands, programs, tools and applications exist for application development work on UNIX systems, just as for any system. Some come supplied with a UNIX OS as standard, some are free or shareware, while others are commercial packages. An admin who has to manage a system which offers these services needs to be aware of their existence because there are implications for system administration, especially with respect to installed software.

This section does not explain how to use these tools (even though an admin would probably find many of them useful for writing scripts, etc.) The focus here is on explaining what tools are available and may exist on a system, where they are usually located (or should be installed if an admin has to install non-standard tools), and how they might affect administration tasks and/or system policy.

There tend to be several types of software tools:

1. Software executed usually via command line and written using simple editors, eg. basic compilers such as cc, development systems such as the Sun JDK for Java. Libraries for application development, eg. OpenGL, X11, Motif, Digital Media Libraries - such library resources will include example source code and programs, eg. X11 Demo Programs. In both cases, online help documents are always included: man pages, online books, hints & tips, local web pages either in /usr/share or somewhere else such as /usr/local/html.

2. Higher-level toolkits providing an easier way of programming with various libraries, eg. Open Inventor. These are often just extra library files somewhere in /usr/lib and so don't involve executables, though example programs may be supplied (eg. SceneViewer, gview, ivview). Any example programs may be in custom directories, eg. SceneViewer is in /usr/demos/Inventor, ie. users would have to add this directory to their path in order to be able to run the program. These kinds of details are in the release notes and online books. Other example programs may be in /usr/sbin (eg. ivview).

3. GUI-based application development systems for all manner of fields, eg. WorkShop Pro CASE tools for C, C++, Ada, etc., CosmoWorlds for VRML, CosmoCreate for HTML, CosmoCode for Java, RapidApp for rapid prototyping, etc. Executables are usually still accessible by default (eg. cvd appears to be in /usr/sbin) but the actual programs are normally stored in application-specific directories, eg. /usr/WorkShop, /usr/CosmoCode, etc. (/usr/sbin/cvd is a link to /usr/WorkShop/usr/sbin/cvd). Supplied online help documents are in the usual locations (/usr/share, etc.)

4. Shareware/Freeware programs, eg. GNU, Blender, XV, GIMP, XMorph, BMRT. Sometimes such software comes supplied in a form that means one can install it anywhere (eg. Blender) - it's up to the admin to decide where (/usr/local is the usual place). Other types of software install automatically to a particular location, usually

/usr/freeware or /usr/local (eg. GIMP). If the admin has to decide where to install the software, it's best to follow accepted conventions, ie. place such software in /usr/local (ie. executables in /usr/local/bin, libraries in /usr/local/lib, header files in /usr/local/include, help documents in /usr/local/docs or /usr/local/html, source code in /usr/local/src).

In all cases, it's the admin's responsibility to inform users of any new software, how to use it, etc. The key to managing these different types of tools is consistency; don't put one shareware program in /usr/local and then another in /usr/SomeCustomName. Users looking for online source code, help docs, etc. will become confused. It also complicates matters when one considers issues such as library and header file locations for compiling programs. Plus, consistency eases other aspects of administration, eg. if one always uses /usr/local for 3rd-party software, then installing this software onto a system which doesn't yet have it is a simple matter of copying the entire contents of /usr/local to the target machine.

It's a good idea to talk to users (perhaps by email) and ask for feedback on topics such as how easy it is to use 3rd-party software, whether there are further programs they'd like to have installed to make their work easier, etc. For example, a relatively recent audio format is MP3 (MPEG audio layer 3); unknown to me until recently, there exists a freeware MP3 audio file player for SGIs. Unusually, the program is available off the Net in executable form as just a single program file. Once I realised that users were trying to play MP3 files, I discovered the existence of the MP3 player and installed it in /usr/local/bin as 'mpg123'.

My personal ethos is that users come first where issues of carrying out their tasks are concerned. Other areas such as security, etc. are the admin's responsibility though - such important matters should either be left to the admin or discussed to produce some statement of company policy, probably via consultation with users, managers, etc. For everyday topics concerning users getting the most out of the system, it's wise for an admin to do what she/he can to make users' lives easier.
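As a minimal illustration of the kind of manual installation just described (the file name is the player mentioned above, but the exact steps are a sketch rather than a record of what was done), placing a single downloaded binary under /usr/local might look like this:

  # Copy the downloaded player into the conventional local binaries
  # directory and make it executable by everyone.
  cp mpg123 /usr/local/bin/mpg123
  chmod 755 /usr/local/bin/mpg123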

General Tools (editors). Developers always use editing programs for their work, eg. xedit, jot, nedit, vi, emacs, etc. If one is aware that a particular editor is in use, then one should make sure that all appropriate components of the relevant software are properly installed (including any necessary patches and bug fixes), and interested users notified of any changes, newly installed items, etc. For example, the jot editor is popular with many SGI programmers because it has some extra features for those programming in C, eg. an 'Electric C Mode'. However, a bug exists in jot which can cause file corruption if jot is used to access files from an NFS-mounted directory. Thus, if jot is being used, then one should install the appropriate patch

file to correct the bug, namely Patch 2051 (patch CDs are supplied as part of any software support contract, but most patches can also be downloaded from SGI's ftp site). Consider searching the vendor's web site for information about the program in question, as well as the relevant USENET newsgroups (eg. comp.sys.sgi.apps, comp.sys.sgi.bugs). It is always best to prevent problems by researching issues beforehand.

Whether or not an admin chooses to 'support' a particular editor is another matter; SGI has officially switched to recommending the nedit editor for users now, but many still prefer to use jot simply because of familiarity, eg. all these course notes have been typed using jot. However, an application may 'depend' on minor programs like jot for particular functions. Thus, one may have to install programs such as jot anyway in order to support some other application (dependency).

An example in the case of the Ve24 network is the emacs editing system: I have chosen not to support emacs because there isn't enough spare disk space available to install emacs on the Indys which only have 549MB disks. Plus, the emacs editor is not a vendor-supplied product, so my position is that it poses too many software management issues to be worth using, ie. unknown bug status, file installation location issues, etc.

Locations: editors are always available by default; executables tend to be in /usr/sbin, so users need not worry about changing their path definition in order to use them. All other supplied-as-standard system commands and programs come under the heading of general tools.

Compilers. There are many different compilers which might have to be installed on a system, eg.:

  Programming Language    Compiler Executable

  C                       cc, gcc
  C++                     CC
  Ada                     ?
  Fortran77               f77
  Fortran90               f90
  Pascal                  ?

Some UNIX vendors supply C and C++ compilers as standard, though licenses may be required. If there isn't a supplied compiler, but users need one, then an admin can install the GNU compilers which are free. An admin must be aware that the release versions of software such as compilers are very important to the developers who use them (this actually applies to all types of software). Installing an update to a compiler might mean the libraries have fewer bugs, better

features, new features, etc., but it could also mean that a user's programs no longer compile with the updated software. Thus, an admin should maintain a suitable relationship with any users who use compilers and other similar resources, ie. keep each other informed of relevant issues, changes being made or requested, etc.

Another possibility is to manage the system in such a way as to offer multiple versions of different software packages, whether that is a compiler suite such as a C development kit, or a GUI-based application such as CosmoWorlds. Multiple versions of low-level tools (eg. cc and associated libraries, etc.) can be supported by using directories with different names, or NFS-mounting directories/disks containing software of different versions, and so on. There are many possibilities - which one to use depends on the size of the network, ease of management, etc.

Multiple versions of higher-level tools, usually GUI-based development environments though possibly ordinary programs like Netscape, can be managed by using 'wrapper' scripts: the admin sets an environment variable to determine which version of some software package is to be the default; when a system is booted, the script is executed and uses the environment variable to mount appropriate directories, execute any necessary initialisation scripts, background daemons, etc. Thus, when a user logs in, they can use exactly the same commands but find themselves using a different version of the software. Even better, an admin can customise the setup so that users themselves can decide what version they want to use; logging out and then logging back in again would then reset all necessary settings, path definitions, command aliases, etc.

MPC operates its network in this way. They use high-end professional film/video effects/animation tools such as Power Animator, Maya, Flame, etc. for their work, but the network actually has multiple versions of each software package available so that animators and artists can use the version they want, eg. for compatibility reasons, or personal preferences for older vs. newer features. MPC uses wrapper scripts of a type which require a system reboot to change software version availability, though the systems have been set up so that a user can initiate the reboot (I suspect the reboot method offers better reliability).

Locations: Executables are normally in /usr/sbin, libraries in /usr/lib, header files in /usr/include and online documents, etc. in /usr/share. Note also that the release notes for such products contain valuable information for administrators (setup advice) and users alike.
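A minimal sketch of the wrapper-script idea described above (this is an illustration, not MPC's actual setup; the package name, directory layout and the SW_VERSION variable are all assumptions):

#!/bin/sh
# Run whichever version of a hypothetical package is selected via SW_VERSION,
# falling back to a default if the variable is unset.
SW_VERSION=${SW_VERSION:-4.5}
SW_HOME=/usr/local/netscape-$SW_VERSION
if test -x $SW_HOME/netscape; then
  exec $SW_HOME/netscape "$@"
else
  echo "Requested version $SW_VERSION is not installed." 1>&2
  exit 1
fi

A user could then switch versions simply by setting SW_VERSION in their login environment, while the command they type remains the same.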

Debuggers. Debugging programs are usually part of a compilation system, so everything stated above for compilers applies to debuggers as well. However, it's perfectly possible for a user to use a debugger that's part of a high-level GUI-based application development toolkit to debug programs that are created using low-level tools such as jot and xedit. A typical

example on the Ve24 machines is students using the cvd program (from the WorkShop Pro CASE Tools package) to debug their C programs, even though they don't use anything else from the comprehensive suite of CASE tools (source code management, version control, documentation management, rapid prototyping, etc.) Thus, an admin must again be aware that users may be using features of high-level tools for specific tasks even though most work is done with low-level tools. Hence, issues concerning software updates arise, eg. changing software versions without user consultation could cause problems for existing code.

High-level GUI-based Development Toolkits. Usually vendor-supplied or commercial in nature, these toolkits include products such as CosmoCode (Java development with GUI tools), RapidApp, etc. As stated above, updates should not be carried out without proper consideration of how the changes may affect users who use the products, but the ramifications are usually much less serious than with low-level programs or shareware/freeware. This is because the software supplier will deliberately develop new versions in such a way as to maximise compatibility with older versions. High-level toolkits sometimes rely on low-level toolkits (eg. CosmoCode depends on the Sun JDK software), so an admin should also be aware that installing updates to low-level toolkits may have implications for their higher-level counterparts.

High-level APIs (Application Programming Interfaces). This refers to advanced library toolkits such as Open Inventor, ViewKit, etc. The actual application development tools used with these types of products are the same, whether low-level or high-level (eg. cc and commands vs. WorkShop Pro CASE Tools). Thus, high-level APIs are not executable programs in their own right; they are a suite of easier-to-use libraries, header files, etc. which users can use to create applications designed at a higher level of abstraction. Some example high-level APIs and their low-level counterparts include:

  Lower-level      Higher-level

  OpenGL           Open Inventor
  X11/Motif        ViewKit/Tk
  ImageVision      Image Format Library, Electronic Light Table

This is not a complete list, and there may be more than one level of abstraction, eg. the VRML file format is itself derived from Open Inventor.

Locations: high-level APIs tend to have their files stored in correspondingly named directories in /usr/lib, /usr/include, etc. For example, Open Inventor files can be found in /usr/lib/Inventor and /usr/include/Inventor. An exception is support files such as example models, images, textures, etc. which will always be in /usr/share, but not necessarily in specifically named locations, eg. the example 3D Inventor models are in /usr/share/data/models.

Shareware and Freeware Software. This category of software, eg. the GNU compiler system, is usually installed either in /usr/local somewhere, or in /usr/freeware. Many shareware/freeware programs don't have to be installed in one of these two places (Blender is one such example) but it is best to do so in order to maintain a consistent software management policy.

Since /usr/local and /usr/freeware are not normally referenced by the standard path definition, an admin must ensure that relevant users are informed of any changes they may have to make in order to access newly installed software. A typical notification might be a recommendation of how a user can modify her/his own .cshrc file so that shells and other programs know where any new executable files, libraries, online documents, etc. are stored.

Note that, assuming the presence of Internet access, users can easily download freeware/shareware on their own and install it in their own directory so that it runs from their home account area, or they could even install software in globally writeable places such as /var/tmp. If this happens, it's common for an admin to become annoyed, but the user has every right to install software in their own account area (unless it's against company policy, etc.) A better response is to appreciate the user's need for the software and offer to install it properly so that everyone can use it, unless some other factor is more important.

Unlike vendor-supplied or commercial applications, newer versions of shareware and freeware programs can often be radically different from older versions. GIMP is a good example of this - one version introduced so many changes that it was barely comparable to an older version. Users who utilise these types of packages might be annoyed if an update is made without consulting them because:

o  it's highly likely their entire working environment may be different in the new version,
o  features of the old version may no longer be available,
o  aspects of the new version may be incompatible with the old version,
o  etc.

Thus, shareware/freeware programs are a good example of where it might be better for admins to offer more than one version of a software package, eg. all the files for Blender V1.57 are stored in /usr/local/blender1.57_SGI_6.2_iris on akira and sevrin. When the

next version comes out (eg. V1.6), the files will be in /usr/local/blender1.6_SGI_6.2_iris - ie. users can still use the old version if they wish. Because shareware/free programs tend to be supplied as distinct modules, it's often easier to support multiple versions of such software compared to vendor-supplied or commercial packages.
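As a simple illustration of the kind of .cshrc notification mentioned earlier (csh syntax; the exact directories are site-dependent and shown here only as an assumption), an admin might suggest users add a line such as:

  set path = ( $path /usr/local/bin /usr/freeware/bin )

so that newly installed shareware/freeware executables are found without having to type full pathnames.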

Comments on Software Updates, Version Issues, etc. Modern UNIX systems usually employ software installation techniques which operate in such a way as to show any incompatibilities before installation (SGIs certainly operate this way); the inst program (and thus swmgr too, since swmgr is just a GUI interface to inst) will not allow one to install software if there are conflicts present concerning software dependency and compatibility.

This feature of inst (and swmgr) to monitor software installation issues applies only to software subsystems that can be installed and removed using inst/swmgr, ie. those said to be in 'inst' format. Thankfully, large numbers of freeware programs (eg. GIMP) are supplied in this format and so they can be managed correctly. Shareware/Freeware programs do not normally offer any means by which one can detect possible problems before installation or removal, unless the authors have been kind enough to supply some type of analysis script or program.

Of course, there is nothing to stop an admin using low-level commands such as cp, tar, mv, etc. to manually install problematic files by copying them from a CD, or another system, but to do so is highly unwise as it would invalidate the inst database structure which normally acts as a highly accurate and reliable record of currently installed software. If an admin must make custom changes, an up-to-date record of these changes should be maintained.

To observe inst/swmgr in action, either enter 'inst' or 'swmgr' at the command prompt (or select 'Software Manager' from the Toolchest, which runs swmgr). swmgr is the easier to understand because of its intuitive interface. Assuming the use of swmgr, once the application window has appeared, click on 'Manage Installed Software'. swmgr loads the inst database information, reading the installation history, checking subsystem sizes, calculating dependencies, etc. The inst system is a very effective and reliable way of managing software. Most if not all modern UNIX systems will employ a software installation and management system such as inst, or a GUI-based equivalent.
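To complement the GUI route just described, the same installation record can be browsed from the command line (a brief illustration; the patch number shown is the jot patch mentioned earlier, and output naturally varies from system to system):

  % versions | more
  (lists all installed products and patches, a page at a time)

  % versions patchSG0002051
  (shows the entry for that particular patch, if it is installed)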

Summary. As an administrator, one should not need to know how to use the software products which users have access to (though it helps in terms of being able to answer simple questions), but one should:

o  be aware of where the relevant files are located,
o  understand issues concerning revision control,
o  notify users of any steps they must take in order to access new software or features,
o  aid users in being able to use the products efficiently (eg. using /tmp or /var/tmp for working temporarily with large files or complex tasks),
o  have a consistent strategy for managing software products.

These issues become increasingly important as systems become more complex, eg. multiple vendor platforms, hundreds of systems connected across multiple departments, etc. One solution for companies with multiple systems and more than one admin is to create a system administration committee whose responsibilities could include coordinating site policies, dealing with security problems, sharing information, etc.

Detailed Notes for Day 3 (Part 1)

UNIX Fundamentals: Installing an Operating System and/or Software.

Installation Rationale. Installing an OS is a common task for an admin to perform, usually because of the acquisition of a new system or the installation of a new disk. Although any UNIX variant should be perfectly satisfactory once it has been installed, sometimes the admin or a user has a particular problem which requires, for example, a different system configuration (and thus perhaps a reinstall to take account of any major hardware changes), or a different OS version for compatibility testing, access to more up-to-date features, etc.

Alternatively, a serious problem or accidental mistake might require a reinstallation, eg. corrupted file system, damaged disk, or an unfortunate use of the rm command (recall the example given in the notes for Day 2, concerning the dangers of the 'find' command); although system restoration via backups is an option, often a simple reinstall is more convenient and faster. Whatever the reason, an admin must be familiar with the procedure for installing an OS on the platform for which she/he is responsible.

Installation Interface and Tools. Most UNIX systems have two interfaces for software installation: a high-level mode where an admin can use some kind of GUI-based tool, and a low-level mode which employs the command line shell. The GUI tool normally uses the command line version for the actual installation operations. In the case of SGI's IRIX, the low-level program is called 'inst', while the GUI interface to inst is called 'swmgr' - the latter can be activated from the 'Toolchest' on the desktop or entered as a command. Users can also run swmgr, but only in 'read-only' mode, ie. non-root users cannot use inst or swmgr to install or remove software.

For general software installation tasks (new/extra applications, updates, patches, etc.) the GUI tool can normally be used, but for installing an OS, virtually every UNIX platform will require the admin to not only use the low-level tool for the installation, but also carry out the installation in a 'lower' (restricted) access mode, ie. a mode where only the very basic system services are operating: no user-related processes are running, the end-user GUI interface is not active, no network services are running, etc. For SGI's IRIX, this mode is called 'miniroot'. Major updates to the OS are also usually carried out in miniroot mode - this is because a fully operational system will have services running which could be altered by a major OS change, ie. it would be risky to perform any such change in anything but the equivalent of miniroot.

It is common for this restricted miniroot mode to be selected during bootup, perhaps by pressing the ESC key when prompted. In the case of SGI systems, the motherboard PROM chip includes

a hard-coded GUI interface mechanism called ARCS which displays a mouse-driven menu on bootup. This provides the admin with a user-friendly way of performing low-level system administration tasks, eg. installing the OS from miniroot, running hardware diagnostics, accessing a simple PROM shell called a Command Monitor for performing low-level actions such as changing PROM settings (eg. which SCSI ID to treat as the system disk), etc. Systems without graphics boards, such as servers, provide the same menu but in text-only form, usually through a VT or other compatible text display terminal driven from the serial port. Note that SGI's VisualWorkstation machine (an NT system) also uses the ARCS GUI interface - a first for any NT system (ie. no DOS at all for low-level OS operations). Not many UNIX vendors offer a GUI menu system like ARCS for low-level tasks - SGI is one of the few who do, probably because of a historical legacy of making machines for the visual arts and sciences. Though the ARCS system is perhaps unique, after one has selected 'Software Installation' the procedure progresses to a stage where the interface does become the more familiar text-based use of inst (ie. the text information just happens to be presented within a GUI-style window).

Very early UNIX platforms were not so friendly when it came to offering an easy method for installing the OS, especially in the days of older storage media such as 5.25" disks, magnetic tapes, etc. However, some vendors did a good job, eg. the text-only interface for installing HPUX on Hewlett Packard machines (eg. HP9000/730) is very user-friendly, allowing the admin to use the cursor arrow keys to select options, activate tasks, etc. During installation, constantly updated information shows how the installation is progressing: current file being installed, number of files installed so far, number of files remaining, amount of disk space used up so far, disk space remaining, percentage equivalents for all these, and even an estimate of how much longer the installation will take before completion (surprisingly, inst doesn't provide this last piece of information as it is running, though one can make good estimates or find out how long it's going to take from a 3rd-party information source). The inst program gives progress output equivalent to most of the above by showing the current software subsystem being installed, which sub-unit of which subsystem, and what percentage of the overall operation has been done so far.

Perhaps because of the text-only interface which is at the heart of installing any UNIX variant, installing an OS can be a little daunting at first, but the actual procedure itself is very easy. Once an admin has installed an OS once, doing it again quickly becomes second nature. The main reason the task can seem initially confusing is that the printed installation guides are often too detailed, ie. the supplied documents have to assume that the person carrying out the installation may know nothing at all about what they're doing. Thankfully, UNIX vendors have recognised this fact and so nowadays any such printed material also contains a summary installation guide for experts and those who already know the general methods involved - this is especially useful when performing an OS update as opposed to an original OS installation.

OS Source Media. Years ago, an OS would be stored on magnetic tape or 5.25" disks. Today, one can probably state with confidence that CDROMs are used by every vendor. For example, SGI's IRIX 6.2 comes on 2 CDROMs; IRIX 6.5 uses 4 CDROMs, but this is because 6.5 can be used with any machine from SGI's entire current product line, as well as many older systems - thus, the basic CD set must contain the data for all relevant systems even though an actual installation will only use a small subset of the data from the CDs (typically less than one CD's worth). In the future, it is likely that vendors will switch to DVDs due to higher capacities and faster transfer rates.

Though a normal OS installation uses some form of original OS media, UNIX actually allows one to install an OS (or any software) in some quite unique ways. For example, one could copy the data from the source media (I shall assume CDROM) to a fast UltraSCSI disk drive. Since disks offer faster transfer rates and access times, using a disk as a source medium enables a faster installation, as well as removing the need for swapping CDROMs around during the installation process. This is essentially a time-saving feature but is also very convenient, eg. no need to carry around many CDROMs (remember that after an OS installation, an admin may have to install extra software, applications, etc. from other CDs).

A completely different option is to install the OS using a storage device which is attached to a remote machine across a network. This may sound strange, ie. the idea that a machine without an OS can access a device on a remote system and use that as an OS installation source. It's something which is difficult but not impossible with PCs (I'm not sure whether a Linux PC would support this method). A low-level communications protocol called bootp (Internet Bootstrap Protocol), supported by all traditional UNIX variants, is used to facilitate communication across the network. As long as the remote system has been configured to allow another system to access its local device as a source for remote OS installation, then the remote system will effectively act as an attached storage medium. However, most admins will rarely if ever have to install an OS this way for small networks, though it may be more convenient for larger networks. Note that IRIX systems are supplied by default with the bootp service disabled in the /etc/inetd.conf file (the contents of this file control various network services). Full details on how to use the bootp service for remote OS installation are normally provided by the vendor in the form of an online book or reference page. In the case of IRIX, see the section entitled, "Booting across the Network" in Chapter 10 of the online book, "IRIX Admin: System Configuration and Operation".

Note: this discussion does not explain every single step of installing an OS on an SGI system, though the method will be demonstrated during the practical session if time permits. Instead, the focus here is on management issues which surround an OS installation, especially those techniques which can ease the installation task. Because of the SGI-related technical site I run, I have already created extremely detailed installation guides for IRIX 6.2 [1] and IRIX 6.5 [2] which also include tables of example installation times (these two documents are included for future reference). The installation times obtained were used to conduct a CPU and CDROM

performance analysis [3]. Certain lessons were learned from this analysis which are also relevant to installing an OS - these are explained later.

Installing an OS on multiple systems. Using a set of CDs to install an OS can take quite some time (15 to 30 minutes is a useful approximation). If an admin has many machines to install, there are several techniques for cutting the amount of time required to install the OS on all the machines. The most obvious method is for all machines to install via a remote network device, but this could actually be very slow, limited partly by network speed but also by the way in which multiple systems would all be trying to access the same device (eg. CDROM) at the same time. It would only really be effective for a situation where the network was very fast and the device - or devices, there could be more than one - was also fast. An example would be the company MPC; as explained in previous lectures, their site configuration is extremely advanced. The network they employ is so fast that it can saturate the typical 100Mbit Ethernet port of a modern workstation like Octane. MPC's storage systems include many high-end RAID devices capable of delivering data at hundreds of MB/sec rates (this kind of bandwidth is needed for editing broadcast-quality video and assuring that animators can load complete scene databases without significant delay). Thus, the admin at MPC can use some spare RAID storage to install an OS on a system across the network. When this is done, the limiting factor which determines how long the installation takes is the computer's main CPU(s) and/or its Ethernet port (100MBit), the end result of which is that an installation can take mere minutes. In reality, the MPC admin uses an even faster technique for installing an OS, which is discussed in a moment. At the time of my visit, MPC was using a high-speed crossbar switching 288Mbit/sec network (ie. multiple communications links through the routers - each machine could be supplied with up to 36MB/sec). Today they use multiple gigabit links (HiPPI) and other supporting devices. But not everyone has the luxury of having such equipment.

Disk Cloning [1]. If an admin only has a single machine to deal with, the method used may not matter too much, but often the admin has to deal with many machines. A simple technique which saves a huge amount of time is called 'disk cloning'. This involves installing an OS onto a single system ('A') and then copying (ie. cloning) the contents of that system's disk onto other disks. The first installation might be carried out by any of the usual means (CDROM, DAT, network, etc.), after which any extra software is also installed; in the case of SGIs, this would mean the admin starting up the system into a normal state of operation, logging in as root and using swmgr to install extra items. At this point, the admin may wish to make certain custom changes as well, eg.

installing shareware/freeware software, etc. This procedure could take more than an hour or two if there is a great deal of software to install. Once the initial installation has finished, then begins the cloning process. On SGIs, this is typically done as follows (other UNIX systems will be very similar if not identical):

1. Place the system disk from another system B into system A, installed at, for example, SCSI ID 2 (B's system disk would be on SCSI ID 1 in the case of SGIs; SCSI ID 0 is used for the SCSI controller). Bootup the system.

2. Login as root. Use fx to initialise the B disk to be a new 'root' (ie. system) disk; create a file system on it; mount the disk on some partition on A's disk such as /disk2.

3. Copy the contents of disk A to disk B using a command such as tar. Details of how to do this with example tar commands are given in the reference guides [1] [2].

4. Every system disk contains special volume header information which is required in order to allow it to behave as a bootable device. tar cannot copy this information since it does not reside on the main data partition of the disk in the form of an ordinary file, so the next step is to copy the volume header data from A to B using a special command for that purpose. In the case of SGIs, the relevant program is called dvhtool (device volume header tool).

5. Shut down system A; remove the B disk; place the B disk back into system B, remembering to change its SCSI ID back to 1. If further cloning is required, insert another disk into system A on SCSI ID 2, and (if needed) a further disk into system B, also set to SCSI ID 2. Reboot both systems.

6. System A will reboot as normal. At bootup time, although system B already has a kernel file available (/unix), all the files will be recognised as new (ie. changed), so system B will also create a new kernel file (/unix.install) and then boot up normally, ready for login. Reboot system B once more so that the new kernel file is made the current kernel file.

At this stage, what one has effectively created is a situation comprising two systems as described in Step 1, instead of only one such system which existed before the cloning process. Thus, one could now repeat the process again, creating four systems ready to use or clone again as desired. Then eight, sixteen, thirty two and so on. This is exactly the same way biological cells divide, ie. binary fission. Most people are familiar with the idea that repeatedly doubling the number of a thing can create a great many things in a short space of time, but the use of such a technique for installing an operating system on many machines means an admin can, for example, completely configure over one hundred machines in less than five hours! The only limiting factor, as the number of machines to deal with increases, is the amount of help available from others to aid in the swapping of disks, typing of commands, etc. In the case of the 18 Indys in Ve24, the last complete reinstall I did on my own took less than three hours.

Note: the above procedure assumes that each cloning step copies one disk onto just a single other disk - this is because I'm using the Indy as an example, ie. Indy only has internal space for one extra disk. But if a system has the available room, then many more disks could be installed on other SCSI IDs (3, 4, 5, etc.), resulting in each cloning step creating three, four, etc. disks from just one. This is only possible because one can run multiple tar copy commands at the same time.
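For step 3, the kind of pipe-based tar copy referred to might look like the following (a sketch only, not the exact commands from the referenced guides; the directory list is illustrative and chosen so that the mount point itself is not copied into itself, and the freshly initialised B disk is assumed to be mounted at /disk2):

  # Copy the main system directories from disk A onto the B disk,
  # preserving permissions and ownership.
  cd /
  tar cf - unix bin dev etc lib sbin usr var | (cd /disk2; tar xpf -)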

Of course, one could use external storage devices to connect extra disks. There's no reason why a system with two SCSI controllers (Indigo2, O2, Octane, etc.) couldn't use external units to clone the system disk to 13 other disks at the same time; for a small network, such an ability could allow the reinstallation of the entire system in a single step!

Using a Backup Image. If a system has been backed up onto a medium such as DAT tape, one could in fact use that tape for installing a fresh OS onto a different disk, as opposed to the more usual use of the tape for data restoration purposes. The procedure would be similar to some of the steps in disk cloning, ie. install a disk on SCSI ID 2, initialise, and use tar to extract the DAT straight to the disk. However, the volume header information would have to come from the original system since it would not be present on the tape, and only one disk could be written to at a time from the tape. Backup media are usually slower than disks too.
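A minimal sketch of the restore step just described, assuming the DAT drive appears as the default tape device /dev/tape and the freshly initialised disk is mounted at /disk2:

  # Extract the backup image from tape onto the new disk,
  # preserving permissions; the volume header must still be
  # copied separately with dvhtool as noted above.
  cd /disk2
  tar xpf /dev/tape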

Installing a New Version of an OS (Major Updates). An admin will often have to install updates to various OS components as part of the normal routine of installing software patches, bug fixes, new features, security fixes, etc. as they arrive in CD form from the vendor concerned. These can almost always be installed using the GUI method (eg. swmgr) unless specifically stated otherwise for some reason. However, if an admin wishes to change a machine which already has an OS installed to a completely new version (whether a newer version or an older one), then other issues must be considered.

Although it is perfectly possible to upgrade a system to a newer OS, an existing system will often have so much software installed, with a whole range of configuration files, that a straight upgrade to a new OS revision may not work very well. The upgrade itself would probably succeed, but what usually happens is that the admin has to resolve installation conflicts before the procedure can begin, which is annoyingly time-wasting. Further, some changes may even alter some fundamental aspect of the system, in which case an upgrade on top of the existing OS would involve extra changes which an admin would have to read up on first (eg. IRIX 6.2 uses a completely different file system to IRIX 5.3: XFS vs. EFS).

Even if an update over an existing OS is successful, one can never really be sure that older files which aren't needed anymore were correctly removed. To an admin, the system would 'feel' as if the older OS was somehow still there, rather like an old layer of paint hidden beneath a new gloss. This aspect of OS management is perhaps only psychological, but it can be important. For example, if problems occurred later, an admin might waste time checking for issues concerning the older OS which aren't relevant anymore, even though the admin theoretically knows such checks aren't needed.

Thus, a much better approach is to perform a 'clean' installation when installing a new OS. A typical procedure would be as follows:

1. Read all the relevant notes supplied with the new OS release so that any issues relevant to how the system may be different with the new OS version are known beforehand, eg. if any system services operate in a different way, or other factors (eg. new type of file system, etc.)

2. Make a full system backup of the machine concerned.

3. Identify all the key files which make the system what it is, eg. /etc/sys_id, /etc/hosts, and other configuration files/directories such as /var/named, /var/flexlm/license.dat, etc. These could be placed onto a DAT, floptical, ZIP, or even another disk. Items such as shareware/freeware software are probably best installed anew (read any documents relevant to software such as this too).

4. Use the appropriate low-level method to reinitialise the system disk. For SGI IRIX systems, this means using the ARCS bootup menu to select the Command Monitor, boot off of the OS CDROM and use the fx program to reinitialise the disk as a root disk, use mkfs to create a new file system (the old OS image is now gone), then reboot to access the 'Install System Software' option from the ARCS menu.

5. Install the OS in the normal manner.

6. Use the files backed up in step 3 to change the system so that it adopts its usual identity and configuration, bearing in mind any important features/caveats of the new OS release.

This is a safe and reliable way of ensuring a clean installation. Of course, the installation data could come from a different media or over a network from a remote system as described earlier.

Time-saving Tips. When installing an OS or software from a CDROM, it's tempting to want to use the fastest possible CDROM available. However, much of the process of installing software, whether the task is an OS installation or not, involves operations which do not actually use the CDROM. For example, system checks need to be made before the installation can begin (eg. available disk space), hundreds of file structures need to be created on the disk, installation images need to be uncompressed in memory once they have been retrieved from the CDROM, installed files need to be checked as the installation progresses (checksums), and any post-installation tasks performed such as compiling any system software indices.

As a result, perhaps 50% of the total installation time may involve operations which do not access the CDROM. Thus, using a faster CDROM may not speed up the overall installation to any great degree. This effect is worsened if the CPU in the system is particularly old or slow, ie. a slow CPU may not be able to take full advantage of an old CDROM, never mind a new one. In order for a faster CDROM to make any significant difference, the system's CPU must be able to take advantage of it, and a reasonably large proportion of an installation procedure must actually consist of accessing the CDROM.

For example, consider the case of installing IRIX 6.5 on two different Indys - one with a slow CPU, the other with a better CPU - comparing any benefit gained from using a 32X CDROM instead of a 2X CDROM [3]. Here is a table of installation times, in hours, minutes and seconds, along with percentage speedups:

                          2X CDROM    32X CDROM    %Speedup

  100MHz R4600PC Indy:    1:18:36     1:12:11       8.2%
  200MHz R4400SC Indy:    0:52:35     0:45:24      13.7%

(data for a 250MHz R4400SC Indigo2 shows the speedup would rise to 15.2% - a valid comparison since Indy and Indigo2 are almost identical in system design) In other words, the better the main CPU, the better the speedup obtained by using a faster CDROM. This leads on to the next very useful tip for installing software (OS or otherwise)...

Temporary Hardware Swaps. The example above divided the columns in order to obtain the speedup for using a faster CDROM, but it should be obvious looking at the table that a far greater speedup can be obtained by using a better CPU:

  Using 200MHz R4400SC CPU Instead of 100MHz R4600PC (Percentage Speedup)

  2X CDROM with Indy:     33.1%
  32X CDROM with Indy:    37.1%

In other words, no matter what CDROM is used, an admin can save approximately a third of the normal installation time just by temporarily swapping the best possible CPU into the target system! And of course, the saving is maximised by using the fastest CDROM available too, or other installation source such as a RAID containing the CDROM images. For example, if an admin has to carry out a task which would normally be expected to take, say, three hours on the target system, then a simple component swap could save over an hour of installation time. From an admin's point of view, that means getting the job done quicker (more time for other tasks), and from a management point of view that means lower costs and better efficiency, ie. less wages money spent on the admin doing that particular task. Some admins might have to install OS images as part of their job, eg. performance analysis or configuring systems to order. Thus, saving as much time as possible could result in significant daily productivity improvements.

The Effects of Memory Capacity. During the installation of software or an OS, the system may consume large amounts of memory in order to, for example, uncompress installation images from the CDROM, process existing system files during a patch update, recompile system file indices, etc. If the target system does not have enough physical memory, then swap space (otherwise known as virtual memory) will have to be used. Since software installation is a disk and memory intensive task, this can massively slow down the installation or removal procedure (the latter can happen too because complex file processing may be required in order to restore system files to an earlier state prior to the installation of the software items being removed). Thus, just as it can be helpful to temporarily swap a better CPU into the target system and use a faster CDROM if available, it is also a good idea to ensure the system has sufficient physical memory for the task.

For example, I once had cause to install a large patch upgrade to the various compiler subsystems on an Indy running IRIX 6.2 with 64MB RAM [1]. The update procedure seemed to be taking far too long (15 minutes and still not finished). Noticing the unusually large amount of disk activity compared to what I would normally expect, ie. noise coming from the disk, I became suspicious and wondered whether the installation process was running out of memory. A quick use of gmemusage showed the available memory to be very low (3MB), implying that memory swapping was probably occurring. I halted the update procedure (easy to do with IRIX) and cancelled the installation. After upgrading the system temporarily to 96MB RAM (using 32MB from another Indy) I ran the patch again. This time, the update was finished in less than one minute! Using gmemusage showed the patch procedure required at least 40MB RAM free in order to proceed without resorting to the use of swap space.

Summary.

1. Before making any major change to a system, make a complete backup just in case something goes wrong. Read any relevant documents supplied with the software to be installed, eg. release notes, caveats to installation, etc.

2. When installing an OS or other software, use the most efficient storage media available if possible, eg. the OS CDs copied onto a disk. NB: using a disk OS image for installation might mean repartitioning the disk so that the system regards the disk as a bootable device, just like a CDROM. By default, SCSI disks do not have the same partition layout as a typical CDROM. On SGIs using IRIX, the fx program is used to repartition disks.

3. If more than one system is involved, use methods such as disk cloning to improve the efficiency of the procedure.

4. If possible, temporarily swap better system components into the target system in order to reduce installation time and ensure adequate resources for the procedure (better CPU, lots of RAM, fastest possible CDROM).

Caution: item 4 above might not be possible if the particular set of files which get installed are determined by the presence of internal components. In the case of Indy, installing an R5000 series CPU would result in the installation of different low-level bootup CPU-initialisation libraries compared to R4600 or R4400 (these latter two CPUs can use the same libraries, but any R5000 CPU uses newer libraries). Files relevant to these kinds of issues are located in directories such as /var/sysgen.

Patch Files. Installing software updates to parts of the OS or application software is a common task for admins. In general, patch files should not be installed unless they are needed, but sometimes an admin may not have any choice, eg. for security reasons, or Y2K compliance. Typically, patch updates are supplied on CDs in two separate categories (these names apply to SGIs; other UNIX vendors probably use a similar methodology):

1. Required/Recommended patches.
2. Fix-on-Fail patches.

Item 1 refers to patches which the vendor suggests the admin should definitely install. Typically, a CD containing such patches is accessed with inst/swmgr and an automatic installation carried out, ie. the admin lets the system work out which of the available required/recommended patches should be installed. This concept is known as installing a 'patch set'. When discussing system problems or issues with others (eg. technical support, or colleagues on the Net), the admin can then easily describe the OS state as being a particular revision modified by a particular dated patch set, eg. IRIX 6.5 with the April 1999 Patch Set.

Item 2 refers to patches which only concern specific problems or issues, typically a single patch file for each problem. An admin should not install such patches unless they are required, ie. they are selectively installed as and when necessary. For example, an unmodified installation of IRIX 6.2 contains a bug in the 'jot' editor program which affects the way in which jot accesses files across an NFS-mounted directory (the bug can cause jot to erase the file). To fix the bug, one installs patch number 2051, which is shown in the inst/swmgr patch description list as 'Jot fix for mmapping', but there's no need to install the patch if a machine running 6.2 is not using NFS.
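As an illustration of selectively installing the Fix-on-Fail patch just mentioned (a sketch: /CDROM/dist is assumed to be where the patch CD's distribution appears, and the product name follows the usual patchSG numbering form used elsewhere in these notes):

  % inst -f /CDROM/dist
  Inst> install patchSG0002051
  Inst> go
  Inst> quit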

Patch Inheritance. As time goes by, it is common for various bug fixes and updates from a number of patches to be brought together into a 'rollup' patch. Also, a patch file may contain the same fixes as an earlier patch plus some other additional fixes. Two issues arise from this:

1. If one is told to install a patch file of a particular number (eg. advice gained from someone on a newsgroup), it is usually the case that any later patch which has been declared to be a replacement for the earlier patch can be used instead. This isn't always the case, perhaps due to specific hardware issues of a particular system, but in general a fix for a problem will be described as 'install patch or later'. The release notes for any patch file will describe what hardware platforms and OS revisions that patch is intended for, what patches it replaces, what bugs are fixed by the patch (official bug code numbers included), what other known bugs still exist, and what workarounds can be used to temporarily solve the remaining problems.

2. When a patch is installed, a copy of the affected files prior to installation, called a 'patch history', is created and safely stored away so that if ever the patch has to be removed at a later date, the system can restore the relevant files to the state they were in before the patch was first installed. Thus, installing patch files consumes disk space - how much depends on the patch concerned. The 'versions' command with the 'removehist' option can be used to remove the patch history for a particular patch, recovering disk space, eg.:

  versions removehist patchSG0001537

would remove the patch history file for patch number 1537. To remove all patch histories, the command to use is:

  versions removehist "*"

Conflicts. When installing patches, especially of the Fix-on-Fail variety, an admin can come across a situation where a patch to be installed (A) is incompatible with one already present on the system (B). This usually happens when an earlier problem was dealt with using a more up-to-date patch than was actually necessary. The solution is to either remove B, then install an earlier but perfectly acceptable patch C and finally install A, or find a more up-to-date patch D which supersedes A and is thus compatible with B. Note: if the history file for a patch has been removed in order to save disk space, then it will not be possible to remove that patch from the system. Thus, if an admin encounters the situation described above, the only possible solution will be to find the more up-to-date patch D.

Exploiting Patch File Release Notes. The release notes for patches can be used to identify which patches are compatible, as well as to ascertain other useful information, especially to check whether a particular patch is the right one an admin is looking for (patch titles can sometimes be somewhat obscure). Since the release notes exist on the system in text form (stored in /usr/relnotes), one can use the grep command to search the release notes for information by hand, using appropriate commands.

The commands 'relnotes' and 'grelnotes' can be used to view release notes. relnotes outputs only text. Without arguments, it shows a summary of all installed products for which release notes are available. One can then supply a product name - relnotes will respond with a list of chapter titles for that product. Finally, specifying a product name and a chapter number will output the actual text notes for the chosen chapter, or one can use '*' to display all

chapters for a product. grelnotes gives the same information in a browsable format displayed in a window, ie. grelnotes is a GUI interface to relnotes. See the man pages for these commands for full details. relnotes actually uses the man command to display information, ie. the release notes files are stored in the same compressed text format ('pack') used by online manual pages (man uses the 'unpack' command to decompress the text data). Thus, in order to grep-search through a release notes file, the file must first be uncompressed using the unpack command. This is a classic example of where the UNIX shell becomes very powerful, ie. one could write a shell script using a combination of find, ls, grep, unpack and perhaps other commands to allow one to search for specific items in release notes. Although the InfoSearch tool supplied with IRIX 6.5 allows one to search release notes, IRIX 6.2 does not have InfoSearch, so an admin might decide that writing such a shell script would prove very useful. Incidentally, this is exactly the kind of useful script which ends up being made available on the Net for free so that anyone can use it. For all I know, such a script already exists. Over time, entire collections of useful scripts are gathered together and eventually released as freeware (eg. GNU shell script tools). An admin should examine any such tools to see if they could be useful - a problem which an admin has to deal with may already have been solved by someone else two decades earlier.
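A rough sketch (untested) of the kind of release-notes search script described above; it assumes the notes live under /usr/relnotes as packed files (*.z) and that 'pcat' - the stdout-writing companion to pack/unpack - is available, with the search pattern supplied as the first argument:

#!/bin/sh
# Search all packed release notes files for a given pattern.
PATTERN=$1
for file in `find /usr/relnotes -name "*.z" -print`
do
  if pcat $file | grep -i "$PATTERN" > /dev/null; then
    echo "Match found in: $file"
  fi
done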

Patch Subsystem Components. Like any other software product, a patch file is a software subsystem usually containing several sub-units, or components. When manually selecting a patch for installation, inst/swmgr may tag all sub-units for installation even if certain sub-units are not applicable (this can happen for an automatic selection too, perhaps because inst selects all of a patch's components by default). If this happens, any conflicts present will be displayed, preventing the admin from accidentally installing unwanted or irrelevant items. Remember that an installation cannot begin until all conflicts are resolved, though an admin can override this behaviour if desired. Thus, when manually installing a patch file (or files), I always check the individual sub-units to see what they are. In this way, I can prevent conflicts from arising in the first place by not selecting subsystems which I know are not relevant, eg. 64bit libraries which aren't needed for a system with a 32bit memory address kernel like Indy (INFO: all SGIs released after the Indigo R3000 in 1991 do 64bit processing, but the main kernel file does not need to be compiled using 64bit addressing extensions unless the system is one which might have a very large amount of memory, eg. an Origin2000 with 16GB RAM). Even when no conflicts are present, I always check the selected components to ensure no 'older version' items have been selected.


Detailed Notes for Day 3 (Part 2) UNIX Fundamentals: Organising a network with a server. This discussion explains basic concepts rather than detailed ideas such as specific 'topologies' to use with large networks, or how to organise complex distributed file systems, or subdomains and address spaces - these are more advanced issues which most admins won't initially have to deal with, and if they do then the tasks are more likely to be done as part of a team. The SGI network in Ve24 is typical of a modern UNIX platform in how it is organised. The key aspects of this organisation can be summarised as follows:

- A number of client machines and a server are connected together using a hub (24-port in this case) and a network comprised of 10Mbit Ethernet cable (100Mbit is more common in modern systems, with Gigabit soon to enter the marketplace more widely).
- Each client machine has its own unique identity, a local disk with an installed OS and a range of locally installed application software for use by users.
- The network has been configured to have its own subdomain name of a form that complies with the larger organisation of which it is just one part (UCLAN).
- The server has an external connection to the Internet.
- User accounts are stored on the server, on a separate external disk. Users who login to the client machines automatically find their own files available via the use of the NFS service. Users can work with files in their home directory (which accesses the server's external disk across the network) or use the temporary directories on a client machine's local disk for better performance.
- Other directories are NFS mounted from the server in order to save disk space and to centralise certain services (eg. /usr/share, /var/mail, /var/www); example /etc/fstab entries for such mounts are sketched below.
- Certain aspects of the above are customised in places. Most networks are customised in certain ways depending on the requirements of users and the decisions taken by the admin and management. In this case, specifics include:

  o Some machines have better hardware internals, allowing for software installation setups that offer improved user application performance and services, eg. bigger disk permits /usr/share to be local instead of NFS-mounted, and extra vendor software, shareware and freeware can be installed.
  o The admin's account resides on an admin machine which is effectively also a client, but with minor modifications, eg. tighter security with respect to the rest of the network, and the admin's personal account resides on a disk attached to the admin machine. NFS is used to export the admin's home account area to the server and all other clients; custom changes to the admin's account definition allow the admin account to be treated just like any other user account (eg. accessible from within /home/staff).
  o The server uses a Proxy server in order to allow the client machines to access the external connection to the Internet.
  o Ordinary users cannot login to the server, ensuring that the server's resources are reserved for system services instead of running user programs. Normally, this would be a more important factor if the server was a more powerful system than the clients (typical of modern organisations). In the case of the Ve24 network though, the server happens to have the same 133MHz R4600PC CPU as the client machines. Staff can login to the server however - an ability based on assumed privilege.
  o One client machine is using a more up-to-date OS version (IRIX 6.5) in order to permit the use of a ZIP drive, a device not fully supported by the OS version used on the other clients (IRIX 6.2). ZIP drives can be used with 6.2 at the command-line level, but the GUI environment supplied with 6.2 does not fully support ZIP devices. In order to support 6.5 properly, the client with the ZIP drive has more memory and a larger disk (most of the clients have 549MB system disks, insufficient to install 6.5 which requires approximately 720MB of disk space for a default installation).
  o etc.

This isn't a complete list, but the above are the important examples. Exactly how an admin configures a network depends on what services are to be provided, how issues such as security and access control are dealt with, Internet issues, available disk space and other resources, peripherals provided such as ZIP, JAZ, etc., and of course any policy directives decided by management. My own personal ethos is, in general, to put users first. An example of this ethos in action is that /usr/share is made local on any machine which can support it - accesses to such a local directory occur much faster than across a network to an NFS-mounted /usr/share on a server. Thus, searching for man pages, accessing online books, using the MIDI software, etc. is much more efficient/faster, especially when the network or server is busy.
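
For example, the NFS-mounted directories just mentioned might appear in a client's /etc/fstab along the following lines (a sketch only - the exact mount options used, and whether a directory such as /var/www is mounted this way at all, will vary from site to site):

yoda:/usr/share  /usr/share  nfs  ro,bg  0 0
yoda:/var/mail   /var/mail   nfs  rw,bg  0 0
yoda:/home       /home       nfs  rw,bg  0 0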

NFS Issues. Many admins will make application software NFS-mounted, but this results in slower performance (unless the network is fast and the server capable of supplying as much data as can be handled by the client, eg. 100Mbit Ethernet, etc.) However, NFS-mounted application directories do make it easier to manage software versions, updates, etc. Traditional client/server models assume applications are stored on a server, but this is an old ethos that was designed without any expectation that the computing world would eventually use very large media files, huge applications, etc. Throwing application data across a network is a ridiculous waste of bandwidth and, in my opinion, should be avoided where possible (this is much more important for slower networks though, eg. 10Mbit). In the case of the Ve24 network, other considerations also come into play because of hardware-related factors, eg. every NFS mount point employed by a client system uses up some memory which is needed to handle the operational overhead of dealing with accesses to that mount point. Adding more mount points means using more memory on the client; for an Indy with 32MB RAM, using as many as a dozen mount points can result in the system running out of memory (I tried this in order to offer more application software on the systems with small disks, but 32MB RAM isn't enough to support lots of NFS-mounted directories, and virtual memory is not an acceptable solution). This is a good example of how system issues should be considered when deciding on the hardware specification of a system. As with any computer, it is unwise to equip a UNIX system with insufficient resources, especially with respect to memory and disk space.

Network Speed. Similarly, the required speed of the network will depend on how the network will be used. What applications will users be running? Will there be a need to support high-bandwidth data such as video conferencing? Will applications be NFS-mounted or locally stored? What kind of system services will be running? (eg. web servers, databases, image/document servers, etc.) What about future expansion? All these factors and more will determine whether typical networking technologies such as 10Mbit, 100Mbit or Gigabit Ethernet are appropriate, or whether a different networking system such as ATM should be used instead. For example, MPC uses a fast-switching high-bandwidth network due to the extensive use of data-intensive applications which include video editing, special effects, rendering and animation. After installation, commands such as netstat, osview, ping and ttcp can be used to monitor network performance. Note that external companies, and vendor suppliers, can offer advice on suggested system topologies. For certain systems (eg. high-end servers), specific on-site consultation and analysis may be part of the service.
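
As a quick sketch of how such tools might be used once the network is up (option letters can differ slightly between UNIX versions, so check the local man pages):

ping -c 5 yoda          # round-trip times and packet loss to the server
netstat -i              # per-interface packet counts, errors and collisions
ttcp -r -s              # run on the receiving host first...
ttcp -t -s yoda         # ...then on the sender: reports the achievable TCP throughput

osview is interactive and gives a continuously updated display of system and network statistics.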

Storage. Deciding on appropriate storage systems and capacities can be a daunting task for a non-trivial network. Small networks such as the SGI network I run can easily be dealt with simply by ensuring that the server and clients all have large disks, that there is sufficient disk space for user accounts, and a good backup system is used, eg. DDS3 DAT. However, more complex networks (eg. banks, commercial businesses, etc.) usually need huge amounts of storage space, use very different types of data with different requirements (text, audio, video, documents, web pages, images, etc.), and must consider a whole range of issues which will determine what kind of storage solution is appropriate, eg.:

- preventing data loss,
- sufficient data capacity with room for future expansion,
- interruption-free, fast access to data,
- failure-proof operation (eg. backup hub units/servers/UPS),
- etc.

A good source of advice may be the vendor supplying the system hardware, though note that 3rd-party storage solutions can often be cheaper, unless there are other reasons for using a vendor-sourced storage solution (eg. architectural integration).

See the article listed in reference [1] for a detailed discussion on these issues.

Setting up a network can thus be summarised as follows:

- Decide on the desired final configuration (consultation process, etc.)
- Install the server with a default installation of the OS.
- Install the clients with a default or expanded/customised configuration as desired.
- Construct the hardware connections.
- Modify the relevant setup files of a single client and the server so that one can rlogin to the server from the client and use GUI-based tools to perform further system configuration and administration tasks.
- Create, modify or install the files necessary for the server and clients to act as a coherent network, eg. /etc/hosts, .rhosts, etc. (see the example /etc/hosts entries below).
- Setup other services such as DNS, NIS, etc.
- Setup any client-specific changes such as NFS mount points, etc.
- Check all aspects of security and access control, eg. make sure guest accounts are blocked if required, all client systems have a password for the root account, etc. Use any available FAQ (Frequently Asked Questions) files or vendor-supplied information as a source of advice on how to deal with these issues. Very usefully, IRIX 6.5 includes a high-level tool for controlling overall system and network security - the tool can be (and normally is) accessed via a GUI interface.
- Begin creating group entries in /etc/group ready for user accounts, and finally the user accounts themselves.
- Setup any further services required, eg. Proxy server for Internet access.
- etc.

The above have not been numbered in a rigid order since the tasks carried out after the very first step can usually be performed in a different order without affecting the final configuration. The above is only a guide.
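
To illustrate the kind of file involved, a minimal /etc/hosts for such a network might look like the following (the addresses, the subdomain and the name 'client1' are invented for this example; a real file would contain the site's registered addresses and every host on the network):

192.168.1.1    yoda.comp.uclan.ac.uk    yoda      # server
192.168.1.10   wolfen.comp.uclan.ac.uk  wolfen    # client with the ZIP drive
192.168.1.11   client1.comp.uclan.ac.uk client1   # remaining clients follow the same pattern

Each machine's own name also goes in its /etc/sys_id file.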

Quotas. Using disk quotas is a practice employed by most administrators as a means of controlling disk space usage by users. It is easy to assume that a really large disk capacity would mean an admin need not bother with quotas, but unfortunately an old saying definitely holds true: "Data will expand to fill the space available." Users are lazy where disk space is concerned, perhaps because it is not their job to manage the system as a whole. If quotas are not present on a system, most users simply don't bother deleting unwanted files. Alternatively, the quota management software can be used as an efficient disk accounting system by setting up quotas for a file system without using limit enforcement. IRIX employs a quota management system that is common amongst many UNIX variants. Examining the relevant commands (consult the 'SEE ALSO' section from the 'quotas' man page), IRIX's quota system appears to be almost identical to that employed by, for example, HP-UX (Hewlett Packard's UNIX OS). There probably are differences between the two implementations, eg. issues concerning supported operations on particular types of file system, but in this case the quota system is typical of the kind of OS service which is very similar or identical across all UNIX variants. An important fact is that the quota software is part of the overall UNIX OS, rather than some hacked 3rd-party software addon. Quota software allows users to determine their current disk usage, and enables an admin to monitor available resources, how long a user is over their quota, etc. Quotas can be used not only to limit the amount of available disk space a user has, but also the number of files (inodes) which a user is permitted to create. Quotas consist of soft limits and hard limits. If a user's disk usage exceeds the soft limit, a warning is given on login, but the user can still create files. If disk usage continues to increase, the hard limit is the point beyond which the user will not be able to use any more disk space, at least until the usage is reduced so that it is sufficiently below the hard limit once more. Like most system services, how to set up quotas is explained fully in the relevant online book, "IRIX Admin: Disks and Filesystems". What follows is a brief summary of how quotas are set up under IRIX. Of more interest to an admin are the issues which surround quota management - these are discussed shortly. To activate quotas on a file system, an extra option is added to the relevant entry in the /etc/fstab file so that the desired file system is set to have quotas imposed on all users whose accounts reside on that file system. For example, without quotas imposed, the relevant entry in yoda's /etc/fstab file looks like this:

/dev/dsk/dks4d5s7 /home xfs rw 0 0

With quotas imposed, this entry is altered to be:

/dev/dsk/dks4d5s7 /home xfs rw,quota 0 0

Next, the quotaon command is used to activate quotas on the root file system. A reboot causes the quota software to automatically detect that quotas should be imposed on /home and so the quota system is turned on for that file system. The repquota command is used to display quota statistics for each user. The edquota command is used to change quota values for a single user, or multiple users at once. With the -i option, edquota can also read in quota information from a file, allowing an admin to set quota limits for many users with a single command. With the -e option, repquota can output the current quota statistics to a file in a format that is suitable for use with edquota's -i option. Note: the editor used by edquota is vi by default, but an admin can change this by setting an environment variable called 'EDITOR', eg.:

setenv EDITOR "jot -f"

The -f option forces jot to run in the foreground. This is necessary because the editor used by edquota must run in the foreground, otherwise edquota will simply see an empty file instead of quota data. Ordinary users cannot change quota limits.
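
Putting these commands together, a typical quota-administration session might look something like this (the filesystem, file name and user name are illustrative):

repquota /home                              # show current usage and limits for all users
edquota cmpdw                               # edit soft/hard limits for a single user (opens $EDITOR)
repquota -e /home > /var/tmp/quota.limits   # dump the current limits in a form edquota can read
edquota -i /var/tmp/quota.limits            # after editing the file, apply limits to many users at once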

Quota Management Issues. Most users do not like disk quotas. They are perceived as the information equivalent of a straitjacket. However, quotas are usually necessary in order to keep disk usage to a sensible level and to maintain fair usage amongst all users. As a result, the most important decision an admin must make regarding quotas is what limit to actually set for users, either as a whole or individually. The key to amicable relations between an admin and users is flexibility, eg. start with a small to moderate limit for all (eg. 20MB). If individuals then need more space, and they have good reason to ask, then an admin should increase the user's quota (assuming space is available). Exactly what quota to set in the first instance can be decided by any sensible/reasonable schema. This is the methodology I originally adopted:

The user disk is 4GB. I don't expect to ever have more than 100 users, so I set the initial quota to 40MB each.

In practice, as expected, some users need more, but most do not. Thus, erring on the side of caution while also being flexible is probably the best approach. Today, because the SGI network has a system with a ZIP drive attached, and the SGIs offer reliable Internet access to the WWW, many students use the Ve24 machines solely for downloading data they need, copying or moving the data onto ZIP for final transfer to their PC accounts, or to a machine at home. Since the ZIP drive is a 100MB device, I altered the quotas to 50MB each, but am happy to change that to 100MB if anyone needs it (this allows for a complete ZIP image to be downloaded if required), ie. I am tailoring quota limits based on a specific hardware-related user service issue. If a user exceeds their quota, warnings are given. If they ask for more disk space, an admin would normally enquire as to whether the user genuinely needs more space, eg.:

- Does the user have unnecessary files lying around in their home directory somewhere? For example, movie files from the Internet, unwanted mail files, games files, object files or core dump files left over from application development, media files created by 'playing' with system tools (eg. the digital camera).
- What about their Netscape cache? Has it been set to too high a value?
- Do they have hidden files they're not aware of, eg. .capture.tmp.* directories, capture.mv files, etc.?
- Can the user employ compression methods to save space? (gzip, pack, compress)

If a user has removed all unnecessary files, but is still short of space, then unless there is some special reason for not increasing their quota, an admin should provide more space. Exceptions could include, for example, a system which has a genuine overall shortage of storage space. In such a situation, it is common for an admin to ask users to compress their files if possible, using the 'gzip', 'compress' or 'pack' commands. Users can use tar to create archives of many files prior to compression. There is a danger with asking users to compress files though: eventually, extra storage has to be purchased; once it has been, many users start uncompressing many of the files they earlier compressed. To counter this effect, any increase in storage space being considered should be large, say an order of magnitude, or at the very least a factor of 3 or higher (I'm a firm believer in future-proofing). Note that the find command can be used to locate files which are above a certain size, eg. those that are particularly large or in unexpected places. Users can use the du command to examine how much space their own directories and files are consuming.

Note: if a user exceeds their hard quota limit whilst in the middle of a write operation such as using an editor, the user will find it impossible to save their work. Unfortunately, quitting the editor at that point will lose the contents of the file because the editor will have opened a file for writing already, ie. the opened file will have zero contents. The man page for quotas describes the problem along with possible solutions that a user can employ:

"In most cases, the only way for a user to recover from over-quota conditions is to abort whatever activity is in progress on the filesystem that has reached its limit, remove sufficient files to bring the limit back below quota, and retry the failed program. However, if a user is in the editor and a write fails because of an over quota situation, that is not a suitable course of action. It is most likely that initially attempting to write the file has truncated its previous contents, so if the editor is aborted without correctly writing the file, not only are the recent changes lost, but possibly much, or even all, of the contents that previously existed. There are several possible safe exits for a user caught in this situation. He can use the editor ! shell escape command (for vi only) to examine his file space and remove surplus files. Alternatively, using csh, he can suspend the editor, remove some files, then resume it. A third possibility is to write the file to some other filesystem (perhaps to a file on /tmp) where the user's quota has not been exceeded. Then after rectifying the quota situation, the file can be moved back to the filesystem it belongs on."

It is important that users be made aware of these issues if quotas are installed. This is also another reason why I constantly remind users that they can use /tmp and /var/tmp for temporary tasks. One machine in Ve24 (Wolfen) has an extra 549MB disk available which any user can write to, just in case a particularly complex task requiring a lot of disk space must be carried out, eg. movie file processing.

Naturally, an admin can write scripts of various kinds to monitor disk usage in detailed ways, eg. regularly identify the heaviest consumers of disk resources; one could place the results into a regularly updated file for everyone to see, ie. a publicly readable "name and shame" policy (not a method I'd use unless absolutely necessary, eg. when individual users are abusing the available space for downloading game files).
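
A couple of one-liners of the kind such a script might contain (the sizes, paths and option letters are illustrative and should be checked against the local man pages):

du -k /home/* | sort -rn | head -20 > /var/tmp/topusers   # the 20 heaviest consumers of /home, in KB
find /home -type f -size +10000000c -print                # files larger than roughly 10MB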

UNIX Fundamentals: Installing/removing internal/external hardware. As explained in this course's introduction to UNIX, the traditional hardware platforms which run UNIX OSs have a legacy of top-down integrated design because of the needs of the market areas the systems are sold into. Because of this legacy, much of the toil normally associated with hardware modifications is removed. To a great extent, an admin can change the hardware internals of a machine without ever having to be concerned with system setup files. Most importantly, low-level issues akin to IRQ settings in PCs are totally irrelevant with traditional UNIX hardware platforms. By traditional I mean the long line of RISC-based systems from the various UNIX vendors such as Sun, IBM, SGI, HP, DEC and even Intel. This ease of use does not of course apply to ordinary PCs running those versions of UNIX which can be used with PCs, eg. Linux, OpenBSD, FreeBSD, etc.; for this category of system, the OS issues will be simpler (presumably), but the presence of a bottom-up-designed PC hardware platform presents the usual problems of compatible components, device settings, and other irritating low-level issues. This discussion uses the SGI Indy as an example system. If circumstances allow, a more up-to-date example using the O2 system will also be briefly demonstrated in the practical session. Hardware from other UNIX vendors will likely be similar in terms of ease-of-access and modification, though it has to be said that SGI has been an innovator in this area of design. Many system components can be added to, or removed from a machine, or swapped between machines, without an admin having to change system setup files in order to make the system run smoothly after any alterations. Relevant components include:

- Memory units,
- Disk drives (both internal and external),
- Video or graphics boards that do not alter how the system would handle relevant processing operations,
- CPU subsystems which use the same instruction set and hardware-level initialisation libraries as are already installed,
- Removable storage devices, eg. ZIP, JAZ, Floptical, SyQuest, CDROM, DVD (where an OS is said to support it), DAT, DLT, QIC, etc.,
- Any option board which does not impact on any aspect of existing system operation not related to the option board itself, eg. video capture, network expansion (Ethernet, HiPPI, TokenRing, etc.), SCSI expansion, PCI expansion, etc.

Further, the physical layout means the admin does not have to fiddle with numerous cables and wires. The only cables present in Indy are the two short power supply cables, and the internal SCSI device ribbon cable with its associated power cord. No cables are present for graphics boards, video options, or other possible expansion cards. Some years after the release of the Indy, SGI's O2 design allows one to perform all these sorts of component changes without having to fiddle with any cables or screws at all (the only exception being any PCI expansion, which most O2 users will probably never use anyway). This integrated approach is certainly true of Indy. The degree to which such an ethos applies to other specific UNIX hardware platforms will vary from system to system. I should imagine systems such as Sun's Ultra 5, Ultra 10 and other Ultra-series workstations are constructed in a similar way. One might expect that any system could have a single important component replaced without affecting system operation to any great degree, even though this is usually not the case with PCs, but it may come as a far greater surprise that an entire set of major internal items can be changed or swapped from one system to another without having to alter configuration files at all. Even when setup files do have to be changed, the actual task normally only involves either a simple reinstall of certain key OS software sub-units (the relevant items will be listed in accompanying documentation and release notes), or the installation of some additional software to support any new hardware-level system features. In some cases, a hardware alteration might require a software modification to be made from miniroot if the software concerned was of a type involved in normal system operation, eg. display-related graphics libraries which controlled how the display was handled given the presence of a particular graphics board revision. The main effect of this flexible approach is that an admin has much greater freedom to:

- modify systems as required, perhaps on a daily basis (eg. the way my external disk is attached and removed from the admin machine every single working day),
- experiment with hardware configurations, eg. performance analysis (a field I have extensively studied with SGIs [2]),
- configure temporary setups for various reasons (eg. demonstration systems for visiting clients),
- effect maintenance and repairs, eg. cleaning, replacing a power supply, etc.

All this without the need for time-consuming software changes, or the irritating necessity to consult PC-targeted advice guides about devices (eg. ZIP) before changes are made. Knowing the scope of this flexibility with respect to a system will allow an admin to plan tasks in a more efficient manner, resulting in better management of available time.

An example of the above with respect to the SGI Indy would be as follows (this is an imaginary demonstration of how the above concepts could be applied in real-life):



An extensive component swap between two Indys, plus new hardware installed.

Background information: CPUs. All SGIs use a design method which involves supplying a CPU and any necessary secondary cache plus interface ASICs on a 'daughterboard', or 'daughtercard'. Thus, replacing a CPU merely involves changing the daughtercard, ie. no fiddling with complex CPU insertion sockets, etc. Daughtercards in desktop systems can be replaced in seconds, certainly no more than a minute or two. The various CPUs available for Indy can be divided into two categories: those which support everything up to and including the MIPS III instruction set, and those which support all these plus the MIPS IV instruction set. The R4000, R4600 and R4400 CPUs all use MIPS III and are initialised on bootup with the same low-level data files, ie. the files stored in /var/sysgen. This covers the following CPUs:

100MHz R4000PC (no L2)
100MHz R4000SC (1MB L2)
100MHz R4600PC (no L2)
133MHz R4600PC (no L2)
133MHz R4600SC (512K L2)
100MHz R4400SC (1MB L2)
150MHz R4400SC (1MB L2)
175MHz R4400SC (1MB L2)
200MHz R4400SC (1MB L2)

Thus, two Indys with any of the above CPUs can have their CPUs swapped without having to alter system software. Similarly, the MIPS IV CPUs:

150MHz R5000PC (no L2)
150MHz R5000SC (512K L2)
180MHz R5000SC (512K L2)

can be treated as interchangeable between systems in the same way. The difference between an Indy which uses a newer vs. older CPU is that the newer CPUs require a more up-to-date version of the system PROM chip to be installed on the motherboard (a customer who orders an upgrade is supplied with the newer PROM if required).

Video/Graphics Boards. Indy can have three different boards which control display output:

8bit XL
24bit XL
24bit XZ

8bit and 24bit XL are designed for 2D applications. They are identical except for the addition of more VRAM to the 24bit version. XZ is designed for 3D graphics and so requires a slightly different set of software graphics libraries to be installed in order to permit proper use. Thus, with respect to the XL version, an 8bit XL card can be swapped with a 24bit XL card with no need to alter system software. Indy can have two other video options:

- IndyVideo (provides video output ports as well as extra input ports),
- CosmoCompress (hardware-accelerated MJPEG video capture board).

IndyVideo does not require the installation of any extra software in order to be used. CosmoCompress does require some additional software to be installed (CosmoCompress compression API and libraries). Thus, IndyVideo could be installed without any post-installation software changes. swmgr can be used to install the CosmoCompress software after the option card has been installed.

Removable Media Devices. As stated earlier, no software modifications are required, unless specifically stated by the vendor. Once a device has its SCSI ID set appropriately and installed, it is recognised automatically and a relevant icon placed on the desktop for users to exploit. Some devices may require a group of DIP switches to be configured on the outside of the device, but that is all (settings to use for a particular system will be found in the supplied device manual). The first time I used a DDS3 DAT drive (Sony SDT9000) with an Indy, the only setup required was to set four DIP switches on the underside of the DAT unit to positions appropriate for use with an SGI (as detailed on the first page of the DAT manual). Connecting the DAT unit to the Indy, booting up and logging in, the DAT was immediately usable (icon available, etc.) No setup files, no software to install, etc. The first time I used a 32X CDROM (Toshiba CD-XM-6201B) not even DIP switches had to be set.

System Disks, Extra Disks. Again, installed disks are detected automatically and the relevant device files in /dev initialised to be treated as the communication points with the devices concerned. After bootup, the fx, mkfs and mount commands can be used to configure and mount new disks, while disks which already have a valid file system installed can be mounted immediately. GUI tools are available for performing these actions too.
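
A sketch of the typical sequence for adding a new option disk, assuming it has been given SCSI ID 2 on controller 0 (the device names follow the IRIX convention used elsewhere in these notes, with partition 7 covering the whole option drive):

fx -x                             # interactive: label/repartition the new disk as an option drive
mkfs /dev/rdsk/dks0d2s7           # make an XFS filesystem on the option partition
mkdir /disk2
mount /dev/dsk/dks0d2s7 /disk2    # mount it; add a matching /etc/fstab entry to make the mount permanent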

Thus, consider two Indys:

System A: 200MHz R4400SC, 24bit XL, 128MB RAM, 2GB disk, IRIX 6.2
System B: 100MHz R4600PC, 8bit XL, 64MB RAM, 1GB disk, IRIX 6.2

Suppose an important company visitor is expected the next morning at 11am and the admin is asked to quickly prepare a decent demonstration machine, using a budget provided by the visiting company to cover any changes required (as a gift, any changes can be permanent). The admin orders the following extra items for next-day delivery:

- A new 4GB SCSI disk (Seagate Barracuda 7200rpm)
- IndyVideo board
- Floptical drive
- ZIP drive
- 32X Toshiba CDROM (external)
- DDS3 Sony DAT drive (external)

The admin decides to make the following changes (Steps 1 and 2 are carried out immediately; in order to properly support the ZIP drive, the admin needs to use IRIX 6.5 on B. The support contract means the CDs are already available.):

1. Swap the main CPU, graphics board and memory components between systems A and B.
2. Remove the 1GB disk from System B and install it as an option disk in System A. The admin uses fx and mkfs to redefine the 1GB disk as an option drive, deciding to use the disk for a local /usr/share partition (freeing up perhaps 400MB of space from System A's 2GB disk).
3. The order arrives the next morning at 9am (UNIX vendors usually use couriers such as Fedex and DHL, so deliveries are normally very reliable). The 4GB disk is installed into System B (empty at this point) and the CDROM connected to the external SCSI port (SCSI ID 3). The admin then installs IRIX 6.5 onto the 4GB disk, a process which takes approximately 45 minutes. The system is powered down ready for the final hardware changes.
4. The IndyVideo board is installed in System B (sits on top of the 24bit XL board, 2 or 3 screws involved, no cables), along with the internal Floptical drive above the 4GB disk (SCSI ID set to 2). The DAT drive (SCSI ID set to 4) is daisy chained to the external CDROM. The ZIP drive is daisy chained to the DAT (SCSI ID 5 by default selector, terminator enabled). This can all be done in less than five minutes.
5. The system is rebooted, the admin logs in as root. All devices are recognised automatically and icons for each device (ZIP, CDROM, DAT, Floptical) are immediately present on the desktop and available for use. Final additional software installations can begin, ready for the visitor's arrival. An hour should be plenty of time to install specific application(s) or libraries that might be required for the visit.

I am confident that steps 1 and 2 could be completed in less than 15 minutes. Steps 3, 4 and 5 could be completed in little more than an hour. Throughout the entire process, no OS or software changes have to be made to either System A, or to the 6.5 OS installed on System B's new 4GB disk after initial installation (ie. the ZIP, DAT and Floptical were not attached to System B when the OS was installed, but they are correctly recognised by the default 6.5 OS when the devices are added afterwards). If time permits and interest is sufficient, almost all of this example can be demonstrated live (the exception is the IndyVideo board; such a board is not available for use with the Ve24 system at the moment). How does the above matter from an admin's point of view? The answer is confidence and lack of stress. I could tackle a situation such as described here in full confidence that I would not have to deal with any matters concerning device drivers, interrupt addresses, system file modifications, etc. Plus, I can be sure the components will work perfectly with one another, constructed as they are as part of an integrated system design. In short, this integrated approach to system design makes the admin's life substantially easier.

The Visit is Over. Afterwards, the visitor donates funds for a CosmoCompress board and an XZ board set. Ordered that day, the boards arrive the next morning. The admin installs the CosmoCompress board into System B (2 or 3 more screws and that's it). Upon bootup, the admin installs the CosmoCompress software from the supplied CD with swmgr. With no further system changes, all the existing supplied software tools (eg. MediaRecorder) can immediately utilise the new hardware compression board. The 8bit XL board is removed from System A and replaced with the XZ board set. Using inst accessed via miniroot, the admin reinstalls the OS graphics libraries so that the appropriate libraries are available to exploit the new board. After rebooting the system, all existing software written in OpenGL automatically runs ten times faster than before, without modification.

Summary. Read available online books and manual pages on general hardware concepts thoroughly. Get to know the system - every machine will either have its own printed hardware guide, or an equivalent online book. Practice hardware changes before they are required for real. Consult any Internet-based information sources, especially newsgroup posts, 3rd-party web sites and hardware-related FAQ files.

When performing installations, follow all recommended procedures, eg. use an anti-static strap to eliminate the risk of static discharge damaging system components (especially important for handling memory items, but also just as relevant to any other device). Construct a hardware maintenance strategy for cleansing and system checking, eg. examine all mice on a regular basis to ensure they are dirt-free, use an air duster once a month to clear away accumulated dust and grime, clean the keyboards every two months, etc. Be flexible. System management policies are rarely static, eg. a sudden change in the frequency of use of a system might mean cleansing tasks need to be performed more often, eg. cleaning monitor screens. If you're not sure what the consequences of an action might be, call the vendor's hardware support service and ask for advice. Questions can be extremely detailed if need be - this kind of support is what such support services are paid to offer, so make good use of them. Before making any change to a system, whether hardware or software, inform users if possible. This is probably more relevant to software changes (eg. if a machine needs to be rebooted, use 'wall' to notify any users logged onto the machine at the time, ie. give them time to log off; if they don't, go and see why they haven't), but giving advance notice is still advisable for hardware changes too, eg. if a system is being taken away for cleaning and reinstallation, a user may want to retrieve files from /var/tmp prior to the system's removal, so place a notice up a day or so beforehand if possible.

References: 1. "Storage for the network", Network Week, Vol4 No.31, 28th April 1999, pp. 25 to 29, by Marshall Breeding. 2. SGI General Performance Comparisons: 3. http://www.futuretech.vuurwerk.nl/perfcomp.html

Detailed Notes for Day 3 (Part 3) UNIX Fundamentals: Typical system administration tasks. Even though the core features of a UNIX OS are handled automatically, there are still some jobs for an admin to do. Some examples are given here, but not all will be relevant for a particular network or system configuration.

Data Backup. A fundamental aspect of managing any computer system, UNIX or otherwise, is the backup of user and system data for possible retrieval purposes in the case of system failure, data corruption, etc. Users depend on the admin to recover files that have been accidentally erased, or lost due to hardware problems.

Backup Media. Backup devices may be locally connected to a system, or remotely accessible across a network. Typical backup media types include:

- 1/4" cartridge tape, 8mm cartridge tape (used infrequently today)
- DAT (very common)
- DLT (where lots of data must be archived)
- Floptical, ZIP, JAZ, SyQuest (common for user-level backups)

Backup tapes, disks and other media should be well looked after in a secure location [3].

Backup Tools. Software tools for archiving data include low-level format-independent tools such as dd, file and directory oriented tools such as tar and cpio, filesystem-oriented tools such as bru, standard UNIX utilities such as dump and restore (cannot be used with XFS filesystems - use xfsdump and xfsrestore instead), etc., and high-level tools (normally commercial packages) such as IRIS NetWorker. Some tools include a GUI frontend interface. The most commonly used program is tar, which is also widely used for the distribution of shareware and freeware software. Tar allows one to gather together a number of files and directories into a single 'tar archive' file which by convention should always have a '.tar' suffix. By specifying a device such as a DAT instead of an archive file, tar can thus be used to archive data directly to a backup medium. Tar files can also be compressed, usually with the .gz format (gzip and gunzip) though there are other compression utilities (compress, pack, etc.) Backup and restoration speed can be improved by compressing files before any archiving process commences. Some backup devices have built-in hardware compression abilities. Note that files such as MPEG movies and JPEG images are already in a compressed format, so compressing these prior to backup is pointless. Straightforward networks and systems will almost always use a DAT drive as the backup device and tar as the software tool. Typically, the 'cron' job scheduling system is used to execute a backup at regular intervals, usually overnight. Cron is discussed in more detail below.
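
For example, a directory could be gathered into a single compressed archive with something like the following (the archive name and location are purely illustrative):

tar cvf /var/tmp/pub.tar /home/pub    # gather the directory into one archive file
gzip /var/tmp/pub.tar                 # produces /var/tmp/pub.tar.gz, ready to be written to tape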

Backup Strategy. Every UNIX guide will recommend the adoption of a 'backup strategy', ie. a combination of hardware and software related management methods determined to be the most suitable for the site in question. A backup strategy should be rigidly adhered to once in place. Strict adherence allows an admin to reliably assess whether lost or damaged data is recoverable when a problem arises. Exactly how an admin performs backups depends upon the specifics of the site in question. Regardless of the chosen strategy, at least two full sets of reasonably current backups should always be maintained. Users should also be encouraged to make their own backups, especially with respect to files which are changed and updated often.

What/When to Backup. How often a backup is made depends on the system's frequency of use. For a system like the Ve24 SGI network, a complete backup of user data every night, plus a backup of the server's system disk once a week, is fairly typical. However, if a staff member decided to begin important research with commercial implications on the system, I might decide that an additional backup at noon each day should also be performed, or even hourly backups of just that person's account. Usually, a backup archives all user or system data, but this may not be appropriate for some sites. For example, an artist or animator may only care about their actual project files in their ~/Maya project directory (Maya is a professional Animation/Rendering package) rather than the files which define their user environment, etc. Thus, an admin might decide to only backup every users' Maya projects directory. This would, for example, have the useful side effect of excluding data such as the many files present in a user's .netscape/cache directory. In general though, all of a user's account is archived. If a change is to be made to a system, especially a server change, then separate backups should be performed before and after the change, just in case anything goes wrong. Since root file systems do not change very much, they can be backed up less frequently, eg. once per week. An exception might be if the admin wishes to keep a reliable record of system access logs which are part of the root file system, eg. those located in the files (for example):

/var/adm/SYSLOG
/var/netscape/suitespot/proxy-sysname-proxy/logs

The latter of the two would be relevant if a system had a Proxy server installed, ie. 'sysname' would be the host name of the system. Backing up /usr and /var instead of the entire / root directory is another option - the contents of /usr and /var change more often than many other areas of the overall file system, eg. users' mail is stored in /var/mail and most executable programs are under /usr. In some cases, it isn't necessary to backup an entire root filesystem anyway. For example, the Indys in Ve24 all have more or less identical installations: all Indys with a 549MB disk have the same disk contents as each other, likewise for those with 2GB disks. The only exception is Wolfen which uses IRIX 6.5 in order to provide proper support for an attached ZIP drive. Thus, a backup of one of the client Indys need only concern specific key files such as /etc/hosts, /etc/sys_id, /var/flexlm/license.dat, etc. However, this policy may not work too well for servers (or even clients) because:

- an apparently small change, eg. adding a new user, installing a software patch, can affect many files,
- the use of GUI-based backup tools does not aid an admin in remembering which files have been archived.

For this reason, most admins will use tar, or a higher-level tool like xfsdump. Note that because restoring data from a DAT device is slower than copying data directly from disk to disk (especially modern UltraSCSI disks), an easier way to restore a client's system disk - where all clients have identical disk contents - is to clone the disk from another client and then alter the relevant files; this is what I do if a problem occurs. Other backup devices can be much faster though [1], eg. DLT9000 tape streamer, or military/industrial grade devices such as the DCRsi 240 Digital Cartridge Recording System (30MB/sec) as was used to backup data during the development of the 777 aircraft, or the Ampex DIS 820i Automated Cartridge Library (scalable from 25GB to 6.4TB max capacity, 80MB/sec sustained record rate, 800MB/sec search/read rate, 30 seconds maximum search time for any file), or just a simple RAID backup which some sites may choose to use. It's unusual to use another disk as a backup medium, but not unheard of. Theoretically, it's the fastest possible backup medium, so if there's a spare disk available, why not? Some sites may even have a 'mirror' system whereby a backup server B copies exactly the changes made to an identical file system on the main server A; in the event of serious failure, server B can take over immediately. SGI's commercial product for this is called IRIS FailSafe, with a switchover time between A and B of less than a millisecond. Fail-safe server configurations like this are the ultimate form of backup, ie. all files are being backed up in real-time, and the support hardware has a backup too. Any safety-critical installation will probably use such methods. Special power supplies might be important too, eg. a UPS (Uninterruptible Power Supply) which gives some additional power for a few minutes to an hour or more after a power failure and notifies the system to facilitate a safe shutdown, or a dedicated backup power generator could be used, eg. hospitals, police/fire/ambulance, air traffic control, etc.

Note: systems managed by more than one admin should be backed up more often; admin policies should be consistent.

Incremental Backup. This method involves only backing up files which have changed since the previous backup, based on a particular schedule. An incremental schema offers the same degree of 'protection' as an entire system backup and is faster since fewer files are archived each time, which means faster restoration time too (fewer files to search through on a tape). An example schedule is given in the online book, "IRIX Admin: Backup, Security, and Accounting":

"An incremental scheme for a particular filesystem looks something like this:

1. On the first day, back up the entire filesystem. This is a monthly backup.
2. On the second through seventh days, back up only the files that changed from the previous day. These are daily backups.
3. On the eighth day, back up all the files that changed the previous week. This is a weekly backup.
4. Repeat steps 2 and 3 for four weeks (about one month).
5. After four weeks (about a month), start over, repeating steps 1 through 4.

You can recycle daily tapes every month, or whenever you feel safe about doing so. You can keep the weekly tapes for a few months. You should keep the monthly tapes for about one year before recycling them."
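
On an XFS-based system such as IRIX 6.x, xfsdump's dump levels express this kind of scheme directly. A minimal sketch (a full implementation of the schedule above would use further levels to separate the daily and weekly backups):

xfsdump -l 0 -f /dev/tape /home    # monthly: a complete (level 0) dump of /home
xfsdump -l 1 -f /dev/tape /home    # later: dump only what has changed since the last lower-level dump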

Backup Using a Network Device. It is possible to archive data to a remote backup medium by specifying the remote host name along with the device name. For example, an ordinary backup to a locally attached DAT might look like this: tar cvf /dev/tape /home/pub

Or if no other relevant device was present:

tar cv /home/pub

For a remote device, simply add the remote host name before the device path:

tar cvf yoda:/dev/tape /home/pub

Note that if the tar command is trying to access a backup device which is not made by the source vendor, then '/dev/tape' may not work. In such cases, an admin would have to use a suitable lower-level device file, ie. one of the files in /dev/rmt - exactly which one can be determined by deciding on the required functionality of the device, as explained in the relevant device manual, along with the SCSI controller ID and SCSI device ID. Sometimes a particular user account name may have to be supplied when accessing a remote device, eg.:

tar cvf guest@yoda:/dev/tape /home/pub

This example wouldn't actually work on the Ve24 network since all guest accounts are locked out for security reasons, except on Wolfen. However, an equivalent use of the above syntax can be demonstrated using Wolfen's ZIP drive and the rcp (remote copy) command: rcp -r /home/pub guest.guest1@wolfen:/zip

Though note that the above use of rcp would not retain file time/date creation/modification information when copying the files to the ZIP disk (tar retains all information).

Automatic Backup With Cron. The job scheduling system called cron can be used to automatically perform backups, eg. overnight. However, such a method should not be relied upon - nothing is better than someone manually executing/observing a backup, ensuring that the procedure worked properly, and correctly labelling the tape afterwards. If cron is used, a typical entry in the root cron jobs schedule file (/var/spool/cron/crontabs/root) might look like this: 0 3 * * * /sbin/tar cf /dev/tape /home

This would execute a backup to a locally attached backup device at 3am every morning. Of course, the admin would have to ensure a suitable media was loaded before leaving at the end of each day. This is a case where the '&&' operator can be useful: in order to ensure no subsequent operation could alter the backed-up data, the 'eject' command could be employed thus: 0 3 * * * /sbin/tar cf /dev/tape /home && eject /dev/tape

Only after the tar command has finished will the backup media be ejected. Notice there is no 'v' option in these tar commands (verbose mode). Why bother? Nobody will be around to see the output. However, an admin could modify the command to record the output for later reading: 0 3 * * * /sbin/tar cvf /dev/tape /home > /var/tmp/tarlog && eject /dev/tape
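
The next morning it is worth confirming that the overnight job actually ran, for example (assuming the logging variant above was used; any error output from cron is normally mailed to root as well):

tail /var/tmp/tarlog        # the last few files written to tape should be listed here
tar tvf /dev/tape | more    # after reinserting the tape: list the archive contents to verify it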

Caring for Backup Media. This is important, especially when an admin is responsible for backing up commercially valuable, sensitive or confidential data. Any admin will be familiar with the usual common-sense aspects of caring for any storage medium, eg. keeping media away from strong magnetic fields, extremes of temperature and humidity, etc., but there are many other factors too. The "IRIX Admin: Backup, Security, and Accounting" guide contains a good summary of all relevant issues:

"Storage of Backups

Store your backup tapes carefully. Even if you create backups on more durable media, such as optical disks, take care not to abuse them. Set the write protect switch on tapes you plan to store as soon as a tape is written, but remember to unset it when you are ready to overwrite a previously-used tape. Do not subject backups to extremes of temperature and humidity, and keep tapes away from strong electromagnetic fields. If there are a large number of workstations at your site, you may wish to devote a special room to storing backups. Store magnetic tapes, including 1/4 in. and 8 mm cartridges, upright. Do not store tapes on their sides, as this can deform the tape material and cause the tapes to read incorrectly. Make sure the media is clearly labeled and, if applicable, write-protected. Choose a label color scheme to identify such aspects of the backup as what system it is from, what level of backup (complete versus partial), what filesystem, and so forth. To minimize the impact of a disaster at your site, such as a fire, you may want to store main copies of backups in a different building from the actual workstations. You have to balance this practice, though, with the need to have backups handy for recovering files. If backups contain sensitive data, take the appropriate security precautions, such as placing them in a locked, secure room. Anyone can read a backup tape on a system that has the appropriate utilities.

How Long to Keep Backups

You can keep backups as long as you think you need to. In practice, few sites keep system backup tapes longer than about a year before recycling the tape for new backups. Usually, data for specific purposes and projects is backed up at specific project milestones (for example, when a project is started or finished). As site administrator, you should consult with your users to determine how long to keep filesystem backups. With magnetic tapes, however, there are certain physical limitations. Tape gradually loses its flux (magnetism) over time. After about two years, tape can start to lose data. For long-term storage, re-copy magnetic tapes every year to year-and-a-half to prevent data loss through deterioration. When possible, use checksum programs, such as the sum(1) utility, to make sure data hasn't deteriorated or altered in the copying process. If you want to reliably store data for several years, consider using optical disk.

Guidelines for Tape Reuse

You can reuse tapes, but with wear, the quality of a tape degrades. The more important the data, the more precautions you should take, including using new tapes. If a tape goes bad, mark it as "bad" and discard it. Write "bad" on the tape case before you throw it out so that someone doesn't accidentally try to use it. Never try to reuse an obviously bad tape. The cost of a new tape is minimal compared to the value of the data you are storing on it."

Backup Performance. Sometimes data archive/extraction speed may be important, eg. a system critical to a commercial operation fails and needs restoring, or a backup/archive must be made before a deadline. In these situations, it is highly advisable to use a fast backup medium, eg. DDS3 DAT instead of DDS1 DAT. For example, an earlier lecture described a situation where a fault in the Ve24 hub caused unnecessary fault-hunting. As part of that process, I restored the server's system disk from a backup tape. At the time, the backup device was a DDS1 DAT. Thus, to restore some 1.6GB of data from a standard 2GB capacity DAT tape, I had to wait approximately six hours for the restoration to complete (since the system was needed the next morning, I stayed behind well into the night to complete the operation).

The next day, it was clear that using a DDS1 was highly inefficient and time-wasting, so a DDS3 DAT was purchased immediately. Thus, if the server ever has to be restored from DAT again, and despite the fact it now has a larger disk (4GB with 2.5GB of data typically present), even a full restoration would only take three hours instead of six (with 2.5GB used, the restoration would finish in less than two hours). Tip: as explained in the lecture on hardware modifications and installations, consider swapping a faster CPU into a system in order to speed up a backup or restoration operation - it can make a significant difference [2].

Hints and Tips.

- Keep tape drives clean. Newer tapes deposit more dirt than old ones.
- Use du and df to check that a media will have enough space to store the data.
- Consider using data compression options if space on the media is at a premium (some devices may have extra device files which include a 'c' in the device name to indicate it supports hardware compression/decompression, eg. a DLT drive whose raw device file is /dev/rmt/tps0d5vc). There is no point using compression options if the data being archived is already compressed with pack, compress, gzip, etc. or is naturally compressed anyway, eg. an MPEG movie, JPEG image, etc.
- Use good quality media. Do not use ordinary audio DAT tapes with DAT drives for computer data backup; audio DAT tapes are of a lower quality than DAT tapes intended for computer data storage.
- Consider using any available commands to check beforehand that a file system to be backed up is not damaged or corrupted (eg. fsck). This will be more relevant to older file system types and UNIX versions, eg. fsck is not relevant to XFS filesystems (IRIX 6.x and later), but may be used with EFS file systems (IRIX 5.3 and earlier). Less important when dealing with a small number of items.
- Label all backups, giving full details, eg. date, time, host name, backup command used (so you or another admin will know how to extract the files later), general contents description, and your name if the site has more than one admin with responsibility for backup procedures.
- Verify a backup after it is made; some commands require specific options, while others provide a means of listing the contents of a media, eg. the -t option used with tar.
- Write-protect a media after a backup has finished.
- Keep a tally on the media of how many times it has been used.
- Consider including an index file at the very start of the backup on the media, eg.:

  ls -AlhFR /home > /home/0000index && tar cv /home

Note: such index files can be large.  

Exploit colour code schemes to denote special attributes, eg. daily vs. weekly vs. monthly tapes. Be aware of any special issues which may be relevant to the type of data being backed up. For example, movie files can be very large; on SGIs, tar requires the K option in order to archive files larger than 2GB. Use of this option may mean the archived media is not compatible with another vendor's version of tar.



Consult the online guides. Such guides often have a great deal of advice, examples, etc.
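As a brief illustration of the verification tip above (a sketch only; /dev/tape and /home are just example names):

tar cvf /dev/tape /home              # create the archive
tar tvf /dev/tape > /tmp/contents    # re-read the tape and list what it contains

If the listing completes without errors, the media is at least readable, and /tmp/contents can be compared against an index file of the kind described above.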

tar is a powerful command with a wide range of available options and is used on UNIX systems worldwide. It is typical of the kind of UNIX command for which an admin is well advised to read through the entire man page. Other commands in this category include find, rm, etc.

Note: if compatibility between different versions of UNIX is an issue, one can use the lower-level dd command which allows one to specify more details about how the data is to be dealt with as it is sent to or received from a backup device, eg. changing the block size of the data. A related command is 'mt' which can be used to issue specific commands to a magnetic tape device, eg. print device details and default block size. If problems occur during backup/restore operations, remember to check /var/adm/SYSLOG for any relevant error messages (useful if one cannot be present to monitor the operation in person).
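For example, the following sketch writes and re-reads an archive through dd with an explicit block size (device names, the 64K figure and mt's exact option syntax vary between UNIX versions, so treat this purely as an illustration):

mt status                                   # query the default tape device's state
tar cf - /home | dd of=/dev/tape bs=64k     # write a tar stream via dd using 64K blocks
dd if=/dev/tape bs=64k | tar tvf -          # read it back through dd and list the contents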

Restoring Data from Backup Media.

Restoring non-root-filesystem data is trivial: just use the relevant extraction tool, eg.:

tar xvf /dev/tape

However, restoring the root '/' partition usually requires access to an appropriate set of OS CD(s) and a full system backup tape of the / partition. Further, many OSs may insist that backup and restore operations at the system level must be performed with a particular tool, eg. Backup and Restore. If particular tools were required but not used to create the backup, or if the system cannot boot to a state where normal extraction tools can be used (eg. damage to the /usr section of the filesystem), then a complete reinstallation of the OS must be done, followed by the extraction of the backup media on top of the newly created filesystem using the original tool.

Alternatively, a fresh OS install can be done, then a second empty disk inserted on SCSI ID 2, set up to be a root disk, the backup media extracted onto the second disk, and the volume header copied over using dvhtool or whatever command is relevant to the OS being used (this procedure is similar to disk cloning). Finally, a quick swap of the disks so that the second disk is on SCSI ID 1 puts the system back to normal. I personally prefer this method since it's "cleaner", ie. one can never be sure that extracting files on top of an existing file system will result in a final filesystem that is genuinely identical to the original. By using a second disk in this way, the psychological uncertainty is removed.

Just like backing up data to a remote device, data can be restored from a remote device as well. An OS 'system recovery' menu will normally include an option to select such a restoration method - a full host:/path specification is required.

Note that if a filesystem was archived with a leading / symbol, eg.:

tar cvf /dev/tape /home/pub/movies/misc

then an extraction may fail if an attempt is made to extract the files without changing the equivalent extraction path, eg. if a student called cmpdw entered the following command with such a tape while in their home directory:

tar xvf /dev/tape

then the command would fail since students cannot write to the top level of the /home directory. Thus, the R option can be used (or equivalent option for other commands) to remove leading / symbols so that files are extracted into the current directory, ie. if cmpdw entered:

tar xvfR /dev/tape

then tar would place the /home data from the tape into cmpdw's home directory, ie. cmpdw would see a new directory with the name:

/home/students/cmpdw/home

Other Typical Daily Tasks.

From my own experience, these are the types of task which most admins will likely carry out every day:

- Check disk usage across the system.
- Check system logs for important messages, eg. system errors and warnings, possible suspected access attempts from remote systems (hackers), suspicious user activity, etc. This applies to web server logs too (use script processing to ease analysis).
- Check root's email for relevant messages (eg. printers often send error messages to root in the form of an email).
- Monitor system status, eg. all systems active and accessible (ping).
- Monitor system performance, eg. server load, CPU-hogging processes running in background that have been left behind by a careless user, packet collision checks, network bandwidth checks, etc.
- Ensure all necessary system services are operating correctly.
- Tour the facilities for general reasons, eg. food consumed in rooms where such activity is prohibited, users who have left themselves logged in by mistake, a printer with a paper jam that nobody bothered to report, etc. Users are notoriously bad at reporting physical hardware problems - the usual response to a problem is to find an alternative system/device and let someone else deal with it.
- Deal with user problems, eg. "Somebody's changed my password!" (ie. the user has forgotten their password). Admins should be accessible by users, eg. a public email address, web feedback form, post box by the office, etc. Of course, a user can always send an email to the root account, or to the admin's personal account, or simply visit the admin in person. Some systems, like Indy, may have additional abilities, eg. video conferencing: a user can use the InPerson software to request a live video/audio link to the admin's system, allowing 2-way communication (see the inperson man page). Other facilities such as the talk command can also be employed to contact the admin, eg. at a remote site. It's up to the admin to decide how accessible she/he should be - discourage trivial interruptions.
- Work on improving any relevant aspect of the system, eg. security, services available to users (software, hardware), system performance tuning, etc.
- Clean systems if they're dirty; a user will complain about a dirty monitor screen or sticking mouse behaviour, but they'll never clean them for you. Best to prevent complaints via regular maintenance. Consider other problem areas that may be hidden, eg. blowing loose toner out of a printer with an air duster can.
- Learn more about UNIX in general.
- Take necessary breaks! A tired admin will make mistakes.

This isn't a complete list, and some admins will doubtless have additional responsibilities, but the above describes the usual daily events which define the way I manage the Ve24 network.
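Many of these checks can be semi-automated with a simple shell script run each morning. A minimal sketch is shown below; the host names are a subset of the Ve24 clients mentioned elsewhere in these notes, the log path is IRIX's /var/adm/SYSLOG, and ping's count option may be spelled differently on other UNIX versions:

#!/bin/sh
# daily-checks: a quick morning status sweep (sketch only)
df -k                                         # disk usage on mounted filesystems
tail -200 /var/adm/SYSLOG | grep -i error     # recent errors in the system log
for host in akira ash cameron chan conan      # subset of the Ve24 clients
do
    ping -c 1 $host > /dev/null 2>&1 || echo "$host is not responding"
done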

Useful file: /etc/motd

The contents of this file will be echoed to stdout whenever a user activates a login shell. Thus, the message will be shown when:

- a user first logs in (contents in all visible shell windows),
- a user accesses another system using commands such as rlogin and telnet,
- a user creates a new console shell window; from the man page for console, "The console provides the operator interface to the system. The operating system and system utility programs display error messages on the system console."

The contents of /etc/motd are not displayed when the user creates a new shell using 'xterm', but are displayed when winterm is used. The means by which xterm/winterm are executed are irrelevant (icon, command, Toolchest, etc.)

The motd file can be used as a simple way to notify users of any developments. Be careful of allowing its contents to become out of date though. Also note that the file is local to each system, so maintaining a consistent motd between systems might be necessary, eg. a script to copy the server's motd to all clients.

Another possible way to inform users of worthy news is the xconfirm command, which could be included within startup scripts, user setup files, etc. From the xconfirm man page:

"xconfirm displays a line of text for each -t argument specified (or a file when the -file argument is used), and a button for each -b argument specified. When one of the buttons is pressed, the label of that button is written to xconfirm's standard output. The enter key activates the specified default button. This provides a means of communication/feedback from within shell scripts and a means to display useful information to a user from an application. Command line options are available to specify geometry, font style, frame style, modality and one of five different icons to be presented for tailored visual feedback to the user."

For example, xconfirm could be used to interactively warn the user if their disk quota has been exceeded.
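For instance, drawing only on the options quoted above (-t for text lines, -b for buttons), a disk-quota warning might look something like this (a sketch; the message text and button label are arbitrary):

xconfirm -t "Your disk quota has been exceeded." \
         -t "Please delete or archive some files." -b OK > /dev/null

The button label is written to stdout when pressed, hence the redirection when the choice does not matter.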

UNIX Fundamentals: System bootup and shutdown, events, daemons.

SGI's IRIX is based on System V with BSD enhancements. As such, the way an IRIX system boots up is typical of many UNIX systems. Some interesting features of UNIX can be discovered by investigating how the system starts up and shuts down.

After power on and initial hardware-level checks, the first major process to execute is the UNIX kernel file /unix, though this doesn't show up in any process list as displayed by commands such as ps. The kernel then starts the init program to begin the bootup sequence, ie. init is the first visible process to run on any UNIX system. One will always observe init with a process ID of 1:

% ps -ef | grep init | grep -v grep
    root     1     0  0 21:01:57 ?        0:00 /etc/init

init is used to activate, or 'spawn', other processes. The /etc/inittab file is used to determine what processes to spawn. The lecture on shell scripts introduced the init command, in a situation where a system was made to reboot using:

init 6

The number is called a 'run level', ie. a software configuration of the system under which only a selected group of processes exist. Which processes correspond to which run level is defined in the /etc/inittab file. A system can be in any one of eight possible run levels: 0 to 6, s and S (the latter two are identical). The states which most admins will be familiar with are 0 (total shutdown and power off), 1 (enter system administration mode), 6 (reboot to default state) and S (or s) for 'single-user' mode, a state commonly used for system administration.

The /etc/inittab file contains an 'initdefault' state, ie. the run level to enter by default, which is normally 2, 3 or 4. 2 is the most common, ie. the full multi-user state with all processes, daemons and services activated. The /etc/inittab file is constructed so that any special initialisation operations, such as mounting filesystems, are executed before users are allowed to access the system. The init man page has a very detailed description of these first few steps of system bootup. Here is a summary:

An initial console shell is created with which to begin spawning processes. The fact that a shell is used this early in the boot cycle is a good indication of how closely related shells are to UNIX in general. The scripts which init uses to manage processes are stored in the /etc/init.d directory. During bootup, the files in /etc/rc2.d are used to bring up system processes in the correct order (the /etc/rc0.d directory is used for shutdown - more on that later). These files are actually links to the equivalent script files in /etc/init.d.

Each file in /etc/rc2.d (the 2 presumably corresponding to run level 2 by way of a naming convention) begins with S followed by two digits (S for 'Spawn' perhaps), causing the files to be executed in a specific order as determined by the first 3 characters of each file name (alphanumeric). Thus, the first file run in the console shell is /etc/rc2.d/S00announce (a link to /etc/init.d/announce - use 'more' or load this file into an editor to see what it does). init will run each script with appropriate arguments depending on whether the procedure being followed is a startup or shutdown, eg. 'start', 'stop', etc.

The /etc/config directory is used by each script in /etc/init.d to decide what it should do. /etc/config contains files which correspond to files found in /etc/rc2.d with the same name. These /etc/config files contain simply 'on' or 'off'. The chkconfig command is used to test the appropriate file by each script, returning true or false depending on its contents and thus determining whether the script does anything. An admin uses chkconfig to set the various files' contents to on or off as desired, eg. to switch a system into stand-alone mode, turn off all network-related services on the next reboot:

chkconfig network off
chkconfig nfs off
chkconfig yp off
chkconfig named off
init 6

Enter chkconfig on its own to see the current configuration states.

Lower-level functions are performed first, beginning with a SCSI driver check to ensure that the system disk is going to be accessed correctly. Next, key file systems are mounted. Then the following steps occur, IF the relevant /etc/config file contains 'on' for any step which depends on that fact:

- A check to see if any system crash files are present (core dumps) and if so to send a message to stdout.
- Display company trademark information if present; set the system name.
- Begin system activity reporting daemons.
- Create a new OS kernel if any system changes have been made which require it (this is done by testing whether or not any of the files in /var/sysgen are newer than the /unix kernel file).
- Configure and activate network ports.
- etc.

Further services/systems/tasks to be activated if need be include ip-aliasing, system auditing, web servers, license server daemons, core dump manager, swap file configuration, mail daemon, removal of /tmp files, printer daemon, higher-level web servers such as Netscape Administration Server, cron, PPP, device file checks, and various end-user and application daemons such as the midi sound daemon which controls midi library access requests. This isn't a complete list, and servers will likely have more items to deal with than clients, eg. starting up DNS, NIS, security & auditing daemons, quotas, internet routing daemons, and more than likely a time daemon to serve as a common source of current time for all clients. It should be clear that the least important services are executed last - these usually concern user-related or application-related daemons, eg. AppleTalk, Performance Co-Pilot, X Windows Display Manager, NetWare, etc.

Even though a server or client may initiate many background daemon processes on bootup, during normal system operation almost all of them are doing nothing at all. A process which isn't doing anything is said to be 'idle'. Enter:

ps -ef

The 'C' column shows the activity level of each process. No matter when one checks, almost all the C entries will be zero. UNIX background daemons only use CPU time when they have to, ie. they remain idle until called for. This allows a process which truly needs CPU cycles to make maximum use of available CPU time.

The scripts in /etc/init.d may start up other services if necessary as well. Extra configuration/script files are often found in /etc/config in the form of a file called servicename.options, where 'servicename' is the name of the normal script run by init.

Note: the 'verbose' file in /etc/config is used by scripts to dynamically redefine whether the echo command is used to output progress messages. Each script checks whether verbose mode is on using the chkconfig command; if on, then a variable called $ECHO is set to 'echo'; if off, $ECHO is set to something which is interpreted by a shell to mean "ignore everything that follows this symbol", so setting verbose mode to off means every echo command in every script (which uses the $ECHO test and set procedure) will produce no output at all - a simple, elegant and clean way of controlling system behaviour.

When shutting a system down, the behaviour described above is basically just reversed. Scripts contained in the /etc/rc0.d directory perform the necessary actions, with the name prefixes determining execution order. Once again, the first three characters of each file name decide the alphanumeric order in which to execute the scripts; 'K' probably stands for 'Kill'. The files in /etc/rc0.d shut down user/application-related daemons first, eg. the MIDI daemon. Comparing the contents of /etc/rc2.d and /etc/rc0.d, it can be seen that they are mirror images of each other.

The alphanumeric prefixes used for the /etc/rc*.d directories are defined in such a way as to allow extra scripts to be included in those directories, or rather links to relevant scripts in /etc/init.d. Thus, a custom 'static route' (to force a client to always route externally via a fixed route) can be defined by creating new links from /etc/rc2.d/S31network and /etc/rc0.d/K39network to a custom file called network.local in /etc/init.d. There are many numerical gaps amongst the files, allowing for great expansion in the number of scripts which can be added in the future.
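To make the static route example concrete, the outline below shows the general shape of such a script and its links. This is a sketch only: the gateway address is borrowed from the router example later in these notes, and the exact route syntax and path should be checked against the route reference page for the OS concerned.

#!/bin/sh
# /etc/init.d/network.local - custom static route script (sketch)
case "$1" in
start)
        /usr/etc/route add default 193.61.250.33      # assumed gateway address
        ;;
stop)
        /usr/etc/route delete default 193.61.250.33
        ;;
esac

The links are then created with:

ln -s /etc/init.d/network.local /etc/rc2.d/S31network
ln -s /etc/init.d/network.local /etc/rc0.d/K39network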

References:

1. Extreme Technologies:
   http://www.futuretech.vuurwerk.nl/extreme.html

2. DDS1 vs. DDS3 DAT Performance Tests:
   http://www.futuretech.vuurwerk.nl/perfcomp.html#DAT1
   http://www.futuretech.vuurwerk.nl/perfcomp.html#DAT2
   http://www.futuretech.vuurwerk.nl/perfcomp.html#DAT3
   http://www.futuretech.vuurwerk.nl/perfcomp.html#DAT4

3. "Success With DDS Media", Hewlett Packard, Edition 1, February 1991.

Detailed Notes for Day 3 (Part 4)

UNIX Fundamentals: Security and Access Control.

General Security.

Any computer system must be secure, whether it's connected to the Internet or not. Some issues may be irrelevant for Intranets (isolated networks which may or may not use Internet-style technologies), but security is still important for any internal network, if only to protect against employee grievances or accidental damage. Crucially, a system should not be expanded to include external network connections until internal security has been dealt with, and individual systems should not be added to a network until they have been properly configured (unless the changes are of a type which cannot be made until the system is physically connected). However, security is not an issue which can ever be finalised; one must constantly maintain an up-to-date understanding of relevant issues and monitor the system using the various available tools such as 'last' (display recent logins; there are many other available tools and commands).

In older UNIX variants, security mostly involved configuring the contents of various system/service setup files. Today, many UNIX OSs offer the admin a GUI-frontend security manager to deal with security issues in a more structured way. In the case of SGI's IRIX, version 6.5 has such a GUI tool, but 6.2 does not. The GUI tool is really just a convenient way of gathering together all the relevant issues concerning security in a form that is easier to deal with (ie. less need to look through man pages, online books, etc.) The security issues themselves are still the same.

UNIX systems have a number of built-in security features which offer a reasonably acceptable level of security without the need to install any additional software. UNIX gives users a great deal of flexibility in how they manage and share their files and data; such convenience may be incompatible with an ideal site security policy, so decisions often have to be taken about how secure a system is going to be - the more secure a system is, the less flexible for users it becomes.

Older versions of any UNIX variant will always be less secure than newer ones. If possible, an admin should always try and use the latest version in order to obtain the best possible default security. For example, versions of IRIX as old as 5.3 (circa 1994) had some areas of subtle system functionality rather open by default (eg. some feature or service turned on), whereas versions later than 6.0 turned off such features to improve the security of a default installation. UNIX vendors began making these changes in order to comply with the more rigorous standards demanded by the Internet age.

Standard UNIX security features include:

1. File ownership,
2. File permissions,
3. System activity monitoring tools, eg. who, ps, log files,
4. Encryption-based, password-protected user accounts,
5. An encryption program (crypt) which any user can exploit.

Figure 60. Standard UNIX security features.

All except the last item above have already been discussed in previous lectures. The 'crypt' command can be used by the admin and users to encrypt data, using an encryption key supplied as an argument. Crypt employs an encryption schema based on similar ideas used in the German 'Enigma' machine in WWII, although crypt's implementation of the mathematical equivalent is much more complex, like having a much bigger and more sophisticated Enigma machine. Crypt is a satisfactorily secure program; the man page says, "Methods of attack on such machines are known, but not widely; moreover the amount of work required is likely to be large." However, since crypt requires the key to be supplied as an argument, commands such as ps could be used by others to observe the command in operation, and hence the key. This is crypt's only weakness. See the crypt man page for full details on how crypt is used.
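For example (a sketch; the key shown is obviously not one to use for real):

crypt zx81key < notes.txt > notes.enc     # encrypt notes.txt with the given key
crypt zx81key < notes.enc > notes.clear   # decrypt with the same key

Because the key appears on the command line, it is momentarily visible to ps, which is exactly the weakness described above.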

Responsibility.

Though an admin has to implement security policies and monitor the system, ordinary users are no less responsible for ensuring system security in those areas where they have influence and can make a difference. Besides managing their passwords carefully, users should control the availability of their data using appropriate read, write and execute file permissions, and be aware of the security issues surrounding areas such as accessing the Internet.

Security is not just software and system files though. Physical aspects of the system are also important and should be noted by users as well as the admin. Thus:

- Any item not secured with a lock, cable, etc. can be removed by anyone who has physical access.
- Backups should be securely stored.
- Consider the use of video surveillance equipment and some form of metal-key/keycard/numeric-code entry system for important areas.
- Account passwords enable actions performed on the system to be traced. All accounts should have passwords.
- Badly chosen passwords, and old passwords, can compromise security. An admin should consider using password-cracking software to ensure that poorly chosen passwords are not in use.
- Group permissions for files should be set appropriately (user, group, others).
- Guest accounts can be used anonymously; if a guest account is necessary, the tasks which can be carried out when logged in as guest should be restricted. Having open guest accounts on multiple systems which do not have common ordinary accounts is unwise - it allows users to anonymously exchange data between such systems when their normal accounts would not allow them to do so. Accounts such as guest can be useful, but they should be used with care, especially if they are left with no password.
- Unused accounts should be locked out, or backed up and removed. If a staff member leaves the organisation, passwords should be changed to ensure such former users do not retain access.
- Sensitive data should not be kept on systems with more open access such as anonymous ftp and modem dialup accounts.
- Use of the su command amongst users should be discouraged. Its use may be legitimate, but it encourages lax security (ordinary users have to exchange passwords in order to use su). Monitor the /var/adm/sulog file for any suspicious use of su.
- Ensure that key files owned by a user are writeable only by that user, thus preventing 'trojan horse' attacks. This also applies to root-owned files/dirs, eg. /, /bin, /usr/bin, /etc, /var, and so on.
- Use find and other tools to locate directories that are globally writeable - if such a directory is a user's home directory, consider contacting the user for further details as to why their home directory has been left so open. For added security, use an account-creation schema which sets users' home directories to not be readable by groups or others by default. (See the sketch after this list for example find commands.)
- Instruct users not to leave logged-in terminals unattended. The xlock command is available to secure an unattended workstation, but its use for long periods may be regarded as inconsiderate by other users who are not able to use the terminal, leading to the temptation of rebooting the machine, perhaps causing the logged-in user to lose data.
- Only vendor-supplied software should be fully trusted. Commercial 3rd-party software should be ok as long as one has confidence in the supplier, but shareware or freeware software must be treated with care, especially if such software is in the form of precompiled ready-to-run binaries (precompiled non-vendor software might contain malicious code). Software distributed in source code form is safer, but caution is still required, especially if executables have to be owned by root and installed using the set-UID feature in order to run.
- Set-UID and set-GID programs have legitimate uses, but because they are potentially harmful, their presence on a system should be minimised. The find command can be used to locate such files, while older file system types (eg. EFS) can be searched with commands such as ncheck.
- Network hardware can be physically tapped to eavesdrop on network traffic. If security must be particularly tight, keep important network hardware secure (eg. locked cupboard) and regularly check other network items (cables, etc.) for any sign of attack. Consider using specially secure areas for certain hardware items, and make it easy to examine cabling if possible (keep an up-to-date printed map to aid checks). Fibre-optic cables are harder to interfere with, eg. FDDI. Consider using video surveillance technologies in such situations.
- Espionage and sabotage are issues which some admins may have to be aware of, especially where commercially sensitive or government/police-related work data is being manipulated. Simple example: could someone see a monitor screen through a window using a telescope? What about RF radiation? Remote scanners can pick up stray monitor emissions, so consider appropriate RF shielding (Faraday Cage). What about insecure phone lines? Could someone, even an ordinary user, attach a modem to a system and dial out, or allow someone else to dial in?
- Keep up-to-date with security issues; monitor security-related sites such as www.rootshell.com, UKERNA, JANET, CERT, etc. [7]. Follow any extra advice given in vendor-specific security FAQ files (usually posted to relevant 'announce' or 'misc' newsgroups, eg. comp.sys.sgi.misc). Most UNIX vendors also have an anonymous ftp site from which customers can obtain security patches and other related information. Consider joining any specialised mailing lists that may be available.
- If necessary tasks are beyond one's experience and capabilities, consider employing a vendor-recommended external security consultancy team.
- Exploit any special features of the UNIX system being used, eg. at night, an Indy's digital camera could be used to send single frames twice a second across the network to a remote system for subsequent compression, time-stamping and recording. NB: this is a real example which SGI once helped a customer to do in order to catch some memory thieves.

Figure 61. Aspects of a system relevant to security.
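As an illustration of the find-based checks mentioned above (a sketch; run as root, expect the scans to take some time on large filesystems, and consider restricting the search to local filesystems if NFS mounts are present):

find / -type f \( -perm -4000 -o -perm -2000 \) -print    # set-UID and set-GID files
find / -type d -perm -0002 -print                         # globally writeable directories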

Since basic security on UNIX systems relies primarily on login accounts, passwords, file ownership and file permissions, proper administration and adequate education of users is normally sufficient to provide adequate security for most sites. Lapses in security are usually caused by human error, or improper use of system security features. Extra security measures such as commercial security-related software are not worth considering if even basic features are not used or are compromised via incompetence.

An admin can alter the way in which failed login attempts are dealt with by configuring the /etc/default/login file. There are many possibilities and options - see the 'login' reference page for details (man login). For example, an effective way to enhance security is to make repeated guessing of account passwords an increasingly slow process by penalising further login attempts with ever increasing delays between login failures. Note that GUI-based login systems may not support features such as this, though one can always deactivate them via an appropriate chkconfig command.

Most UNIX vendors offer the use of hardware-level PROM passwords to provide an extra level of security, ie. a password is required from any user who attempts to gain access to the low-level hardware PROM-based 'Command Monitor', giving greater control over who can carry out admin-level actions. While PROM passwords cannot prevent physical theft (eg. someone stealing a disk and accessing its data by installing it as an option drive on another system), they do limit the ability of malicious users to boot a system using their own program or device (a common flaw with Mac systems), or otherwise harm the system at its lowest level. If the PROM password has been forgotten, the root user can reset it. If both are lost, then one will usually have to resort to setting a special jumper on the system motherboard, or temporarily removing the PROM chip altogether (the loss of power to the chip resets the password).

Shadow Passwords

If the /etc/passwd file can be read by users, then there is scope for users to take a copy away to be brute-force tested with password-cracking software. The solution is to use a shadow password file called /etc/shadow - this is a copy of the ordinary password file (/etc/passwd) which cannot be accessed by non-root users. When in use, the password fields in /etc/passwd are replaced with an 'x'. All the usual password-related programs work in the same way as before, though shadow passwords are dealt with in a different way for systems using NIS (this is because NIS keeps all password data for ordinary users in a different file called /etc/passwd.nis). Users won't notice any difference when shadow passwords are in use, except that they won't be able to see the encrypted form of their password anymore. The use of shadow passwords is activated simply by running the 'pwconv' program (see the man page for details). Shadow passwords are in effect as soon as this command has been executed.

Password Ageing.

An admin can force passwords to age automatically, ensuring that users must set a new password at desired intervals, or no earlier than a certain interval, or even immediately. The passwd command is used to control the various available options. Note that NIS does not support password ageing.
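For example, the usual System V style ageing options look something like this, for a hypothetical user jbloggs (a sketch; check the passwd reference page for the exact options supported by a given UNIX version):

passwd -x 30 -n 7 jbloggs    # password must be changed within 30 days, but not sooner than 7
passwd -f jbloggs            # force a password change at the next login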

Choosing Passwords.

Words from the dictionary should not be used, nor should obvious items such as film characters and titles, names of relatives, car number plates, etc. Passwords should include obscure characters, digits and punctuation marks. Consider using and mixing words from other languages, eg. Finnish, Russian, etc. An admin should not use the same root password for more than one system, unless there is good reason.

When a new account is created, a password should be set there and then. If the user is not immediately present, a default password such as 'password' might be used in the expectation that the user will log in immediately and change it to something more suitable. An admin should lock out the account if the password isn't changed after some duration: replace the password entry for the user concerned in the /etc/passwd file with anything that contains at least one character that is not used by the encryption schema, eg. '*'. Modern UNIX systems often include a minimum password length and may insist on certain rules about what a password can be, eg. at least one digit.
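For example, a locked entry for a hypothetical user jbloggs might look like this (sketch):

jbloggs:*:1017:20:Joe Bloggs:/home/students/jbloggs:/bin/csh

Since '*' can never be produced by the password encryption schema, no password will ever match, so the account stays disabled until the admin sets a new password.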

Network Security.

As with other areas of security, GUI tools may be available for controlling network-related security issues, especially those concerning the Internet. Since GUI tools may vary between different UNIX OSs, this discussion deals mainly with the command line tools and related files. Reminder: there is little point in tightening network security if local security has not yet been dealt with, or is lax. Apart from the /etc/passwd file, the other important files which control network behaviour are:

/etc/hosts.equiv     A list of trusted hosts.

.rhosts              A list of hosts that are allowed access to a specific user account.

Figure 62. Files relevant to network behaviour.

These three files determine whether a host will accept an access request from programs such as rlogin, rcp, rsh, or rdist. Both hosts.equiv and .rhosts have reference pages (use 'man hosts.equiv' and 'man rhosts').

Suppose a user on host A attempts to access a remote host B. As long as the hosts.equiv file on B contains the host name of A, and B's /etc/passwd lists A's user ID as a valid account, then no further checks occur and the access is granted (all successful logins are recorded in /var/adm/SYSLOG). The hosts.equiv file used by the Ve24 Indys contains the following:

localhost
yoda.comp.uclan.ac.uk
akira.comp.uclan.ac.uk
ash.comp.uclan.ac.uk
cameron.comp.uclan.ac.uk
chan.comp.uclan.ac.uk
conan.comp.uclan.ac.uk
gibson.comp.uclan.ac.uk
indiana.comp.uclan.ac.uk
leon.comp.uclan.ac.uk
merlin.comp.uclan.ac.uk
nikita.comp.uclan.ac.uk
ridley.comp.uclan.ac.uk
sevrin.comp.uclan.ac.uk
solo.comp.uclan.ac.uk
spock.comp.uclan.ac.uk
stanley.comp.uclan.ac.uk
warlock.comp.uclan.ac.uk
wolfen.comp.uclan.ac.uk
woo.comp.uclan.ac.uk
milamber.comp.uclan.ac.uk

Figure 63. hosts.equiv files used by Ve24 Indys.

Thus, once logged into one of the Indys, a user can rlogin directly to any of the other Indys without having to enter their password again, and can execute rsh commands, etc. A staff member logged into Yoda can also log in to any of the Ve24 Indys (students cannot do this). The hosts.equiv files on Yoda and Milamber are completely different, containing only references to each other as needed. Yoda's hosts.equiv file contains:

localhost
milamber.comp.uclan.ac.uk

Figure 64. hosts.equiv file for yoda.

Thus, Yoda trusts Milamber. However, Milamber's hosts.equiv only contains:

localhost

Figure 65. hosts.equiv file for milamber.

ie. Milamber doesn't trust Yoda, the rationale being that even if Yoda's root security is compromised, logging in to Milamber as root is blocked. Hence, even if a hack attack damaged the server and Ve24 clients, I would still have at least one fully functional secure machine with which to tackle the problem upon its discovery.

Users can extend the functionality of hosts.equiv by using a .rhosts file in their home directory, enabling or disabling access based on host names, group names and specific user account names. The root login only uses the /.rhosts file if one is present - /etc/hosts.equiv is ignored. NOTE: an entry for root in /.rhosts on a local system allows root users on a remote system to gain local root access. Thus, including the root name in /.rhosts is unwise. Instead, file transfers can be more securely dealt with using ftp via a guest account, or through an NFS-mounted directory. An admin should be very selective as to the entries included in root's .rhosts file.

A user's .rhosts file must be owned by either the user or root. If it is owned by anyone else, or if the file permissions are such that it is writeable by someone else, then the system ignores the contents of the user's .rhosts file by default. An admin may decide it's better to bar the use of .rhosts files completely, perhaps because an external network of unknown security status is connected. The .rhosts files can be barred by adding a -l option to the rshd line in /etc/inetd.conf (use 'man rshd' for further details).

Thus, the relationship between the 20 different machines which form the SGI network I run is as follows:

- All the Indys in Ve24 trust each other, as well as Yoda and Milamber.
- Yoda only trusts Milamber.
- Milamber doesn't trust any system.
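For reference, a user's .rhosts file simply lists one trusted host per line, optionally followed by a user name on that host; a sketch with hypothetical entries:

milamber.comp.uclan.ac.uk
yoda.comp.uclan.ac.uk staffuser

The second line would let the account 'staffuser' on yoda access this user's account without a password, which is exactly why such files deserve careful monitoring.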

With respect to choosing root passwords, I decided to use the following configuration:

- All Ve24 systems have the same root password and the same PROM password.
- Yoda and Milamber have their own separate passwords, distinct from all others.

This design has two deliberate consequences:

- Ordinary users have flexible access between the Indys in Ve24.
- If the root account of any of the Ve24 Indys is compromised, the unauthorised user will not be able to gain access to Yoda or Milamber as root.

However, the use of NFS compromises such a schema since, for example, a root user on a Ve24 Indy could easily alter any files in /home, /var/mail, /usr/share and /mapleson.

With respect to the use of identical root and PROM passwords on the Ve24 machines: because Internet access (via a proxy server) has recently been set up for users, I will probably change the schema in order to hinder brute force attacks.

The /etc/passwd File and NIS.

The NIS service enables users to login to a client by including the following entry as the last line in the client's /etc/passwd file:

+::0:0:::

Figure 66. Additional line in /etc/passwd enabling NIS.

For simplicity, a + on its own can be used. I prefer to use the longer version so that if I want to make changes, the fields to change are immediately visible. If a user logs on with an account ID which is not listed in the /etc/passwd file as a local account, then such an entry at the end of the file instructs the system to try and get the account information from the NIS server, ie. Yoda. Since Yoda and Milamber do not include this extra line in /etc/passwd, students cannot login to them with their own ID anyway, no matter the contents of .rhosts and hosts.equiv.

inetd and inetd.conf

inetd is the 'Internet Super-server'. inetd listens for requests for network services, executing the appropriate program for each request. inetd is started on bootup by the /etc/init.d/network script (called by the /etc/rc2.d/S30network link via the init process). It reads its configuration information from /etc/inetd.conf. By using a super-daemon in this way, a single daemon is able to invoke other daemons when necessary, reducing system load and using resources such as memory more efficiently.

The /etc/inetd.conf file controls how various network services are configured, eg. logging options, debugging modes, service restrictions, the use of the bootp protocol for remote OS installation, etc. An admin can control services and logging behaviour by customising this file. A reference page is available with complete information ('man inetd').

Services communicate using 'port' numbers, rather like separate channels on a CB radio. Blocking the use of certain port numbers is a simple way of preventing a particular service from being used. Network/Internet services and their associated port numbers are contained in the /etc/services database. An admin can use the 'fuser' command to identify which processes are currently using a particular port, eg. to see the current use of TCP port 25:

fuser 25/tcp

On Yoda, an output similar to the following would be given:

yoda # fuser 25/tcp
25/tcp:        855o
yoda # ps -ef | grep 855 | grep -v grep
    root   855     1  0   Apr 27 ?        5:01 /usr/lib/sendmail -bd -q15m

Figure 67. Typical output from fuser.

Insert (a quick example of typical information hunting): an admin wants to do the same on the ftp port, but can't remember the port number. Solution: use grep to find the port number from /etc/services:

yoda 25# grep ftp /etc/services
ftp-data        20/tcp
ftp             21/tcp
tftp            69/udp
sftp            115/tcp
yoda 26# fuser 21/tcp
21/tcp:        255o
yoda 28# ps -ef | grep 255 | grep -v grep
    root   255     1  0   Apr 27 ?        0:04 /usr/etc/inetd
  senslm   857   255  0   Apr 27 ?       11:44 fam
    root 11582   255  1 09:49:57 pts/1    0:01 rlogind

An important aspect of the inetd.conf file is the user name field which determines which user ID each process runs under. Changing this field to a less privileged ID (eg. nobody) enables system service processes to be given lower access permissions than root, which may be useful for further enhancing security. Notice that services such as http (the WWW) are normally already set to run as nobody. Proxy servers should also run as nobody, otherwise http requests may be able to retrieve files such as /etc/passwd (however, some systems may have the nobody user defined so that it cannot run programs, so another user may have to be used - an admin can make one up).

Another common modification made to inetd.conf in order to improve security is to restrict the use of the finger command, eg. with -S to prevent login status, home directory and shell information from being given out. Or more commonly the -f option is used, which forces any finger request to just return the contents of a file, eg. yoda's entry for the finger service looks like this:

finger stream tcp nowait guest /usr/etc/fingerd fingerd -f /etc/fingerd.message

Figure 68. Blocking the use of finger in the /etc/inetd.conf file.

Thus, any remote user who executes a finger request to yoda is given a brief message [3]. If changes are made to the inetd.conf file, then inetd must be notified of the changes, either by rebooting the system or via the following command (which doesn't require a reboot afterwards):

killall -HUP inetd

Figure 69. Instructing inetd to restart itself (using killall).

In general, a local trusted network is less likely to require a highly restricted set of services, ie. modifying inetd.conf becomes more important when connecting to external networks, especially the Internet. Thus, an admin should be aware that creating a very secure inetd.conf file on an isolated network or Intranet may be unduly harsh on ordinary users.

X11 Windows Network Access

The X Window System is a window system available for a wide variety of different computer platforms which use bitmap displays [8]. Its development is managed by the X Consortium, Inc. On SGI IRIX systems, the X Windows server daemon is called 'Xsgi' and conforms to Release 6 of the X11 standard (X11R6).

The X server, Xsgi, manages the flow of user/application input and output requests to/from client programs using a number of interprocess communication links. The xdm daemon acts as the display manager. Usually, user programs are running on the same host as the X server, but X Windows also supports the display of client programs which are actually running on remote hosts, even systems using completely different OSs and hardware platforms, ie. X is network-transparent. The X man page says:

"X supports overlapping hierarchical subwindows and text and graphics operations, on both monochrome and color displays."

One unique side effect of this is that access to application mouse menus is independent of application focus, requiring only a single mouse click for such actions. For example, suppose two application windows are visible on screen:

- a jot editor session containing an unsaved file (eg. /etc/passwd.nis),
- a shell window which is partially obscuring the jot window.

With the shell window selected, the admin is about to run /var/yp/ypmake to reparse the password database file, but realises the file isn't saved. Moving the mouse over the partially hidden jot window, the admin holds down the right mouse button: this brings up jot's right-button menu (which may or may not be partly on top of the shell window even though the jot window is at the back), from which the admin clicks on 'Save'; the menu disappears, the file is saved, but the shell window is still on top of the jot window, ie. their relative front/back positions haven't changed during the operation.

The ability of X to process screen events independently of which application window is currently in focus is a surprisingly useful time-saving feature. Every time a user does an action like this, at least one extraneous mouse click is prevented; this can be shown by comparing to MS Windows interfaces:

- Under Win95 and Win98, trying to access an application's right-button menu when the application's window is currently not in focus requires at least two extraneous mouse clicks: the first click brings the application in focus (ie. to the front), the second brings up the menu, and a third (perhaps more if the original application window is now completely hidden) brings the original application window back to the front and in focus. Thus, X is at least 66% more efficient for carrying out this action compared to Win95/Win98.
- Under WindowsNT, attempting the same action requires at least one extraneous mouse click: the first click brings the application in focus and reveals the menu, and a second (perhaps more, etc.) brings the original application window back to the front and in focus. Thus, X is at least 50% more efficient for carrying out this action compared to NT.

The same effect can be seen when accessing middle-mouse menus or actions under X, eg. text can be highlighted and pasted to an application with the middle-mouse button even when that application is not in focus and not at the front. This is a classic example of how much more advanced X is over Microsoft's GUI interface technologies, even though X is now quite old. X also works in a way which links to graphics libraries such as OpenGL.

Note that most UNIX-based hardware platforms use video frame buffer configurations which allow a large number of windows to be present without causing colour map swapping or other side effects, ie. the ability to have multiple overlapping windows is a feature supported in hardware, eg. Indigo2 [6]. X is a widely used system, with emulators available for systems which don't normally use X, eg. Windows Exceed for PCs.

Under the X Window System, users can run programs transparently on remote hosts that are part of the local network, and can even run applications on remote hosts across the Internet with the windows displayed locally, if all the various necessary access permissions have been correctly set at both ends. An 'X Display Variable' is used to denote which host the application should attempt to display its windows on. Thus, assuming a connection with a remote host to which one had authorised telnet access (eg. haarlem.vuurwerk.nl), from a local host whose domain name is properly visible on the Internet (eg. thunder.uclan.ac.uk), then the local display of applications running on the remote host is enabled with a command such as:

haarlem% setenv DISPLAY thunder.uclan.ac.uk:0.0

I've successfully used this method while at Heriot Watt to run an xedit editor on a remote system in England but with the xedit window itself displayed on the monitor attached to the system I was physically using in Scotland. The kind of inter-system access made possible by X has nothing to do with login accounts, passwords, etc. and is instead controlled via the X protocols. The 'X' man page has full details, but note: the man page for X is quite large. A user can utilise the xhost command to control access to their X display, eg. 'xhost -' bars access from all users, while 'xhost +harry' gives X access to the user harry. Note that system-level commands and files which relate to xhost and X in general are stored in /var/X11/xdm.

Firewalls [4].

A firewall is a means by which a local network of trusted hosts can be connected to an external untrusted network, such as the Internet, in a more secure manner than would otherwise be the case. 'Firewall' is a conceptual idea which refers to a combination of hardware and software steps taken to set up a desired level of security; although an admin can set up a firewall via basic steps with as-supplied tools, all modern systems have commercial packages available to aid in the task of setting up a firewall environment, eg. Gauntlet for IRIX systems.

As with other security measures, there is a tradeoff between ease of monitoring/administration, the degree of security required, and the wishes/needs of users. A drawback of firewalls is when a user has a legitimate need to access packets which are filtered out - an alternative is to have each host on the local network configured according to a strict security regime.

The simplest form of a firewall is a host with more than one network interface, called a dual-homed host [9]. Such hosts effectively exist on two networks at once. By configuring such a host in an appropriate manner, it acts as a controllable obstruction between the local and external network, eg. the Internet.

A firewall does not affect the communications between hosts on an internal network; only the way in which the internal network interacts with the external connection is affected. Also, the presence of a firewall should not be used as an excuse for having less restrictive security measures on the internal network. One might at first think that Yoda could be described as a firewall, but it is not, for a variety of reasons. Ideally, a firewall host should be treated thus:

- no ordinary user accounts (root admin only, with a different password),
- as few services as possible (the more services are permitted, the greater is the chance of a security hole; newer, less-tested software is more likely to be at risk) and definitely no NIS or NFS,
- constantly monitored for access attempts and unusual changes in files, directories and software (commands: w, ps, 'versions changed', etc.),
- log files regularly checked (and not stored on the firewall host!),
- no unnecessary applications,
- no anonymous ftp!

Yoda breaks several of these guidelines, so it cannot be regarded as a firewall, even though a range of significant security measures are in place. Ideally, an extra host should be used, eg. an Indy (an additional Ethernet card is required to provide the second Ethernet port), or a further server such as Challenge S. A simple system like Indy is sufficient, as is another UNIX system such as an HP, Sun, DEC, etc. - a Linux PC should not be used though, since Linux has too many security holes in its present form. [1]

Services can be restricted by making changes to files such as /etc/inetd.conf, /etc/services, and others. Monitoring can be aided via the use of free security-related packages such as COPS - this package can also check for bad file permission settings, poorly chosen passwords, system setup file integrity, root security settings, and many other things. COPS can be downloaded from:

ftp://ftp.cert.org/pub/tools/cops

Monitoring a firewall host is also a prime candidate for using scripts to automate the monitoring process. Other free tools include Tripwire, a file and directory integrity checker:

ftp://ftp.cert.org/pub/tools/tripwire

With Tripwire, files are monitored and compared to information stored in a database. If files change when they're supposed to remain static according to the database, the differences are logged and flagged for attention. If used regularly, eg. via cron, action can be taken immediately if something happens such as a hacking attempt.
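For example, a nightly run could be scheduled from root's crontab (a sketch; the wrapper script name and report path are invented for illustration):

0 2 * * * /usr/local/adm/run-tripwire > /var/adm/tripwire.report 2>&1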

Firewall environments often include a router - a high speed packet filtering machine installed either privately or by the ISP providing the external connection. Usually, a router is installed between a dual-homed host and the outside world [9]. This is how yoda is connected, via a router whose address is 193.61.250.33, then through a second router at 193.61.250.65 before finally reaching the JANET gateway at Manchester.

Routers are not very flexible (eg. no support for application-level access restriction systems such as proxy servers), but their packet-filtering abilities do provide a degree of security, eg. the router at 193.61.250.33 only accepts packets on the 193.61.250.* address space. However, because routers can block packet types, ports, etc. it is possible to be overly restrictive with their use, eg. yoda cannot receive USENET packets because they're blocked by the router. In such a scenario, users must resort to using WWW-based news services (eg. DejaNews) which are obviously less secure than running and managing a locally controlled USENET server, as well as being more wasteful of network resources.

Accessing sites on the web poses similar security problems to downloading and using Internet-sourced software, ie. the source is untrusted, unless vendor-verified with checksums, etc. When a user accesses a site and attempts to retrieve data, what happens next cannot be predicted, eg. a malicious executable program could be downloaded (this is unlikely to damage root-owned files, but users could lose data if they're not careful). Users should be educated on these issues, eg. turning off Java script features and disallowing cookies if necessary. If web access is of particular concern with regard to security, one solution is to restrict web access to just a limited number of internal hosts.

Anonymous ftp.

An anonymous FTP account allows a site to make information available to anyone, while still maintaining control over access issues. Users can login to an anonymous FTP account as 'anonymous' or 'ftp'. The 'chroot' command is used to put the user in the home directory for anonymous ftp access (~ftp), preventing access to other parts of the filesystem. A firewall host should definitely not have an anonymous FTP account.

A site should not provide such a service unless absolutely necessary, but if it does then an understanding of how the anonymous FTP access system works is essential to ensuring site security, eg. preventing outside agents from using the site as a transfer point for pirated software. How an anon FTP account is used should be regularly monitored. Details of how to set up an anon FTP account can usually be found in a vendor's online information; for IRIX, the relevant source is the section entitled "Setting Up an Anonymous FTP Account" in chapter three of the "IRIX Admin: Networking and Mail" guide.

UNIX Fundamentals: Internet access: files and services. Email.

For most users, the Internet means the World Wide Web ('http' service), but this is just one service out of many, and was in fact a very late addition to the Internet as a whole. Before the advent of the web, Internet users were familiar with and used a wide range of services, including:

telnet    (interactive login sessions on remote hosts)
ftp       (file/data transfer using continuous connections)
tftp      (file/data transfer using temporary connections)
NNTP      (Internet newsgroups, ie. USENET)
SMTP      (email)
gopher    (remote host data searching and retrieval system)
archie    (another data-retrieval system)
finger    (probe remote site for user/account information)
DNS       (Domain Name Service)

Exactly which services users can use is a decision best made by consultation, though some users may have a genuine need for particular services, eg. many public database systems on sites such as NASA are accessed by telnet only. Disallowing a service automatically improves security, but the main drawback will always be a less flexible system from a user's point of view, ie. a balance must be struck between the need for security and the needs of users. However, such discussions may be irrelevant if existing site policies already state what is permitted, eg. UCLAN's campus network has no USENET service, so users exploit suitable external services such as DejaNews [2]. For the majority of admins, the most important Internet service which should be appropriately configured with respect to security is the web, especially considering today's prevalence of Java, Java Script, and browser cookie files. It is all too easy for a modern web user to give out a surprising amount of information about the system they're using without ever knowing it. Features such as cookies and Java allow a browser to send a substantial amount of information to a remote host about the user's environment (machine type, OS, browser type and version, etc.); there are sites on the web which an admin can use to test how secure a user's browser environment is - the site will display as much information as it can extract using all methods, so if such sites can only report very little or nothing in return, then that is a sign of good security with respect to user-side web issues. There are many good web server software systems available, eg. Apache. Some even come free, or are designed for local Intranet use on each host. However, for enhanced security, a site should use a professional suite of web server software such as Netscape Enterprise Server; these packages come with more advanced control mechanisms and security management features, the configuration of which is controlled by GUI-based front-end servers, eg. Netscape Administration Server. Similarly, lightweight proxy servers are available, but a site should a professional solution, eg. Netscape Proxy Server. The GUI administration of web server software makes it much easier for an admin to configure security issues such as access and service restrictions, permitted data types, blocked sites, logging settings, etc. Example: after the proxy server on the SGI network was installed, I noticed that users of the campus-wide PC network were using Yoda as a proxy server, which would give them a faster service than the University's proxy server. A proxy server which is accessible in this way is said to be 'open'. Since all accesses from the campus PCs appear in the web logs as if they originate from the Novix security system (ie. there is no indication of individual workstation or user), any

illegal activity would be untraceable. Thus, I decided to prevent campus PCs from using Yoda as a proxy. The mechanism employed to achieve this was the ipfilterd program, which I had heard of before but not used.

ipfilterd is a network packet-filtering daemon which screens all incoming IP packets based on source/destination IP address, physical network interface, IP protocol number, source/destination TCP/UDP port number, required service type (eg. ftp, telnet, etc.), or a combination of these. Up to 1000 filters can be used. To improve efficiency, a configurable memory caching mechanism is used to retain recently decided filter verdicts for a specified duration.

ipfilterd operates by using a searchable database of packet-filtering clauses stored in the /etc/ipfilterd.conf file. Each incoming packet is compared with the filters in the file one at a time until a match is found; if no match occurs, the packet is rejected by default. Since filtering is a line-by-line database search, the order in which filters are listed is important, eg. a reject clause to exclude a particular source IP address from Ethernet port ec0 would have no effect if an earlier accept clause in the file accepted all IP data from ec0, ie. in this case the reject should be listed before the accept. IP addresses may be specified in hex, dot format (eg. 193.61.255.4 - see the man page for 'inet'), host name or fully-qualified host name.

With IRIX 6.2, ipfilterd is not installed by default. After consulting with SGI to identify the appropriate source CD, the software was installed, /etc/ipfilterd.conf defined, and the system activated with:

  chkconfig -f ipfilterd on
  reboot

Since there was no ipfilterd on/off flag file in /etc/config by default, the -f forces the creation of such a file with the given state. Filters in the /etc/ipfilterd.conf file consist of a keyword and an expression denoting the type of filter to be used; available keywords are:

accept    Accept all packets matching this filter
reject    Discard all packets matching this filter (silently)
grab      Grab all packets matching this filter
define    Define a new macro

ipfilterd supports macros, with no limit to the number of macros used. Yoda's /etc/ipfilterd.conf file looks like this:

#
# ipfilterd.conf
# $Revision: 1.3 $
#
# Configuration file for ipfilterd(1M) IP layer packet filtering.
# Lines that begin with # are comments and are ignored.
# Lines begin with a keyword, followed either by a macro definition or
# by an optional interface filter, which may be followed by a protocol filter.
# Both macros and filters use SGI's netsnoop(1M) filter syntax.
#
# The currently supported keywords are:
# accept : accept all packets matching this filter
# reject : silently discard packets matching this filter
# define : define a new macro to add to the standard netsnoop macros
#
# See the ipfilterd(1M) man page for examples of filters and macros.
#
# The network administrator may find the following macros useful:
#
define ip.netAsrc    (src&0xff000000)=$1
define ip.netAdst    (dst&0xff000000)=$1
define ip.netBsrc    (src&0xffff0000)=$1
define ip.netBdst    (dst&0xffff0000)=$1
define ip.netCsrc    (src&0xffffff00)=$1
define ip.netCdst    (dst&0xffffff00)=$1
define ip.notnetAsrc not((src&0xff000000)=$1)
define ip.notnetAdst not((dst&0xff000000)=$1)
define ip.notnetBsrc not((src&0xffff0000)=$1)
define ip.notnetBdst not((dst&0xffff0000)=$1)
define ip.notnetCsrc not((src&0xffffff00)=$1)
define ip.notnetCdst not((dst&0xffffff00)=$1)
#
# Additional macros:
#
# Filters follow:
#
accept -i ec0
reject -i ec3 ip.src 193.61.255.21 ip.dst 193.61.250.34
reject -i ec3 ip.src 193.61.255.22 ip.dst 193.61.250.34
accept -i ec3

Any packet coming from an SGI network machine is immediately accepted (traffic on the ec0 network interface). The web logs contained two different source IP addresses for accesses coming from the campus PC network; these are rejected first if detected, and a final accept clause is then included so that all other types of packet are accepted. The current contents of Yoda's ipfilterd.conf file do mean that campus PC users will not be able to access Yoda as a web server either, ie. requests to www.comp.uclan.ac.uk by legitimate users will be blocked too. Thus, the above contents of the file are experimental. Further refinement is required so that accesses to Yoda's web pages are accepted, while requests which try to use Yoda as a proxy to access non-UCLAN sites are rejected. This can be done by using the ipfilterd-expression equivalent of the following if/then C-style statement:

if ((source IP is campus PC) and (destination IP is not Yoda)) then reject packet;
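A possible sketch of that refinement, reusing only the netsnoop-style expression grammar already visible in the macros above (and assuming that grammar accepts a plain dst=address comparison, and that 193.61.250.34 is Yoda's own address on the ec3 interface, as the existing reject clauses suggest), would be to define a "destination is not this host" macro and use it in the reject clauses:

  define ip.nothostdst not(dst=$1)
  reject -i ec3 ip.src 193.61.255.21 ip.nothostdst 193.61.250.34
  reject -i ec3 ip.src 193.61.255.22 ip.nothostdst 193.61.250.34
  accept -i ec3

This is untested; any such filters should be checked with netsnoop before being relied upon.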

Using ipfilterd has system resource implications. Filter verdicts stored in the ipfilterd cache by the kernel take up memory; if the cache size is increased, more memory is used. A longer cache and/or a larger number of filters means a greater processing overhead before each packet is dealt with. Thus, for busy networks, a faster processor may be required to handle the extra load, and perhaps more RAM if an admin increases the ipfilterd kernel cache size. In order to monitor such issues and make decisions about resource implications as a result of using ipfilterd, the daemon can be executed with the -d option, which causes extra logging information about each filter to be added to /var/adm/SYSLOG, ie. an /etc/config/ipfilterd.options file should be created, containing '-d'.

As well as using programs like 'top' and 'ps' to monitor CPU loading and memory usage, log files should be monitored to ensure they do not become too large, wasting disk space (the same applies to any kind of log file). System logs are 'rotated' automatically to prevent this from happening, but other logs created by 3rd-party software usually are not; such log files are not normally stored in /var/adm either. For example, the proxy server logs are in this directory:

  /var/netscape/suitespot/proxy-sysname-proxy/logs

If an admin wishes to retain the contents of older system logs such as /var/adm/oSYSLOG, then the log file could be copied to a safe location at regular intervals, eg. once per night (the old log file could then be emptied to save space). A wise policy would be to create scripts which process the logs, summarising the data in a more intuitive form. General shell script methods and programs such as grep can be used for this.
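A minimal sketch of such a script is shown below; note that the access log's file name and the archive directory used here are assumptions for illustration (the principle matters more than the exact paths). Run nightly from cron, it keeps a dated copy of the proxy access log, prints the busiest client addresses, then empties the live log:

  #!/bin/sh
  # Minimal sketch: archive and summarise the proxy access log.
  # NOTE: the log file name and the archive directory are assumptions
  # for illustration; check the actual names under the logs directory.
  LOG=/var/netscape/suitespot/proxy-sysname-proxy/logs/access
  DIR=/var/adm/proxylogs
  mkdir -p $DIR
  ARCHIVE=$DIR/access.`date +%d%m%y`

  /bin/cp $LOG $ARCHIVE

  # Show the 20 client addresses responsible for the most requests
  # (assumes the client address is the first field of each log line).
  awk '{ print $1 }' $ARCHIVE | sort | uniq -c | sort -rn | head -20

  # Empty the live log to save disk space.
  cat /dev/null > $LOG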

The above is just one example of the typical type of problem, and its consequences, that admins come up against when managing a system:

- The first problem was how to give SGI network users Internet access, the solution to which was a proxy server. Unfortunately, this allowed campus-PC users to exploit Yoda as an open proxy, so ipfilterd was then employed to prevent such unauthorised use.

Thus, as stated in the introduction, managing system security is an ongoing, dynamic process.

Another example problem: in 1998, I noticed that some students were not using the SGIs (or not asking if they could) because they thought the machines were turned off, ie. the monitor power-saving feature would blank out the screen after some duration. I decided to alter the way the Ve24 Indys behaved so that monitor power-saving would be deactivated during the day, but would still happen overnight. The solution I found was to modify the /var/X11/xdm/Xlogin file. This file contains a section controlling monitor power-saving using the xset command, which normally looks like this:

#if [ -x /usr/bin/X11/xset ] ; then
#       /usr/bin/X11/xset s 600 3600
#fi

If these lines are uncommented (the hash symbols removed), a system whose monitor supports power-saving will tell the monitor to power down after ten minutes of inactivity following the last user logging out. With the lines still commented out, modern SGI monitors use power-saving by default anyway. I created two new files in /var/X11/xdm:

-rwxr-xr-x    1 root     sys    1358 Oct 28  1998 Xlogin.powersaveoff*
-rwxr-xr-x    1 root     sys    1361 Oct 28  1998 Xlogin.powersaveon*

They are identical except for the section concerning power-saving. Xlogin.powersaveoff contains:

if [ -x /usr/bin/X11/xset ] ; then
        /usr/bin/X11/xset s 0 0
fi

while Xlogin.powersaveon contains:

#if [ -x /usr/bin/X11/xset ] ; then
#       /usr/bin/X11/xset s 0 0
#fi

The two '0' parameters supplied to xset in the Xlogin.powersaveoff file have a special effect (see the xset man page for full details): the monitor is instructed to disable all power-saving features. The cron system is used to switch between the two files when no one is present: every night at 9pm and every morning at 8am, followed by a reboot after the copy operation is complete. The entries from the file /var/spool/cron/crontabs/cron on any of the Ve24 Indys are thus:

# Alternate monitor power-saving. Turn it on at 9pm. Turn it off at 8am.
0 21 * * * /bin/cp /var/X11/xdm/Xlogin.powersaveon /var/X11/xdm/Xlogin && init 6&
#
0 8 * * * /bin/cp /var/X11/xdm/Xlogin.powersaveoff /var/X11/xdm/Xlogin && init 6&

Hence, during the day, the SGI monitors are always on with the login logo/prompt visible - students can see the Indys are active and available for use; during the night, the monitors turn themselves off due to the new xset settings. The times at which the Xlogin changes are made were chosen so as to occur when other cron jobs would not be running. Students use the Indys each day without ever noticing the change, unless they happen to be around at the right time to see the peculiar sight of 18 Indys all rebooting at once.
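Incidentally, whether a given Indy is currently in powersave-on or powersave-off mode can be confirmed from within an X session with:

  xset q

which reports, amongst other things, the current Screen Saver timeout and cycle values.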

Static Routes.

A simple way to enable packets from clients to be forwarded through an external connection is via the use of a 'static route'. A file called /etc/init.d/network.local is created with a simple script that adds a routing definition to the current routing database, thus enabling packets to be forwarded to their destination. To ensure the script is executed on bootup or shutdown, extra links are added to the /etc/rc0.d and /etc/rc2.d directories (the following commands need only be executed once as root):

ln -s /etc/init.d/network.local /etc/rc0.d/K39network
ln -s /etc/init.d/network.local /etc/rc2.d/S31network

Yoda once had a modem link to 'Demon Internet' for Internet access. A static route was used to allow SGI network clients to access the Internet via the link. The contents of /etc/init.d/network.local (supplied by SGI) was:

#!/sbin/sh
#Tag 0x00000f00
IS_ON=/sbin/chkconfig

case "$1" in
'start')
        if $IS_ON network; then
                /usr/etc/route add default 193.61.252.1 1
        fi
        ;;
'stop')
        /usr/etc/route delete default 193.61.252.1
        ;;
*)
        echo "usage: $0 {start|stop}"
        ;;
esac
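Once the system is up, the route installed by the script can be checked or adjusted by hand, eg.:

  netstat -rn                                  (list the routing table; the default route should be visible)
  /usr/etc/route delete default 193.61.252.1   (remove the route manually)
  /usr/etc/route add default 193.61.252.1 1    (add it back; the same command the script uses)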

Note the use of chkconfig to ensure that a static route is only installed on bootup if the network is defined as active. The other main files for controlling Internet access are /etc/services and /etc/inetd.conf. These were discussed earlier.
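As a reminder of the mechanism covered earlier (Figures 68 and 69), an individual service can be disabled by commenting out its entry in /etc/inetd.conf and telling inetd to re-read the file; the exact fields of each entry vary from service to service, but the procedure is simply:

  vi /etc/inetd.conf     (comment out, ie. prefix with #, the line for the unwanted service)
  killall -HUP inetd     (make inetd re-read its configuration)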

Internet Access Policy.

Those sites which choose to allow Internet access will probably want to minimise the degree to which someone outside the site can access internal services. For example, users may be able to telnet to remote hosts from a company workstation, but should the user be able to successfully telnet to that workstation from home in order to continue working? Such an ability would obviously be very useful to users, and indeed administrators, but there are security implications which may be prohibitive.

For example, students who have accounts on the SGI network cannot login to Yoda because the /etc/passwd file contains /dev/null as their default shell, ie. they can't login because their account 'presence' on Yoda itself does not have a valid shell - another cunning use of /dev/null. The /etc/passwd.nis file holds the main user account database, so users can logon to the machines in Ve24 as desired. Thus, with the use of /dev/null in the password file's shell field, students cannot login to Yoda via telnet from outside UCLAN. Staff accounts on the SGI network do not have /dev/null in the shell field, so staff can indeed login to Yoda via telnet from a remote host. Ideally, I'd like students to be able to telnet to a Ve24 machine from a remote host, but this is not yet possible for reasons explained in Appendix A (detailed notes for Day 2 Part 1).

There are a number of Internet sites which are useful sources of information on Internet issues, some relating to specific areas such as newsgroups. In fact, USENET is an excellent source of information and advice on dealing with system management, partly because of pre-prepared FAQ files, but also because of the many experts who read and post to the newsgroups. Even if site policy means users can't access USENET, an admin should exploit the service to obtain relevant admin information. A list of some useful reference sites is given in Appendix C.
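To illustrate the /dev/null trick described above, a typical student entry in Yoda's /etc/passwd might look something like the following (the user name, IDs, comment field and home directory are purely illustrative):

  sjones:*:2031:50:Student J Jones:/home/students/sjones:/dev/null

The final (shell) field is the important one: with no valid shell, a telnet login attempt gets nowhere.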

Example Questions:

1. The positions of the 'accept ec0' and 'reject' lines in /etc/ipfilterd.conf could be swapped around without affecting the filtering logic. So why is the ec0 line listed first? The 'netstat -i' command (executed on Yoda) may be useful here.

2. What would an appropriate ipfilterd.conf filter (or filters) look like which blocked unauthorised use of Yoda as a proxy to connect to an external site, but still allowed access to Yoda's own web pages via www.comp.uclan.ac.uk? Hint: the netsnoop command may be useful.

Course summary.

This course has focused on what an admin needs to know in order to run a UNIX system. SGI systems running IRIX 6.2 have been used as an example UNIX platform, with occasional mention of IRIX 6.5 as an example of how OSs evolve. Admins are, of course, ordinary users too, though they often do not use the same set of applications that other users do. Though an admin needs to know things an ordinary user does not, occasionally users should be made aware of certain issues, eg. web browser cookie files, choosing appropriate passwords etc. Like any modern OS, UNIX has a vast range of features and services. This course has not by any means covered them all (that would be impossible to do in just three days, or even thirty). Instead, the basic things a typical admin needs to know have been introduced, especially the

techniques used to find information when needed, and how to exploit the useful features of UNIX for daily administration. Whatever flavour of UNIX an admin has to manage, a great many issues are always the same, eg. security, Internet concepts, etc. Thus, an admin should consider purchasing relevant reference books to aid in the learning process. When writing shell scripts, knowledge of the C programming language is useful; since UNIX is the OS being used, a C programming book (mentioned earlier) which any admin will find particularly useful is:

  "C Programming in a UNIX Environment",
  Judy Kay & Bob Kummerfeld, Addison Wesley Publishing, 1989.
  ISBN: 0 201 12912 4

For further information on UNIX or related issues, read/post to relevant newsgroups using DejaNews; example newsgroups are given in Appendix D.

Background Notes:

1. UNIX OSs like IRIX can be purchased in a form that passes the US Department of Defence's Trusted-B1 security regulations (eg. 'Trusted IRIX'), whereas Linux doesn't come anywhere near such rigorous security standards as yet. The only UNIX OS (and in fact the only OS of any kind) which passes all of the US DoD's toughest security regulations is Unicos, made by Cray Research (a subsidiary of SGI). Unicos and IRIX will be merged sometime in the future, creating the first widely available commercial UNIX OS that is extremely secure - essential for fields such as banking, local and national government, military, police (and other emergency/crime services), health, research, telecoms, etc.

References:

2. DejaNews USENET Newsgroups, Reading/Posting service:
   http://www.dejanews.com/

4. "Firewalls: Where there's smoke...", Network Week, Vol4, No. 12, 2nd December 1998, pp. 33 to 37. 5. Gauntlet 3.2 for IRIX Internet Firewall Software: http://www.sgi.com/solutions/internet/products/gauntlet/

6. Framebuffer and Clipping Planes, Indigo2 Technical Report, SGI, 1994:

http://www.futuretech.vuurwerk.nl/i2sec4.html#4.3 http://www.futuretech.vuurwerk.nl/i2sec5.html#5.6.3

7. Useful security-related web sites:

   UKERNA:     http://www.ukerna.ac.uk/
   JANET:      http://www.ja.net/
   CERT:       http://www.cert.org/
   RootShell:  http://www.rootshell.com/
   2600:       http://www.2600.com/mindex.html

8. "About the X Window System", part of X11.org: http://www.X11.org/wm/index.shtml

9. Images are from the online book, "IRIX Admin: Backup, Security, and Accounting.", Chapter 5.

Appendix B:

3. Contents of /etc/fingerd.message:

   Sorry, the finger service is not available from this host. However,
   thankyou for your interest in the Department of Computing at the
   University of Central Lancashire. For more information, please see:

     http://www.uclan.ac.uk/
     http://www.uclan.ac.uk/facs/destech/compute/comphom.htm

   Or contact Ian Mapleson at [email protected]

   Regards,

   Ian.

   Senior Technician, Department of Computing,
   University of Central Lancashire, Preston, England, PR1 2HE.
   [email protected]
   Tel: (+44 -0) 1772 893297
   Fax: (+44 -0) 1772 892913

   Doom Help Service (DHS):    http://doomgate.gamers.org/dhs/
   SGI/Future Technology/N64:  http://sgi.webguide.nl/
   BSc Dissertation (Doom):    http://doomgate.gamers.org/dhs/diss/

Appendix C:

Example web sites useful to administrators:

  AltaVista:                  http://altavista.digital.com/cgi-bin/query?pg=aq
  Webcrawler:                 http://webcrawler.com/
  Lycos:                      http://www.lycos.com/
  Yahoo:                      http://www.yahoo.com/
  DejaNews:                   http://www.dejanews.com/
  SGI Support:                http://www.sgi.com/support/
  SGI Tech/Advice Center:     http://www.futuretech.vuurwerk.nl/sgi.html
  X Windows:                  http://www.x11.org/
  Linux Home Page:            http://www.linux.org/
  UNIXHelp for Users:         http://unixhelp.ed.ac.uk/
  Hacker Security Update:     http://www.securityupdate.com/
  UnixVsNT:                   http://www.unix-vs-nt.org/
  RootShell:                  http://www.rootshell.com/
  UNIX System Admin (SunOS):  http://sunos-wks.acs.ohio-state.edu/sysadm_course/html/sysadm-1.html

Appendix D:

Example newsgroups useful to administrators:

  comp.security.unix
  comp.unix.admin
  comp.sys.sgi.admin
  comp.sys.sun.admin
  comp.sys.next.sysadmin
  comp.unix.aix
  comp.unix.cray
  comp.unix.misc
  comp.unix.questions
  comp.unix.shell
  comp.unix.solaris
  comp.unix.ultrix
  comp.unix.wizards
  comp.unix.xenix.misc
  comp.sources.unix
  comp.unix.bsd.misc
  comp.unix.sco.misc
  comp.unix.unixware.misc
  comp.sys.hp.hpux
  comp.unix.sys5.misc
  comp.infosystems.www.misc

Detailed Notes for Day 3 (Part 5)

Project: Indy/Indy attack/defense (IRIX 5.3 vs. IRIX 6.5)

The aim of this practical session, which lasts two hours, is to give some experience of how an admin typically uses a UNIX system to investigate a problem, locate information, construct and finally implement a solution. The example problem used will likely require:

- the use of online information (man pages, online books, release notes, etc.),
- writing scripts and exploiting shell script methods as desired,
- the use of a wide variety of UNIX commands,
- identifying and exploiting important files/directories,

and so on. A time limit on the task is included to provide some pressure, which often happens in real-world situations.

The problem situation is a simulated hacker attack/defense. Two SGI Indys are directly connected together with an Ethernet cable; one Indy, referred to here as Indy X, is using an older version of IRIX called IRIX 5.3 (1995), while the other (Indy Y) is using a much newer version, namely IRIX 6.5 (1998). Students will be split into two groups (A and B) of 3 or 4 persons each. For the first hour, group A is placed with Indy X, while group B is with Indy Y. For the second hour, the situation is reversed.

Essentially, each group must try to hack the other group's system, locate and steal some key information (described below), and finally cripple the enemy machine. However, since both groups are doing this, each group must also defend against attack. Whether a group focuses on attack or defense, or a mixture of both, is for the group's members to decide during the preparatory stage.

The first hour is dealt with as follows:

- For the first 35 minutes, each group uses the online information and any available notes to form a plan of action. During this time, the Ethernet cable between Indys X and Y is not connected, and separate 'Research' Indys are used for this investigative stage in order to prevent any kind of preparatory measures. Printers will be available if printouts are desired.

- After a short break of 5 minutes to prepare/test the connection between the two Indys and move the groups to Indys X and Y, the action begins. Each group must try to hack into the other group's Indy, exploiting any suspected weaknesses, whilst also defending against the other group's attack. In addition, the hidden data must be found, retrieved, and the enemy copy erased. The end goal is to shutdown the enemy system after retrieving the hidden data. How the shutdown is effected is entirely up to the group members.

At the end of the hour, the groups are reversed so that group B will now use an Indy running IRIX 5.3, while group A will use an Indy running IRIX 6.5. The purpose of this second attempt is to demonstrate how an OS evolves and changes over time with respect to security and OS features, especially in terms of default settings, online help, etc.

Indy Specifications.

Both systems will have default installations of the respective OS version, with only minor changes to files so that they are aware of each other's existence (/etc/hosts, and so on). All systems will have identical hardware (133MHz R4600PC CPU, 64MB RAM, etc.) except for disk space: Indys with IRIX 6.5 will use 2GB disks, while Indys with IRIX 5.3 will use 549MB disks. Neither system will have any patches installed from any vendor CD updates.

The hidden data which must be located and stolen from the enemy machine by each group is the Blender V1.57 animation and rendering archive file for IRIX 6.2:

  blender1.57_SGI_6.2_iris.tar.gz   (size: 1228770 bytes)

For a particular Indy, the file will be placed in an appropriate directory in the file system, the precise location of which will only be made known to the group using that Indy - how an attacking group locates the file is up to the attackers to decide.
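As a broad hint of the sort of approach involved, an attacker who has gained a shell on the target machine and knows the file's name (or just its type) could begin with nothing more sophisticated than find, eg.:

  find / -name "blender*" -print 2> /dev/null
  find / -name "*.tar.gz" -print 2> /dev/null

(redirecting errors to /dev/null hides the noise from unreadable directories). Part of the exercise is deciding whether such obvious methods are adequate, or whether the defenders are likely to have anticipated them.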

It is expected that groups will complete the task ahead of schedule; any spare time will be used for a discussion of relevant issues:

- Reliability of relying on default settings for security, etc.
- How to detect hacking in progress, especially if an unauthorised person is carrying out actions as root.
- Whose responsibility is it to ensure security? The admin or the user?
- If a hacker is 'caught', what kind of evidence would be required to secure a conviction? How reliable is the evidence?

END OF COURSE.

Figure Index for Detailed Notes.

Day 1:

Figure 1.  A typical root directory shown by 'ls'.
Figure 2.  The root directory shown by 'ls -F /'.
Figure 3.  Important directories visible in the root directory.
Figure 4.  Key files for the novice administrator.
Figure 5.  Output from 'man -f file'.
Figure 6.  Hidden files shown with 'ls -a /'.
Figure 7.  Manipulating an NFS-mounted file system with 'mount'.
Figure 8.  The various available shells.
Figure 9.  The commands used most often by any user.
Figure 10. Editor commands.
Figure 11. The next most commonly used commands.
Figure 12. File system manipulation commands.
Figure 13. System Information and Process Management Commands.
Figure 14. Software Management Commands.
Figure 15. Application Development Commands.
Figure 16. Online Information Commands (all available from the 'Toolchest')
Figure 17. Remote Access Commands.
Figure 18. Using chown to change both user ID and group ID.
Figure 19. Handing over file ownership using chown.

Day 2:

Figure 20. IP Address Classes: bit field and width allocations.
Figure 21. IP Address Classes: supported network types and sizes.
Figure 22. The contents of the /etc/hosts file used on the SGI network.
Figure 23. Yoda's /etc/named.boot file.
Figure 24. The example named.boot file in /var/named/Examples.
Figure 25. A typical find command.
Figure 26. Using cat to quickly create a simple shell script.
Figure 27. Using echo to create a simple one-line shell script.
Figure 28. An echo sequence without quote marks.
Figure 29. The command fails due to * being treated as a
Figure 30. Using a backslash to avoid confusing the shell.
Figure 31. Using find with the -exec option to execute rm.
Figure 32. Using find with the -exec option to execute ls.
Figure 33. Redirecting the output from find to a file.
Figure 34. A simple script with two lines.
Figure 35. The simple rebootlab script.
Figure 36. The simple remountmapleson script.
Figure 37. The daily tasks of an admin.
Figure 38. Using df without options.
Figure 39. The -k option with df to show data in K.
Figure 40. Using df to report usage for the file
Figure 41. Using du to report usage for several directories/files.
Figure 42. Restricting du to a single directory.
Figure 43. Forcing du to ignore symbolic links.
Figure 44. Typical output from the ps command.
Figure 45. Filtering ps output with grep.
Figure 46. top shows a continuously updated output.
Figure 47. The IRIX 6.5 version of top, giving extra information.
Figure 48. System information from osview.
Figure 49. CPU information from osview.
Figure 50. Memory information from osview.
Figure 51. Network information from osview.
Figure 51. Miscellaneous information from osview.
Figure 52. Results from ttcp between two hosts on a 10Mbit network.
Figure 53. The output from netstat.
Figure 54. Example use of the ping command.
Figure 55. The output from rup.
Figure 56. The output from uptime.
Figure 57. The output from w showing current user activity.
Figure 58. Obtaining full domain addresses from w with the -W option.
Figure 59. The output from rusers, showing who is logged on where.

Day 3:

Figure 60. Standard UNIX security features.
Figure 61. Aspects of a system relevant to security.
Figure 62. Files relevant to network behaviour.
Figure 63. hosts.equiv files used by Ve24 Indys.
Figure 64. hosts.equiv file for yoda.
Figure 65. hosts.equiv file for milamber.
Figure 66. Additional line in /etc/passwd enabling NIS.
Figure 67. Typical output from fuser.
Figure 68. Blocking the use of finger in the /etc/inetd.conf file.
Figure 69. Instructing inetd to restart itself (using killall).

UNIX Administration Course Day 1: Part 1:

Introduction to the course. Introduction to UNIX. History of UNIX and key features. Comparison with other OSs.

Part 2:

The basics: files, UNIX shells, editors, commands. Regular Expressions and Metacharacters in Shells.

Part 3:

File ownership and access permissions. Online help (man pages, etc.)

Day 2: Part 1:

System identity (system name, IP address, etc.) Software: vendor, commercial, shareware, freeware (eg. GNU). Hardware features: auto-detection, etc. UNIX Characteristics: integration, stability, reliability, security, scalability, performance.

Part 2:

Shell scripts.

Part 3:

System monitoring tools and tasks.

Part 4:

Further shell scripts. Application development tools: compilers, debuggers, GUI toolkits, high-level APIs.

Day 3: Part 1:

Installing an OS and software: inst, swmgr. OS updates, patches, management issues.

Part 2:

Organising a network with a server. NFS. Quotas. Installing/removing internal/external hardware. SGI OS/software/hardware installation. Network setup.

Part 3:

Daily system administration tasks, eg. data backup. System bootup and shutdown, events, daemons.

Part 4:

Security/Access control: the law, firewalls, ftp. Internet access: relevant files and services. Course summary.

Part 5:

Exploring administration issues, security, hacking, responsibility, end-user support, the law (discussion). Indy/Indy attack/defense using IRIX 5.3 vs. IRIX 6.5 (two groups of 3 or 4 each).

Figures

Day 1: Part 1:

Introduction to the course. Introduction to UNIX. History Of UNIX and key features. Comparison with other OSs.

Introduction to UNIX and the Course. The UNIX operating system (OS) is widely used around the world, eg.   

The backbone of the Internet relies on UNIX-based systems and services, as do the systems used by most Internet Service Providers (ISPs). Major aspects of everyday life are managed using UNIX-based systems, eg. banks, booking systems, company databases, medical records, etc. Other 'behind the scenes' uses concern data-intensive tasks, eg. art, design, industrial design, CAD and computer animation to real-time 3D graphics, virtual reality, visual simulation & training, data visualisation, database management, transaction processing, scientific research, military applications, computational challenges, medical modeling, entertainment and games, film/video special effects, live on-air broadcast effects, space exploration, etc.

As an OS, UNIX is not often talked about in the media, perhaps because there is no single large company such as Microsoft to which one can point at and say, "There's the company in charge of UNIX." Most public talk is of Microsoft, Bill gates, Intel, PCs and other more visible aspects of the computing arena, partly because of the home-based presence of PCs and the rise of the Internet in the public eye. This is ironic because OSs like MS-DOS, Win3.1, Win95 and WinNT all draw many of their basic features from UNIX, though they lack UNIX's sophistication and power, mainly because they lack so many key features and a lengthy development history. In reality, a great deal of the everyday computing world relies on UNIX-based systems running on computers from a wide variety of vendors such as Compaq (Digital Equipment Corporation, or DEC), Hewlett Packard (HP), International Business Machines (IBM), Intel, SGI (was Silicon Graphics Inc., now just 'SGI'), Siemens Nixdorf, Sun Microsystems (Sun), etc. In recent years, many companies which previously relied on DOS or Windows have begun to realise that UNIX is increasingly important to their business, mainly because of what UNIX has to offer and why, eg. portability, security, reliability, etc. As demands for handling data grow, and companies embrace new methods of manipulating data (eg. data mining and visualisation), the need for systems that can handle these problems forces companies to look at solutions that are beyond the Wintel platform in performance, scalability and power. Oil companies such as Texaco [1] and Chevron [2] are typical organisations which already use UNIX systems extensively because of their data-intensive tasks and a need for extreme reliability and scalability. As costs have come down, along with changes in the types of available UNIX system (newer low-end designs, eg. Ultra5, O2, etc.), small and medium-sized companies are

looking towards UNIX solutions to solve their problems. Even individuals now find that older 2nd-hand UNIX systems have significant advantages over modern Wintel solutions, and many companies/organisations have adopted this approach too [3].

This course serves as an introduction to UNIX, its history, features, operation, use and services, applications, typical administration tasks, and relevant related topics such as the Internet, security and the Law. SGI's version of UNIX, called IRIX, is used as an example UNIX OS. The network of SGI Indys and an SGI Challenge S server I admin is used as an example UNIX hardware platform.

The course lasts three days, each day consisting of a one hour lecture followed by a two hour practical session in the morning, and then a three hour practical session in the afternoon; the only exceptions to this are Day 1 which begins with a two hour lecture, and Day 3 which has a 1 hour afternoon lecture. Detailed notes are provided for all areas covered in the lectures and the practical sessions. With new topics introduced step-by-step, the practical sessions enable first-hand familiarity with the topics covered in the lectures. As one might expect of an OS which has a vast range of features, capabilities and uses, it is not possible to cover everything about UNIX in three days, especially the more advanced topics such as kernel tuning which most administrators rarely have to deal with. Today, modern UNIX hardware and software designs allow even very large systems with, for example, 64 processors to be fully setup at the OS level in little more than an hour [4]. Hence, the course is based on the author's experience of what a typical UNIX user and administrator (admin) has to deal with, rather than attempting to present a highly compressed 'Grand Description of Everything' which simply isn't necessary to enable an admin to perform real-world system administration on a daily basis. For example, the precise nature and function of the Sendmail email system on any flavour of UNIX is not immediately easy to understand; looking at the various files and how Sendmail works can be confusing. However, in the author's experience, due to the way UNIX is designed, even a default OS installation without any further modification is sufficient to provide users with a fully functional email service [5], a fact which shouldn't be of any great surprise since email is a built-in aspect of any UNIX OS. Thus, the presence of email as a fundamental feature of UNIX is explained, but configuring and customising Sendmail is not.

History of UNIX Key: BTL = Bell Telephone Laboratories GE = General Electric WE = Western Electric MIT = Massachusetts Institute of Technology BSD = Berkeley Standard Domain

Summary History: 1957: BTL creates the BESYS OS for internal use. 1964: BTL needs a new OS, develops Multics with GE and MIT. 1969: UNICS project started at BTL and MIT; OS written using the B language. 1970: UNICS project well under way; anonymously renamed to UNIX. 1971: UNIX book published. 60 commands listed. 1972: C language completed (a rewritten form of B). Pipe concept invented. 1973: UNIX used on 16 sites. Kernel rewritten in C. UNIX spreads rapidly. 1974: Work spreads to Berkeley. BSD UNIX is born. 1975: UNIX licensed to universities for free. 1978: Two UNIX styles, though similar and related: System V and BSD. 1980s: Many companies launch their versions of UNIX, including Microsoft.

A push towards cross-platform standards: POSIX/X11/Motif Independent organisations with cross-vendor membership Control future development and standards. IEEE included. 1990s: 64bit versions of UNIX released. Massively scalable systems. Internet springs to life, based on UNIX technologies. Further Standardisation efforts (OpenGL, UNIX95, UNIX98).

Detailed History. UNIX is now nearly 40 years old. It began life in 1969 as a combined project run by BTL, GE and MIT, initially created and managed by Ken Thompson and Dennis Ritchie [6]. The goal was to develop an operating system for a large computer which could support hundreds of simultaneous users. The very early phase actually started at BTL in 1957 when work began on what was to become BESYS, an OS developed by BTL for their internal needs. In 1964, BTL started on the third generation of their computing resources. They needed a new operating system and so initiated the MULTICS (MULTIplexed operating and Computing System) project in late 1964, a combined research programme between BTL, GE and MIT. Due to differing design goals between the three groups, Bell pulled out of the project in 1969, leaving personnel in Bell's Computing Science and Research Center with no usable computing environment. As a response to this move, Ken Thompson and Dennis Ritchie offered to design a new OS for BTL, using a PDP-7 computer which was available at the time. Early work was done in a language designed for writing compilers and systems programming, called BCPL (Basic Combined Programming Language). BCPL was quickly simplified and revised to produce a better language called B. By the end of 1969 an early version of the OS was completed; a pun at previous work on Multics, it was named UNICS (UNIplexed operating and Computing System) - an "emasculated Multics". UNICS included a primitive kernel, an editor, assembler, a simple shell command interpreter and basic command utilities such as rm, cat and cp. In 1970, extra funding arose from BTL's internal use of UNICS for patent processing; as a result, the researchers obtained a DEC PDP-11/20 for further work (24K RAM). At that time, the OS used 12K, with the remaining 12K used for user programs and a RAM disk (file size limit was 64K, disk size limit was 512K). BTL's Patent Department then took over the project, providing funding for a newer machine, namely a PDP-11/45. By this time, UNICS had been abbreviated to UNIX - nobody knows whose idea it was to change the name (probably just phonetic convenience).

In 1971, a book on UNIX by Thompson and Ritchie described over 60 commands, including:

             

b

(compile a B program)

chdir

(change working directory)

chmod

(change file access permissions)

chown

(change file ownership)

cp

(copy a file)

ls

(list directory contents)

who

(show who is on the system)

Even at this stage, fundamentally important aspects of UNIX were already firmly in place as core features of the overall OS, eg. file ownership and file access permissions. Today, other operating systems such as WindowsNT do not have these features as a rigorously integrated aspect of the core OS design, resulting in a plethora of overhead issues concerning security, file management, user access control and administration. These features, which are very important to modern computing environments, are either added as convoluted bolt-ons to other OSs or are totally nonexistent (NT does have a concept of file ownership, but it isn't implemented very well; regrettably, much of the advice given by people from VMS to Microsoft on how to implement such features was ignored). In 1972, Ritchie and Thompson rewrote B to create a new language called C. Around this time, Thompson invented the 'pipe' - a standard mechanism for allowing the output of one program or process to be used as the input for another. This became the foundation of the future UNIX OS development philosophy: write programs which do one thing and do it well; write programs which can work together and cooperate using pipes; write programs which support text streams because text is a 'universal interface' [6]. By 1973, UNIX had spread to sixteen sites, all within AT&T and WE. First made public at a conference in October that year, within six months the number of sites using UNIX had tripled. Following a publication of a version of UNIX in 'Communications of the ACM' in July 1974, requests for the OS began to rapidly escalate. Crucially at this time, the fundamentals of C were complete and much of UNIX's 11000 lines of code were rewritten in C - this was a major breakthrough in operating systems design: it meant that the OS could be used on virtually any computer platform since C was hardware independent. In late 1974, Thompson went to University of California at Berkeley to teach for a year. Working with Bill Joy and Chuck Haley, the three developed the 'Berkeley' version of UNIX (named BSD, for Berkeley Software Distribution), the source code of which was widely distributed to

students on campus and beyond, ie. students at Berkeley and elsewhere also worked on improving the OS. BTL incorporated useful improvements as they arose, including some work from a user in the UK. By this time, the use and distribution of UNIX was out of BTL's control, largely because of the work at Berkeley on BSD. Developments to BSD UNIX added the vi editor, C-based shell interpreter, the Sendmail email system, virtual memory, and support for TCP/IP networking technologies (Transmission Control Protocol/Internet Protocol). Again, a service as important as email was now a fundamental part of the OS, eg. the OS uses email as a means of notifying the system administrator of system status, problems, reports, etc. Any installation of UNIX for any platform automatically includes email; by complete contrast, email is not a part of Windows3.1, Win95, Win98 or WinNT email for these OSs must be added separately (eg. Pegasus Mail), sometimes causing problems which would not otherwise be present. In 1975, a further revision of UNIX known as the Fifth Edition was released and licensed to universities for free. After the release of the Seventh Edition in 1978, the divergence of UNIX development along two separate but related paths became clear: System V (BTL) and BSD (Berkeley). BTL and Sun combined to create System V Release 4 (SVR4) which brought together System V with large parts of BSD. For a while, SVR4 was the more rigidly controlled, commercial and properly supported (compared to BSD on its own), though important work occurred in both versions and both continued to be alike in many ways. Fearing Sun's possible domination, many other vendors formed the Open Software Foundation (OSF) to further work on BSD and other variants. Note that in 1979, a typical UNIX kernel was still only 40K. Because of a legal decree which prevented AT&T from selling the work of BTL, AT&T allowed UNIX to be widely distributed via licensing schemas at minimal or zero cost. The first genuine UNIX vendor, Interactive Systems Corporation, started selling UNIX systems for automating office work. Meanwhile, the work at AT&T (various internal design groups) was combined, then taken over by WE, which became UNIX System Laboratories (now owned by Novell). Later releases included Sytem III and various releases of System V. Today, most popular brands of UNIX are based either on SVR4, BSD, or a combination of both (usually SVR4 with standard enhancements from BSD, which for example describes SGI's IRIX version perfectly). As an aside, there never was a System I since WE feared companies would assume a 'system 1' would be bug-ridden and so would wait for a later release (or purchase BSD instead!). It's worth noting the influence from the superb research effort at Xerox Parc, which was working on networking technologies, electronic mail systems and graphical user interfaces, including the proverbial 'mouse'. The Apple Mac arose directly from the efforts of Xerox Parc which, incredibly and much against the wishes of many Xerox Parc employees, gave free demonstrations to people such as Steve Jobs (founder of Apple) and sold their ideas for next to nothing ($50000). This was perhaps the biggest financial give-away in history [7]. One reason why so many different names for UNIX emerged over the years was the practice of AT&T to license the UNIX software, but not the UNIX name itself. 
The various flavours of UNIX may have different names (SunOS, Solaris, Ultrix, AIX, Xenix, UnixWare, IRIX, Digital UNIX, HP-UX, OpenBSD, FreeBSD, Linux, etc.) but in general the differences between them

are minimal. Someone who learns a particular vendor's version of UNIX (eg. Sun's Solaris) will easily be able to adapt to a different version from another vendor (eg. DEC's Digital UNIX). Most differences merely concern the names and/or locations of particular files, as opposed to any core underlying aspect of the OS. Further enhancements to UNIX included compilation management systems such as make and Imake (allowing for a single source code release to be compiled on any UNIX platform) and support for source code management (SCCS). Services such as telnet for remote communication were also completed, along with ftp for file transfer, and other useful functions. In the early 1980s, Microsoft developed and released its version of UNIX called Xenix (it's a shame this wasn't pushed into the business market instead of DOS). The first 32bit version of UNIX was released at this time. SCO developed UnixWare which is often used today by Intel for publishing performance ratings for its x86-based processors [8]. SGI started IRIX in the early 1980s, combining SVR4 with an advanced GUI. Sun's SunOS sprang to life in 1984, which became widely used in educational institutions. NeXT-Step arrived in 1989 and was hailed as a superb development platform; this was the platform used to develop the game 'Doom', which was then ported to DOS for final release. 'Doom' became one of the most successful and influential PC games of all time and was largely responsible for the rapid demand for better hardware graphics systems amongst home users in the early 1990s - not many people know that it was originally designed on a UNIX system though. Similarly, much of the development work for Quake was done using a 4-processor Digital Alpha system [9]. During the 1980s, developments in standardised graphical user interface elements were introduced (X11 and Motif) along with other major additional features, especially Sun's Networked File System (NFS) which allows multiple file systems, from multiple UNIX machines from different vendors, to be transparently shared and treated as a single file structure. Users see a single coherant file system even though the reality may involve many different systems in different physical locations. By this stage, UNIX's key features had firmly established its place in the computing world, eg. Multi-tasking and multi-user (many independent processes can run at once; many users can use a single system at the same time; a single user can use many systems at the same time). However, in general, the user interface to most UNIX variants was poor: mainly text based. Most vendors began serious GUI development in the early 1980s, especially SGI which has traditionally focused on visual-related markets [10]. From the point of view of a mature operating system, and certainly in the interests of companies and users, there were significant moves in the 1980s and early 1990s to introduce standards which would greatly simplify the cross-platform use of UNIX. These changes, which continue today, include: 



The POSIX standard [6], begun in 1985 and released in 1990: a suite of application programming interface standards which provide for the portability of application source code relating to operating system services, managed by the X/Open group. X11 and Motif: GUI and windowing standards, managed by the X Consortium and OSF.

 





UNIX95, UNIX98: a set of standards and guidelines to help make the various UNIX flavours more coherant and cross-platform. OpenGL: a 3D graphics programming standard originally developed by SGI as GL (Graphics Library), then IrisGL, eventually released as an open standard by SGI as OpenGL and rapidly adopted by all other vendors. Journaled file systems such as SGI's XFS which allow the creation, management and use of very large file systems, eg. multiple terabytes in size, with file sizes from a single byte to millions of terabytes, plus support for real-time and predictable response. EDIT (2008): Linux can now use XFS. Interoperability standards so that UNIX systems can seamlessly operate with non-UNIX systems such as DOS PCs, WindowsNT, etc.

Standards Notes POSIX: X/Open eventually became UNIX International (UI), which competed for a while with OSF. The US Federal Government initiated POSIX (essentially a version of UNIX), requiring all government contracts to conform to the POSIX standard - this freed the US government from being tied to vendor-specific systems, but also gave UNIX a major boost in popularity as users benefited from the industry's rapid adoption of accepted standards.

X11 and Motif: Programming directly using low-level X11/Motif libraries can be non-trivial. As a result, higher level programming interfaces were developed in later years, eg. the ViewKit library suite for SGI systems. Just as 'Open Inventor' is a higher-level 3D graphics API to OpenGL, ViewKit allows one to focus on developing the application and solving the client's problem, rather than having to wade through numerous low-level details. Even higher-level GUI-based toolkits exist for rapid application development, eg. SGI's RapidApp.

UNIX95, UNIX98: Most modern UNIX variants comply with these standards, though Linux is a typical exception (it is POSIX-compliant, but does not adhere to other standards). There are several UNIX variants available for PCs, excluding Alpha-based systems which can also use NT (MIPS CPUs could once be used with NT as well, but Microsoft dropped NT support for MIPS due to competition fears from Intel whose CPUs were not as fast at the time [11]): 

Linux insecure.

Open-architecture, free, global development,

      

OpenBSD

More rigidly controlled, much more secure.

FreeBSD

Somewhere inbetween the above two.

UnixWare

More advanced. Scalable. Not free.

There are also commercial versions of Linux which have additional features and services, eg. Red Hat Linux and Calderra Linux. Note that many vendors today are working to enable the various UNIX variants to be used with Intel's CPUs - this is needed by Intel in order to decrease its dependence on the various Microsoft OS products.

OpenGL: Apple was the last company to adopt OpenGL. In the 1990s, Microsoft attempted to force its own standards into the marketplace (Direct3D and DirectX) but this move was doomed to failure due to the superior design of OpenGL and its ease of use, eg. games designers such as John Carmack (Doom, Quake, etc.) decided OpenGL was the much better choice for games development. Compared to Direct3D/DirectX, OpenGL is far superior for seriously complex problems such as visual simulation, military/industrial applications, image processing, GIS, numerical simulation and medical imaging. In a move to unify the marketplace, SGI and Microsoft signed a deal in the late 1990s to merge DirectX and Direct3D into OpenGL - the project, called Fahrenheit, will eventually lead to a single unified graphics programming interface for all platforms from all vendors, from the lowest PC to the fastest SGI/Cray supercomputer available with thousands of processors. To a large degree, Direct3D will simply either be phased out in favour of OpenGL's methods, or focused entirely on consumer-level applications, though OpenGL will dominate in the final product for the entertainment market. OpenGL is managed by the OpenGL Architecture Review Board, an independent organisation with member representatives from all major UNIX vendors, relevant companies and institutions.

Journaled file systems: File systems like SGI's XFS running on powerful UNIX systems like CrayOrigin2000 can easily support sustained data transfer rates of hundreds of gigabytes per second. XFS has a maximum file size limit of 9 million terabytes.

The end result of the last 30 years of UNIX development is what is known as an 'Open System', ie. a system which permits reliable application portability, interoperability between different systems and effective user portability between a wide variety of different vendor hardware and software platforms. Combined with a modern set of compliance standards, UNIX is now a mature, well-understood, highly developed, powerful and very sophisticated OS. Many important features of UNIX do not exist in other OSs such as WindowsNT and will not do so for years to come, if ever. These include guaranteeable reliability, security, stability, extreme scalability (thousands of processors), proper support for advanced multi-processing with unified shared memory and resources (ie. parallel compute systems with more than 1 CPU), support for genuine real-time response, portability and an ever-increasing ease-of-use through highly advanced GUIs. Modern UNIX GUIs combine the familiar use of icons with the immense power and flexibility of the UNIX shell command line which, for example, supports full remote administration (a significant criticism of WinNT is the lack of any real command line interface for remote administration). By contrast, Windows2000 includes a colossal amount of new code which will introduce a plethora of new bugs and problems.

A summary of key UNIX features would be:   







Multi-tasking: many different processes can operate independently at once. Multi-user: many users can use a single machine at the same time; a single user can use multiple machines at the same time. Multi-processing: most commercial UNIX systems scale to at least 32 or 64 CPUs (Sun, IBM, HP), while others scale to hundreds or thousands (IRIX, Unicos, AIX, etc.; Blue Mountain [12], Blue Pacific, ASCI Red). Today, WindowsNT cannot reliably scale to even 8 CPUs. Intel will not begin selling 8-way chip sets until Q3 1999. Multi-threading: automatic parallel execution of applications across multiple CPUs and graphics systems when programs are written using the relevant extensions and libraries. Some tasks are naturally non-threadable, eg. Rendering animation frames for movies (each processor computes a single frame using a round-robin approach), while others lend themselves very well to parallel execution, eg. Computational Fluid Dynamics, Finite Element Analysis, Image Processing, Quantum Chronodynamics, weather modeling, database processing, medical imaging, visual simulation and other areas of 3D graphics, etc. Platform independence and portability: applications written on UNIX systems will compile and run on other UNIX systems if they're developed with a standards-based approach, eg. the use of ANSI C or C++, Motif libraries, etc.; UNIX hides the hardware architecture from the user, easing portability. The close relationship between UNIX and C, plus the fact that the UNIX shell is based on C, provides for a powerful development environment. Today, GUI-based development environments for UNIX systems also exist, giving even greater power and flexibility, eg. SGI's WorkShop Pro CASE tools and RapidApp. Full 64bit environment: proper support for very large memory spaces, up to hundreds of GB of RAM, visible to the system as a single combined memory space. Comparison: NT's current maximum limit is 4GB; IRIX's current commercial limit is 512GB, though









Blue Mountain's 6144-CPU SGI system has a current limit of 12000GB RAM (twice that if the CPUs were upgraded to the latest model). Blue Mountain has 1500GB RAM installed at the moment. Inter-system communication: services such as telnet, Sendmail, TCP/IP, remote login (rlogin), DNS, NIS, NFS, etc. Sophisticated security and access control. Features such as email and telnet are a fundamental part of UNIX, but they must be added as extras to other OSs. UNIX allows one to transparently access devices on a remote system and even install the OS using a CDROM, DAT or disk that resides on a remote machine. Note that some of the development which went into these technologies was in conjunction with the evolution of ArpaNet (the early Internet that was just for key US government, military, research and educational sites). File identity and access: unique file ownership and a logical file access permission structure provide very high-level management of file access for use by users and administrators alike. OSs which lack these features as a core part of the OS make it far too easy for a hacker or even an ordinary user to gain administrator-level access (NT is a typical example). System identity: every UNIX system has a distinct unique entity, ie. a system name and an IP (Internet Protocol) address. These offer numerous advantages for users and administrators, eg. security, access control, system-specific environments, the ability to login and use multiple systems at once, etc. Genuine 'plug & play': UNIX OSs already include drivers and support for all devices that the source vendor is aware of. Adding most brands of disks, printers, CDROMs, DATs, Floptical drives, ZIP or JAZ drives, etc. to a system requires no installation of any drivers at all (the downside of this is that a typical modern UNIX OS installation can be large, eg. 300MB). Detection and name-allocation to devices is largely automatic - there is no need to assign specific interrupt or memory addresses for devices, or assign labels for disk drives, ZIP drives, etc. Devices can be added and removed without affecting the long-term operation of the system. This also often applies to internal components such as CPUs, video boards, etc. (at least for SGIs).

UNIX Today. In recent years, one aspect of UNIX that was holding it back from spreading more widely was cost. Many vendors often charged too high a price for their particular flavour of UNIX. This made its use by small businesses and home users prohibitive. The ever decreasing cost of PCs, combined with the sheer marketing power of Microsoft, gave rise of the rapid growth of Windows and now WindowsNT. However, in 1993, Linus Torvalds developed a version of UNIX called Linux (he pronounces it rather like 'leenoox', rhyming with 'see-books') which was free and ran on PCs as well as other hardware platforms such as DEC machines. In what must be one of the most astonishing developments of the computer age, Linux has rapidly grown to become a highly popular OS for home and small business use and is now being supported by many major companies too, including Oracle, IBM, SGI, HP, Dell and others.

Linux does not have the sophistication of the more traditional UNIX variants such as SGI's IRIX, but Linux is free (older releases of IRIX such as IRIX 6.2 are also free, but not the very latest release, namely IRIX 6.5). This has resulted in the rapid adoption of Linux by many people and businesses, especially for servers, application development, home use, etc. With the recent announcement of support for multi-processing in Linux for up to 8 CPUs, Linux is becoming an important player in the UNIX world and a likely candidate to take on Microsoft in the battle for OS dominance.

However, it'll be a while before Linux is used for 'serious' applications since it does not have the rigorous development history and discipline of other UNIX versions, eg. Blue Mountain is an IRIX system consisting of 6144 CPUs, 1500GB RAM, 76000GB disk space, and capable of 3000 billion floating-point operations per second. This level of system development is what drives many aspects of today's UNIX evolution and the hardware which supports UNIX OSs. Linux lacks this top-down approach and needs a lot of work in areas such as security and support for graphics, but Linux is nevertheless becoming very useful in fields such as render-farm construction for movie studios, eg. a network of cheap PentiumIII machines, networked and running the free Linux OS, reliable and stable. The film "Titanic" was the first major film which used a Linux-based render-farm, though it employed many other UNIX systems too (eg. SGIs, Alphas), as well as some NT systems.

EDIT (2008): Linux is now very much used for serious work, running most of the planet's Internet servers, and widely used in movie studios for Flame/Smoke on professional x86 systems. It's come a long way since 1999, with new distributions such as Ubuntu and Gentoo proving very popular. At the high end, SGI offers products that range from its shared-memory Linux-based Altix 4700 system with up to 1024 CPUs, to the Altix ICE, a highly expandable XEON/Linux cluster system with some sites using machines with tens of thousands of cores.

UNIX has come a long way since 1969. Thompson and Ritchie could never have imagined that it would spread so widely and eventually lead to its use in such things as the control of the Mars Pathfinder probe which last year landed on Mars, including the operation of the Internet web server which allowed millions of people around the world to see the images brought back as the Martian event unfolded [13].

Today, from an administrator's perspective, UNIX is a stable and reliable OS which pretty much runs itself once it's properly set up. UNIX requires far less daily administration than other OSs such as NT - a factor not often taken into account when companies make purchasing decisions (salaries are a major part of a company's expenditure). UNIX certainly has its baggage in terms of file structure and the way some aspects of the OS actually work, but after so many years most if not all of the key problems have been solved, giving rise to an OS which offers far superior reliability, stability, security, etc. In that sense, UNIX has very well-known baggage, which is absolutely vital for safety-critical applications such as military, medical, government and industrial use. Byte magazine once said that NT was only now tackling OS issues which other OSs had solved years before [14]. Thanks to a standards-based and top-down approach, UNIX is evolving to remove its baggage in a reliable way, eg. the introduction of the NSD (Name Service Daemon) to replace DNS (Domain Name Service), NIS (Network Information Service) and aspects of NFS operation; the new service is faster, more efficient, and easier on system resources such as memory and network usage.

However, in the never-ending public relations battle for computer systems and OS dominance, NT has firmly established itself as an OS which will be increasingly used by many companies due to the widespread use of the traditional PC and the very low cost of Intel's mass-produced CPUs. Rival vendors continue to offer much faster systems than PCs, whether or not UNIX is used, so I expect to see interesting times ahead in the realm of OS development. Companies like SGI bridge the gap by releasing advanced hardware systems which support NT (eg. the Visual Workstation 320 [15]), systems whose design is born out of UNIX-based experience. One thing is certain: some flavour of UNIX will always be at the forefront of future OS development, whatever variant it may be.

References

1. Texaco processes GIS data in order to analyse suitable sites for oil exploration. Their models can take several months to run even on large multi-processor machines. However, as systems become faster, companies like Texaco simply try to solve more complex problems, with more detail, etc.

2. Chevron's Nigerian office has what was, in mid-1998, the fastest supercomputer in Africa, namely a 16-processor SGI POWER Challenge (probably replaced by now with a modern 64-CPU Origin2000). A typical data set processed by the system is about 60GB, which takes around two weeks to process, during which time the system must not go wrong or much processing time is lost. For individual work, Chevron uses Octane workstations which are able to process 750MB of volumetric GIS data in less than three seconds. Solving these types of problems with PCs is not yet possible.

3. The 'Tasmania Parks and Wildlife Services' (TPWS) organisation is responsible for the management and environmental planning of Tasmania's National Parks. They use modern systems like the SGI O2 and SGI Octane for modeling and simulation (virtual park models to aid in decision making and planning), but have found that much older systems such as the POWER Series Predator and Crimson RealityEngine (SGI systems dating from 1992) are perfectly adequate for their tasks, and can still outperform modern PCs. For example, the full-featured pixel-fill rate of their RealityEngine system (320M/sec), which supports 48bit colour at very high resolutions (1280x2048 with 160MB VRAM), has still not been bettered by any modern PC solution. Real-time graphics comparisons at http://www.blender.nl/stuff/blench1.html show the Crimson RE easily outperforming many modern PCs which ought to be faster given that the RE is 7 years old. Information supplied by Simon Pigot (TPWS SysAdmin).

4. "State University of New York at Buffalo Teams up with SGI for Next-Level Supercomputing Site. New Facility Brings Exciting Science and Competitive Edge to University": http://www.sgi.com/origin/successes/buffalo.html

5. Even though the email-related aspects of the Computing Department's SGI network have not been changed in any way from the default settings (created during the original OS installation), users can still email other users on the system as well as send email to external sites. 6. Unix history: http://virtual.park.uga.edu/hc/unixhistory.html

A Brief History of UNIX: http://pantheon.yale.edu/help/unixos/unix-intro.html

UNIX Lectures: http://www.sis.port.ac.uk/~briggsjs/csar4/U2.htm

Basic UNIX: http://osiris.staff.udg.mx/man/ingles/his.html

POSIX: Portable Operating System Interface: http://www.pasc.org/abstracts/posix.htm

7. "The Triumph of the Nerds", Channel 4 documentary. 8. Standard Performance Evaluation Corporation: http://www.specbench.org/

Example use of UnixWare by Intel for benchmark reporting: http://www.specbench.org/osg/cpu95/results/res98q3/cpu95-980831-03026.html http://www.specbench.org/osg/cpu95/results/res98q3/cpu95-980831-03023.html 9. "My Visit to the USA" (id Software, Paradigm Simulation Inc., NOA): http://doomgate.gamers.org/dhs/dhs/usavisit/dallas.html

10. Personal IRIS 4D/25, PCW Magazine, September 1990, pp. 186: http://www.futuretech.vuurwerk.nl/pcw9-90pi4d25.html

IndigoMagic User Environment, SGI, 1993 [IND-MAGIC-BRO(6/93)]. IRIS Indigo Brochure, SGI, 1991 [HLW-BRO-01 (6/91)]. "Smooth Operator", CGI Magazine, Vol4, Issue 1, Jan/Feb 1999, pp. 41-42. Digital Media World '98 (Film Effects and Animation Festival, Wembley Conference Center, London). Forty-six pieces of work were submitted to the conference magazine by company attendees; of the 46 items, 43 had used SGIs, and of these, 34 had used only SGIs.

11. "MIPS-based PCs fastest for WindowsNT", "MIPS Technologies announces 200MHz R4400 RISC microprocessor", "MIPS demonstrates Pentium-class RISC PC designs", all from IRIS UK, Issue 1, 1994, pp. 5. 12. Blue Mountain, Los Alamos National Laboratory: 13. http://www.lanl.gov/asci/ 14. http://www.lanl.gov/asci/bluemtn/ASCI_fly.pdf 15. http://www.lanl.gov/asci/bluemtn/bluemtn.html 16. http://www.lanl.gov/asci/bluemtn/t_sysnews.shtml http://www.lanl.gov/orgs/pa/News/111298.html#anchor263034

17. "Silicon Graphics Technology Plays Mission-Critical Role in Mars Landing" http://www.sgi.com/newsroom/press_releases/1997/june/jplmars_release.html "Silicon Graphics WebFORCE Internet Servers Power Mars Web Site, One of the World's Largest Web Events" http://www.sgi.com/newsroom/press_releases/1997/july/marswebforce_release.html "PC Users Worldwide Can Explore VRML Simulation of Mars Terrain Via the Internet" http://www.sgi.com/newsroom/press_releases/1997/june/vrmlmars_release.html 18. "Deja Vu All Over Again"; "Windows NT security is under fire. It's not just that there are holes, but that they are holes that other OSes patched years ago", Byte Magazine, Vol 22 No. 11, November 1997 Issue, pp. 81 to 82, by Peter Mudge and Yobie Benjamin. 19. VisualWorkstation320 Home Page: http://visual.sgi.com/

Day 1: Part 2:

The basics: files, UNIX shells, editors, commands. Regular Expressions and Metacharacters in Shells.

UNIX Fundamentals: Files and the File System.

At the lowest level, from a command-line point of view, just about everything in a UNIX environment is treated as a file - even hardware entities, eg. printers, disks and DAT drives. Such items might be described as 'devices' or with other terms, but at the lowest level they are visible to the admin and user as files somewhere in the UNIX file system (under /dev in the case of hardware devices). Though this structure may seem a little odd at first, it means that system commands can use a common processing and communication interface no matter what type of file they're dealing with, eg. text, pipes, data redirection, etc. (these concepts are explained in more detail later).

The UNIX file system can be regarded as a top-down tree of files and directories, starting with the top-most 'root' directory. A directory can be visualised as a filing cabinet, other directories as folders within the cabinet and individual files as the pieces of paper within folders. It's a useful analogy if one isn't familiar with file system concepts, but somewhat inaccurate since a directory in a computer file system can contain loose files as well as other directories, ie. most office filing cabinets don't have loose pieces of paper outside of folders.

UNIX file systems can also have 'hidden' files and directories. In DOS, a hidden file is just a file with a special attribute set so that 'dir' and other commands do not show the file; by contrast, a hidden file in UNIX is any file which begins with a dot '.' (period) character, ie. the hidden status is a result of an aspect of the file's name, not an attribute that is bolted onto the file's general existence. Further, whether or not a user can access a hidden file or look inside a hidden directory has nothing to do with the fact that the file or directory is hidden from normal view (a hidden file in DOS cannot be written to). Access permissions are a separate aspect of the fundamental nature of a UNIX file and are dealt with later.

The 'ls' command lists files and directories in the current directory, or some other part of the file system by specifying a 'path' name. For example:

ls /

will show the contents of the root directory, which may typically contain the following:

CDROM   dev       home   mapleson  proc       stand  usr
bin     dumpster  lib    nsmail    root.home  tmp    var
debug   etc       lib32  opt       sbin       unix

Figure 1. A typical root directory shown by 'ls'.

Almost every UNIX system has its own unique root directory and file system, stored on a disk within the machine. The exception is a machine with no internal disk, running off a remote server in some way; such systems are described as 'diskless nodes' and are very rare in modern UNIX environments, though still used where a diskless node is an appropriate solution.

Some of the items in Fig 1 are files, while others are directories. If one uses the '-F' option with the ls command, special characters are shown after the names for extra clarity:

/  - directory
*  - executable file
@  - link to another file or directory elsewhere in the file system

Thus, using 'ls -F' gives this more useful output:

CDROM/   dev/       home/   mapleson/  proc/      stand/  usr/
bin/     dumpster/  lib/    nsmail/    root.home  tmp/    var/
debug/   etc/       lib32/  opt/       sbin/      unix*

Figure 2. The root directory shown by 'ls -F /'.

Fig 2 shows that most of the items are in fact other directories. Only two items are ordinary files: 'unix' and 'root.home'. 'unix' is the main UNIX kernel file and is often several megabytes in size for today's modern UNIX systems - this is partly because the kernel must often include support for 64bit as well as older 32bit system components. 'root.home' is merely a file created when the root user accesses the WWW using Netscape, ie. an application-specific file.

Important directories in the root directory:

/bin       - many as-standard system commands are here (links to /usr/bin)
/dev       - device files for keyboard, disks, printers, etc.
/etc       - system configuration files
/home      - user accounts are here (NFS mounted)
/lib       - library files used by executable programs
/sbin      - user applications and other commands
/tmp       - temporary directory (anyone can create files here); normally erased on bootup
/usr       - various product-specific directories, system resource directories, locations of online help (/usr/share), header files for application development (/usr/include), and further system configuration files relating to low-level hardware which are rarely touched even by an administrator (eg. /usr/cpu and /usr/gfx)
/var       - X Windows files (/var/X11), system services files (eg. software licenses in /var/flexlm), various application-related files (/var/netscape, /var/dmedia), system administration files and data (/var/adm, /var/spool) and a second temporary directory (/var/tmp) which is not normally erased on bootup (an administrator can alter the behaviour of both /tmp and /var/tmp)
/mapleson  - (non-standard) my home account is here, NFS-mounted from the admin Indy called Milamber

Figure 3. Important directories in the root directory. Comparisons with other UNIX variants such as HP-UX, SunOS and Solaris can be found in the many FAQ (Frequently Asked Questions) files available via the Internet [1].

Browsing around the UNIX file system can be enlightening but also a little overwhelming at first. However, an admin never has to be concerned with most parts of the file structure; low-level system directories such as /var/cpu are managed automatically by various system tasks and programs. Rarely, if ever, does an admin even have to look in such directories, never mind alter their contents (the latter is probably an unwise thing to do).

From the point of view of a novice admin, the most important directory is /etc. It is this directory which contains the key system configuration files, and it is these files which are most often changed when an admin wishes to alter system behaviour or properties. In fact, an admin can get to grips with how a UNIX system works very quickly, simply by learning all about the following files to begin with:

/etc/sys_id  - the name of the system (may include full domain)
/etc/hosts   - summary of full host names (standard file, added to by the administrator)
/etc/fstab   - list of file systems to mount on bootup
/etc/passwd  - password file, contains user account information
/etc/group   - group file, contains details of all user groups

Figure 4. Key files for the novice administrator. Note that an admin also has a personal account, ie. an ordinary user account, which should be used for any task not related to system administration. More precisely, an admin should only be logged in as root when it is strictly necessary, mainly to avoid unintended actions, eg. accidental use of the 'rm' command.
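To give an early feel for what such files look like, here is a hedged sketch of typical entries in /etc/hosts and /etc/passwd (the address, names, numeric IDs and shell shown are purely illustrative, not taken from a real system):

192.168.1.1   yoda.comp.somewhere.ac.uk   yoda

mapleson:x:1001:20:Ian Mapleson:/mapleson:/bin/tcsh

The /etc/passwd fields are, in order: login name, password field (often just a placeholder when shadow passwords are in use), user ID, group ID, comment, home directory and login shell.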

A Note on the 'man' Command.

The manual pages and other online information for the files shown in Fig 4 all list references to other related files, eg. the man page for 'fstab' lists 'mount' and 'xfs' in its 'SEE ALSO' section, as well as an entry called 'filesystems' which is a general overview document about UNIX file systems of all types, including those used by CDROMs and floppy disks. Modern UNIX releases contain a large number of useful general reference pages such as 'filesystems'. Since one may not know what is available, the 'k' and 'f' options can be used with the man command to offer suggestions, eg. 'man -f file' gives this output (the -f option shows all man page titles for entries that begin with the word 'file'):

ferror, feof, clearerr, fileno (3S)          - stream status inquiries
file (1)                                     - determine file type
file (3Tcl)                                  - Manipulate file names and attributes
File::Compare (3)                            - Compare files or filehandles
File::Copy (3)                               - Copy files or filehandles
File::DosGlob (3)                            - DOS like globbing and then some
File::Path (3)                               - create or remove a series of directories
File::stat (3)                               - by-name interface to Perl's built-in stat() functions
filebuf (3C++)                               - buffer for file I/O
FileCache (3)                                - keep more files open than the system permits
fileevent (3Tk)                              - Execute a script when a file becomes readable or writable
FileHandle (3)                               - supply object methods for filehandles
filename_to_devname (2)                      - determine the device name for the device file
filename_to_drivername (2)                   - determine the device name for the device file
fileparse (3)                                - split a pathname into pieces
files (7P)                                   - local files name service parser library
FilesystemManager (1M)                       - view and manage filesystems
filesystems: cdfs, dos, fat, EFS, hfs, mac, iso9660, cd-rom, kfs, nfs, XFS, rockridge (4)  - IRIX filesystem types
filetype (5)                                 - K-AShare's filetype specification file
filetype, fileopen, filealtopen, wstype (1)  - determine filetype of specified file or files
routeprint, fileconvert (1)                  - convert file to printer or to specified filetype

Figure 5. Output from 'man -f file'.

'man -k file' gives a much longer output since the '-k' option runs a search on every man page title containing the word 'file'. So a point to note: judicious use of the man command along with other online information is an effective way to learn how any UNIX system works and how to make changes to system behaviour. All man pages for commands give examples of their use, a summary of possible options, syntax, further references, a list of any known bugs with appropriate workarounds, etc.

The next most important directory is probably /var since this is where the configuration files for many system services are often housed, such as the Domain Name Service (/var/named) and the Network Information Service (/var/yp). However, small networks usually do not need these services, which are aimed more at larger networks. They can be useful though, for example in aiding Internet access.

Overall, a typical UNIX file system will contain several thousand files or more. It is possible for an admin to manage a system without ever knowing what the majority of the system's files are for. In fact, this is a preferable way of managing a system. When a problem arises, it is more important to know where to find relevant information on how to solve the problem, rather than try to learn the solution to every possible problem in the first instance (which is impossible). I once asked an experienced SGI administrator (the first person to ever use the massive Cray T3D supercomputer at the Edinburgh Parallel Computing Centre) what the most important thing in his daily working life was. He said it was a small yellow note book in which he had written where to find information about various topics. The book was an index on where to find facts, not a collection of facts in itself.

Hidden files were described earlier. The '-a' option can be used with the ls command to show hidden files:

ls -a /

gives: ./ ../ .Acroread.License .Sgiresources .cshrc .desktop-yoda/ .ebtpriv/ .expertInsight .insightrc .jotrc* .login .netscape/ .profile .rhosts

.sgihelprc .ssh/ .varupdate .weblink .wshttymode .zmailrc CDROM/ bin/ debug/ dev/ dumpster/ etc/ floppy/ home/

lib/ lib32/ mapleson/ nsmail/ opt/ proc/ sbin/ stand/ swap/ tmp/ unix* usr/ var/

Figure 6. Hidden files shown with 'ls -a /'. For most users, important hidden files would be those which configure their basic working environment when they login: .cshrc .login .profile

Other hidden files and directories refer to application-specific resources such as Netscape, or GUI-related resources such as the .desktop-sysname directory (where 'sysname' is the name of the host). Although the behaviour of the ls command can be altered with the 'alias' command so that it shows hidden files by default, the raw behaviour of ls can be accessed by using an absolute directory path to the command: /bin/ls

Using the absolute path to any file in this way allows one to ignore any aliases which may have been defined, as well as the normal behaviour of the shell to search the user's defined path for the first instance of a command. This is a useful technique when performing actions as root since it ensures that the wrong command is not executed by mistake.
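As a small illustration (the alias shown is just an example, not a recommended setting), a user's .cshrc might contain a line such as:

alias ls 'ls -F'

after which typing 'ls' actually runs 'ls -F', while typing '/bin/ls' bypasses the alias and runs the command in its raw form.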

Network File System (NFS) An important feature of UNIX is the ability to access a particular directory on one machine from another machine. This service is called the 'Network File System' (NFS) and the procedure itself is called 'mounting'. For example, on the machines in Ve24, the directory /home is completely empty - no files are in it whatsoever (except for a README file which is explained below). When one of the Indys is turned on, it 'mounts' the /home directory from the server 'on top' of the /home directory of the local machine. Anyone looking in the /home directory actually sees the contents of /home on the server. The 'mount' command is used to mount a directory on a file system belonging to a remote host onto some directory on the local host's filesystem. The remote host must 'export' a directory in order for other hosts to locally mount it. The /etc/exports file contains a list of directories to be exported. For example, the following shows how the /home directory on one of the Ve24 Indys (akira) is mounted off the server, yet appears to an ordinary user to be just another part of akira's overall file system (NB: the '#' indicates these actions are being performed as root; an ordinary user would not be able to use the mount command in this way): AKIRA 1# mount | grep YODA YODA:/var/www on /var/www type nfs (vers=3,rw,soft,intr,bg,dev=c0001) YODA:/var/mail on /var/mail type nfs (vers=3,rw,dev=c0002) YODA:/home on /home type nfs (vers=3,rw,soft,intr,bg,dev=c0003) AKIRA 1# ls /home dist/ projects/ pub/ staff/ students/ tmp/ yoda/ AKIRA 2# umount /home AKIRA 1# mount | grep YODA YODA:/var/www on /var/www type nfs (vers=3,rw,soft,intr,bg,dev=c0001) YODA:/var/mail on /var/mail type nfs (vers=3,rw,dev=c0002)

AKIRA 3# ls /home
README
AKIRA 4# mount /home
AKIRA 5# ls /home
dist/      projects/  pub/       staff/     students/  tmp/       yoda/
AKIRA 6# ls /
CDROM/   dev/       home/   mapleson/  proc/      stand/  usr/
bin/     dumpster/  lib/    nsmail/    root.home  tmp/    var/
debug/   etc/       lib32/  opt/       sbin/      unix*

Figure 7. Manipulating an NFS-mounted file system with 'mount'.

Each Indy has a README file in its local /home, containing:

The /home filesystem from Yoda is not mounted for some reason. Please contact me immediately!

Ian Mapleson, Senior Technician.
3297 (internal)
[email protected]

After /home is remounted in Fig 7, the ls command no longer shows the README file as being present in /home, ie. when /home is mounted from the server, the local contents of /home are completely hidden and inaccessible.

When accessing files, a user never has to worry about the fact that the files in a directory which has been mounted from a remote system actually reside on a physically separate disk, or even a different UNIX system from a different vendor. Thus, NFS gives a seamless, transparent way to merge different file systems from different machines into one larger structure. At the department where I studied years ago [2], their UNIX system included Hewlett Packard machines running HP-UX, Sun machines running SunOS, SGIs running IRIX, DEC machines running Digital UNIX, PCs running an X-Windows emulator called Windows Exceed, and some Linux PCs. All the machines had access to a single large file structure so that any user could theoretically use any system in any part of the building (except where deliberately prevented from doing so via local system file alterations).

Another example is my home directory /mapleson - this directory is mounted from the admin Indy (Technicians' office Ve48) which has my own extra external disk locally mounted. As far as the server is concerned, my home account just happens to reside in /mapleson instead of /home/staff/mapleson. There is a link to /mapleson from /home/staff/mapleson which allows other staff and students to access my directory without having to ever be aware that my home account files do not physically reside on the server.

Every user has a 'home directory'. This is where all the files owned by that user are stored. By default, a new account would only include basic files such as .login, .cshrc and .profile. Admin customisation might add a trash 'dumpster' directory, the user's WWW site directory for public access, an email directory, perhaps an introductory README file, a default GUI layout, etc.
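To complete the picture from the server's side, here is a hedged sketch of what the server's /etc/exports might contain (the exact option syntax varies between UNIX flavours, so consult the exports man page on the system concerned):

/home
/var/mail
/var/www

Each line names a directory to be made available to other hosts; access restrictions (read-only, specific client lists, etc.) can be appended as options. After editing the file, running 'exportfs -a' as root asks the NFS software to re-read it and export everything listed.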

UNIX Fundamentals: Processes and process IDs.

As explained in the UNIX history, a UNIX OS can run many programs, or processes, at the same time. From the moment a UNIX system is turned on, processes are being created. By the time a system is fully booted so that users can login and use the system, many processes will be running at once.

Each process has its own unique identification number, or process ID. An administrator can use these ID numbers to control which processes are running in a very direct manner. For example, if a user has run a program in the background and forgotten to close it down before logging off (perhaps the user's process is using up too much CPU time) then the admin can shut down the process using the kill command. Ordinary users can also use the kill command, but only on processes they own.

Similarly, if a user's display appears frozen due to a problem with some application (eg. Netscape) then the user can logon to a different system, login to the original system using rlogin, and then use the kill command to shut down the process at fault, either by using the specific process ID concerned, or by using a general command such as killall, eg.:

killall netscape

This will shut down all currently running Netscape processes, so using specific ID numbers is often attempted first.

Most users only encounter the specifics of processes and how they work when they enter the world of application development, especially the lower-level aspects of inter-process communication (pipes and sockets). Users may often run programs containing bugs, perhaps leaving processes which won't close on their own. Thus, kill can be used to terminate such unwanted processes.

The way in which UNIX manages processes and the resources they use is extremely tight, ie. it is very rare for a UNIX system to completely fall over just because one particular process has caused an error. 3rd-party applications like Netscape are usually the most common causes of process errors. Most UNIX vendors vigorously test their own system software to ensure it is, as far as can be ascertained, error-free. One reason why a lot of work goes into ensuring programs are bug-free is that bugs in software are a common means by which hackers try to gain root (admin) access to a system: by forcing a particular error condition, a hacker may be able to exploit a bug in an application.

For an administrator, most daily work concerning processes is about ensuring that system resources are not being overloaded for some reason, eg. a user running a program which is forking itself repeatedly, slowing down a system to a crawl. In the case of the SGI system I run, staff have access to the SGI server, so I must ensure that staff do not carelessly run processes which hog CPU time. Various means are available by which an administrator can restrict the degree to which any particular process can utilise system resources, the most important being a process priority level (see the man pages for 'nice' and 'renice').

The most common process-related command used by admins and users is 'ps', which displays the current list of processes. Various options are available to determine which processes are displayed and in what output format, but perhaps the most commonly used form is this: ps -ef

which shows just about everything about every process, though other commands exist which can give more detail, eg. the current CPU usage for each process (osview). Note that other UNIX OSs (eg. SunOS) require slightly different options, eg. 'ps -aux' - this is an example of the kind of difference which users might notice between System V and BSD derived UNIX variants.
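As a simple illustration of combining ps and kill (the process ID shown is of course hypothetical):

ps -ef
kill 12345
kill -9 12345

The ps listing shows the ID of the offending process; the first kill asks the process (here assumed to have ID 12345) to terminate cleanly, while 'kill -9', used only as a last resort, sends a signal which the process cannot catch or ignore.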

The Pipe.

An important aspect of processes is inter-process communication. From an everyday point of view, this involves the concept of pipes. A pipe, as the name suggests, acts as a communication link between two processes, allowing the output of one process to be used as the input for another. The pipe symbol is a vertical bar '|'. One can use the pipe to chain multiple commands together, eg.:

cat *.txt | grep pattern | sort | lp

The above command sequence dumps the contents of all the files in the current directory ending in .txt, but instead of the output being sent to the 'standard output' (ie. the screen), it is instead used as the input for the grep operation, which scans each incoming line for any occurrence of the word 'pattern' (grep's output will only be those lines which do contain that word, if any). The output from grep is then sorted by the sort program on a line-by-line basis (in alphanumeric order). Finally, the output from sort is sent to the printer using lp.

The use of pipes in this way provides an extremely effective way of combining many commands together to form more powerful and flexible operations. By contrast, such an ability does not exist in DOS, or in NT.

Processes are explained further in a later lecture, but have been introduced now since certain process-related concepts are relevant when discussing the UNIX 'shell'.
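Before moving on, one more tiny pipe example: assuming the who and wc commands behave as described later in these notes,

who | wc -l

counts how many users are currently logged on, since who prints one line per login session and 'wc -l' counts lines.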

UNIX Fundamentals: The Shell Command Interface.

A shell is a command-line interface to a UNIX OS, written in C, using a syntax that is very like the C language. One can enter simple commands (shell commands, system commands, user-defined commands, etc.), but also more complex sequences of commands, including expressions and even entire programs written in a scripting language called 'shell script' which is based on C and known as 'sh' (sh is the lowest level shell; rarely used by ordinary users, it is often used by admins and system scripts). Note that 'command' and 'program' are used synonymously here.

Shells are not in any way like the PC DOS environment; shells are very powerful and offer users and admins a direct communication link to the core OS, though ordinary users will find there is a vast range of commands and programs which they cannot use since they are not the root user. Modern GUI environments are popular and useful, but some tasks are difficult or impossible to do with an iconic interface, or at the very least are simply slower to perform.

Shell commands can be chained together (the output of one command acts as the input for another), or placed into an executable file like a program, except there is no need for a compiler and no object file - shell 'scripts' are widely used by admins for system administration and for performing common tasks such as locating and removing unwanted files. Combined with the facility for full-scale remote administration, shells are very flexible and efficient. For example, I have a single shell script 'command' which simultaneously reboots all the SGI Indys in Ve24. These shortcuts are useful because they minimise keystrokes and mistakes. An admin who issues lengthy and complex command lines repeatedly will find these shortcuts a handy and necessary time-saving feature.

Shells and shell scripts can also use variables, just as a C program can, though the syntax is slightly different. The equivalent of if/then statements can also be used, as can case statements, loop structures, etc. Novice administrators will probably not have to use if/then or other more advanced scripting features at first, and perhaps not even after several years. It is certainly true that any administrator who already knows the C programming language will find it very easy to learn shell script programming, and also the other scripting languages which exist on UNIX systems such as perl (Practical Extraction and Report Language), awk (pattern scanning and processing language) and sed (text stream editor).

perl is a text-processing language, designed for processing text files, extracting useful data, producing reports and results, etc. perl is a very powerful tool for system management, especially combined with other scripting languages. However, perl is perhaps less easy to learn for a novice; the perl man page says, "The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal)."

I have personally never had to write a perl program as yet, or a program using awk or sed. This is perhaps a good example, if any were required, of how largely automated modern UNIX systems are. Note that the perl man page serves as the entire online guide to the perl language and is thus quite large. An indication of the fact that perl and similar languages can be used to perform complex processing operations can be seen by examining the humorous closing comment in the perl man page: "Perl actually stands for Pathologically Eclectic Rubbish Lister, but don't tell anyone I said that."

Much of any modern UNIX OS actually operates using shell scripts, many of which use awk, sed and perl as well as ordinary shell commands and system commands. These scripts can look quite complicated, but in general they need not be of any concern to the admin; they are often quite old (ie. written years ago), well understood and bug-free.

Although UNIX is essentially a text-based command-driven system, it is perfectly possible for most users to do the majority or even all of their work on modern UNIX systems using just the GUI interface. UNIX variants such as IRIX include advanced GUIs which combine the best of both worlds. It's common for a new user to begin with the GUI and only discover the power of the text interface later. This probably happens because most new users are already familiar with other GUI-based systems (eg. Win95) and initially dismiss the shell interface because of prior experience of an operating system such as DOS, ie. they perceive a UNIX shell to be just some weird form of DOS. Shells are not DOS, ie.:

- DOS is an operating system. Win3.1 is built on top of DOS, as is Win95, etc.
- UNIX is an operating system. Shells are a powerful text command interface to UNIX and not the OS itself. A UNIX OS uses shell techniques in many aspects of its operation.

Shells are thus nothing like DOS; they are closely related to UNIX in that the very first version of UNIX included a shell interface, and both are written in C. When a UNIX system is turned on, a shell is used very early in the boot sequence to control what happens and execute actions.

Because of the way UNIX works and how shells are used, much of UNIX's inner workings are hidden, especially at the hardware level. This is good for the user, who only sees what she or he wants and needs to see of the file structure. An ordinary user focuses on their home directory and certain useful parts of the file system such as /var/tmp and /usr/share, while an admin will also be interested in other directories which contain system files, device files, etc., such as /etc, /var/adm and /dev. The most commonly used shells are:

bsh   - Bourne Shell; standard/job control command programming language
ksh   - modern alternative to bsh, but still restricted
csh   - Berkeley's C Shell; a better bsh with many additional features
tcsh  - an enhanced version of csh

Figure 8. The various available shells. These offer differing degrees of command access/history/recall/editing and support for shell script programming, plus other features such as command aliasing (new names for user-defined sequences of one or more commands). There is also rsh which is essentially a restricted version of the standard command interpreter sh; it is used to set up login names and execution environments whose capabilities are more controlled than those of the standard shell. Shells such as csh and tcsh execute the file /etc/cshrc before reading the user's own .cshrc, .login and perhaps .tcshrc file if that exists.

Shells use the concept of a 'path' to determine how to find commands to execute. The 'shell path variable', which is initially defined in the user's .cshrc or .tcshrc file, consists of a list of directories, which may be added to by the user. When a command is entered, the shell environment searches each directory listed in the path for the command. The first instance of a file which matches the command is executed, or an error is given if no such executable command is found. This feature allows multiple versions of the same command to exist in different locations (eg. different releases of a commercial application). The user can change the path variable so that particular commands will run a file from a desired directory. Try:

echo $PATH

The list of directories is given. WARNING: the dot '.' character at the end of a path definition means 'current directory'; it is dangerous to include this in the root user's path definition (this is because a root user could run an ordinary user's program(s) by mistake). Even an ordinary user should think twice about including a period at the end of their path definition. For example, suppose a file called 'la' was present in /tmp and was set so that it could be run by any user. Entering 'la' instead of 'ls' by mistake whilst in /tmp would fail to find 'la' in any normal system directory, but a period in the path definition would result in the shell finding la in /tmp and executing it; thus, if the la file contained malicious commands (eg. '/bin/rm -rf $HOME/mail'), then loss of data could occur.
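For illustration, in csh/tcsh the path is defined with a line such as the following in the user's .cshrc (the directories shown are just an example, not a recommended setting):

set path = (/usr/sbin /usr/bsd /sbin /usr/bin /bin /usr/bin/X11 /usr/local/bin)

After changing the path in a running shell, entering 'rehash' tells csh/tcsh to rebuild its internal table of command locations so that newly added directories are searched.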

Typical commands used in a shell include (most useful commands listed first):

cd      - change directory
ls      - list contents of directory
rm      - delete a file (no undelete!)
mv      - move a file
cat     - dump contents of a file
more    - display file contents in paged format
find    - search file system for files/directories
grep    - scan file(s) using pattern matching
man     - read/search a man page (try 'man man')
mkdir   - create a directory
rmdir   - remove directory ('rm -r' has the same effect)
pwd     - print current absolute working directory
cmp     - show differences between two files
lp      - print a file
df      - show disk usage
du      - show space used by directories/files
mail    - send an email message to another user
passwd  - change password (yppasswd for systems with NIS)

Figure 9. The commands used most often by any user.

Editors:

vi                 - ancient editor. Rarely used (arcane), but occasionally useful, especially for remote administration.
xedit, jot, nedit  - GUI editors (jot is old, nedit is newer, xedit is very simple).

Figure 10. Editor commands. Most of these are not built-in shell commands. Enter 'man csh' or 'man tcsh' to see which commands are part of the shell and hence which are other system programs, eg. 'which' is a shell command, but 'grep' is not; 'cd' is a shell command, but 'ls' is not.

vi is an ancient editor developed in the very early days of UNIX when GUI-based displays did not exist. It is not used much today, but many admins swear by it - this is only really because they know it so well after years of experience. The vi editor can have its uses though, eg. for remote administration: if you happen to be using a Wintel PC in an Internet cafe and decide to access a remote UNIX system via telnet, the vi editor will probably be the only editor which you can use to edit files on the remote system. Jot has some useful features, especially for programmers (macros, "Electric C Mode"), but is old and contains an annoying colour map bug; this doesn't affect the way jot works, but does sometimes scramble on-screen colours within the jot window. SGI recommends nedit be used instead. xedit is a very simple text editor. It has an extremely primitive file selection interface, but has a rather nice search/replace mechanism. nedit is a newer GUI editor with more modern features. jot is specific to SGI systems, while vi, xedit and nedit exist on any UNIX variant (if not by default, then they can be downloaded in source code or executable format from relevant anonymous ftp sites).

Creating a new shell: sh, csh, tcsh, bsh, ksh - use man pages to see differences

I have configured the SGI machines in Ve24 to use tcsh by default due to the numerous extra useful features in tcsh, including file name completion (TAB), command-line editing, alias support, file listing in the middle of a typed command (CTRL-D), command recall/reuse, and many others (the man page lists 36 main extras compared to csh).

Further commands:

which    - show location of a command based on current path definition
chown    - change owner ID of a file
chgrp    - change group ID of a file
chmod    - change file access permissions
who      - show who is on the local system
rusers   - show all users on local network
sleep    - pause for a number of seconds
sort     - sort data into a particular order
spell    - run a spell-check on a file
split    - split a file into a number of pieces
strings  - show printable text strings in a file
cut      - cut out selected fields of each line of a file
tr       - substitute/delete characters from a text stream or file
wc       - count the number of words in a file
whoami   - show user ID
write    - send message to another user
wall     - broadcast to all users on local system
talk     - request 1:1 communication link with another user
to_dos   - convert text file to DOS format (add CTRL-M and CTRL-Z)
to_unix  - convert text file to UNIX format (opposite of to_dos)
su       - adopt the identity of another user (password usually required)

Figure 11. The next most commonly used commands.
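As a hedged illustration of how the ownership and permission commands in Fig 11 are typically used (the file, user and group names are purely hypothetical):

chmod 644 report.txt
chown alex report.txt
chgrp staff report.txt

The first command gives the file's owner read/write access and everyone else read-only access; the other two hand the file over to user 'alex' and group 'staff' (changing ownership usually requires root privileges).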

Of the commands shown in Fig 11, only 'which' is a built-in shell command. Any GUI program can also be executed via a text command (the GUI program is just a high-level interface to the main program), eg. 'fm' for the iconic file manager/viewer, 'apanel' for the Audio Panel, 'printers' for the Printer Manager, 'iconbook' for the Icon Catalog, 'mouse' for customising mouse settings, etc. However, not all text commands will have a GUI equivalent - this is especially true of many system administration commands. Other categories are shown in Figs 12 to 17 below.

fx        - repartition a disk, plus other functions
mkfs      - make a file system on a disk
mount     - mount a file system (NFS)
ln        - create a link to a file or directory
tar       - create/extract an archive file
gzip      - compress a file (decompress with gunzip)
compress  - compress a file (decompress with uncompress); different format from gzip
pack      - a further compression method (eg. used with man pages and release notes)
head      - show the first few lines in a file
tail      - show the last few lines in a file

Figure 12. File system manipulation commands.

The tar command is another example where slight differences between UNIX variants exist with respect to default settings. However, command options can always be used to resolve such differences.
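As a brief illustration of how tar and gzip are often combined (the directory and file names are hypothetical):

tar cvf project.tar project
gzip project.tar

This creates project.tar from the contents of the 'project' directory and then compresses it to project.tar.gz; the archive can later be unpacked with:

gunzip project.tar.gz
tar xvf project.tar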

hinv          - show hardware inventory (SGI-specific)
uname         - show OS version
gfxinfo       - show graphics hardware information (SGI-specific)
sysinfo       - print system ID (SGI-specific)
gmemusage     - show current memory usage
ps            - display a snapshot of running process information
top           - constantly updated process list (GUI: gr_top)
kill          - shut down a process
killall       - shut down a group of processes
osview        - system resource usage (GUI: gr_osview)
startconsole  - system console, a kind of system monitoring xterm which applications will echo messages into

Figure 13. System Information and Process Management Commands.

inst      - install software (text-based)
swmgr     - GUI interface to inst (the preferred method; easier to use)
versions  - show installed software

Figure 14. Software Management Commands.

cc, CC, gcc  - compile program (further commands may exist for other languages)
make         - run program compilation script
xmkmf        - use imake on an Imakefile to create a vendor-specific make file
lint         - check a C program for errors/bugs
cvd          - CASE tool, visual debugger for C programs (SGI-specific)

Figure 15. Application Development Commands.
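As a minimal illustration, assuming a trivial C source file called hello.c exists in the current directory, it could be compiled and run with:

cc -o hello hello.c
./hello

Larger programs are normally built via a Makefile, in which case simply entering 'make' runs the appropriate compilation commands.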

relnotes    - software release notes (GUI: grelnotes)
man         - manual pages (GUI: xman)
insight     - online books
infosearch  - searchable interface to the above three (IRIX 6.5 and later)

Figure 16. Online Information Commands (all available from the 'Toolchest')

telnet      - open communication link
ftp         - file transfer
ping        - send test packets
traceroute  - display traced route to remote host
nslookup    - translate domain name into IP address
finger      - probe remote host for user information

Figure 17. Remote Access Commands.

This is not a complete list! And do not be intimidated by the apparent plethora of commands. An admin won't use most of them at first. Many commands are common to any UNIX variant, while those that aren't (eg. hinv) probably have equivalent commands on other UNIX platforms.

Shells can be displayed in different types of window, eg. winterm, xterm. xterms comply with the X11 standard and offer a wider range of features. xterms can be displayed on remote displays, as can any X-based application (this includes just about every program one ever uses). Security note: the remote system must give permission or be configured to allow remote display (xhost command).

If one is accessing a UNIX system via an older text-only terminal (eg. VT100) then the shell operates in 'terminal' mode, where the particular characteristics of the terminal in use determine how the shell communicates with the terminal (details of all known terminals are stored in the /usr/lib/terminfo directory). Shells shown in visual windows (xterms, winterms, etc.) operate a form of terminal emulation that can be made to exactly mimic a basic text-only terminal if required.

Tip: if one ever decides to NFS-mount /usr/lib to save space (thus normally erasing the contents of /usr/lib on the local disk), it is wise to at least leave behind the terminfo directory on the local disk's /usr/lib; thus, should one ever need to logon to the system when /usr/lib is not mounted, terminal communication will still operate normally.
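As a sketch of how remote display is typically arranged between two of the machines mentioned in these notes (assuming csh/tcsh on the remote side; the host names are just examples):

xhost +akira

entered on the local machine (eg. milamber) allows akira to open windows on the local display. Then, after an rlogin or telnet to akira:

setenv DISPLAY milamber:0
xterm &

and the new xterm appears on milamber's screen even though it is running on akira.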

The lack of a fundamental built-in shell environment in WindowsNT is one of the most common criticisms made by IT managers who use NT. It's also why many high-level companies such as movie studios do not use NT, eg. no genuine remote administration makes it hard to manage clusters of several dozen systems all at once, partly because different systems may be widely dispersed in physical location but mainly because remote administration makes many tasks considerably easier and more convenient.

Regular Expressions and Metacharacters. Shell commands can employ regular expressions and metacharacters which can act as a means for referencing large numbers of files or directories, or other useful shortcuts. Regular expressions are made up of a combination of alphanumeric characters and a series of punctuation characters that have special meaning to the shell. These punctuation characters are called metacharacters when they are used for their special meanings with shell commands.

The most common metacharacter is the wildcard '*', used to reference multiple files and/or directories, eg.: Dump the contents of all files in the current directory to the display: cat *

Remove all object files in the current directory: rm *.o

Search all files ending in .txt for the word 'Alex': grep Alex *.txt

Print all files beginning with 'March' and ending in '.txt': lp March*.txt

Print all files beginning with 'May': lp May*

Note that it is not necessary to use 'May*.*' - this is because the dot is just another character that can be a valid part of any UNIX file name at any position, ie. a UNIX file name may include multiple dots. For example, the Blender shareware animation program archive file is called: blender1.56_SGI_6.2_ogl.tar.gz

By contrast, DOS has a fixed file name format where the dot is a rigid aspect of any file name. UNIX file names do not have to contain a dot character, and can even contain spaces (though such names can confuse the shell unless one encloses the entire name in quotes "").

Other useful metacharacters relate to executing previously entered commands, perhaps with modification, eg. the '!' is used to recall a previous command, as in:

!!     - repeat the previous command
!grep  - repeat the last command which began with 'grep'

For example, an administrator might send 20 test packets to a remote site to see if the remote system is active: ping -c 20 www.sgi.com

Following a short break, the administrator may wish to run the same command again, which can be done by entering '!!'. Minutes later, after entering other commands, the admin might want to run the last ping test once more, which is easily possible by entering '!ping'. If no other command had since been entered beginning with 'p', then even just '!p' would work.

The '^' character can be used to modify the previous command, eg. suppose I entered: grwp 'some lengthy search string or whatever' *

grep has been spelled incorrectly here, so an error is given ('grwp: Command not found'). Instead of typing the whole line again, I could enter:

^w^e

The shell searches the previous command for the first appearance of 'w', replaces that letter with 'e', displays the newly formed command as a means of confirmation and then executes the command. Note: the '^' operator can only search for the first occurrence of the character or string to be changed, ie. in the above example, the word 'whatever' is not changed to 'ehatever'. The parameter to search for, and the pattern to replace any targets found, can be any standard regular expression, ie. a valid sequence of ASCII characters. In the above example, entering '^grwp^grep^' would have had the same effect, though is unnecessarily verbose.

Note that characters such as '!' and '^' operate entirely within the shell, ie. they are not 'memorised' as discrete commands. Thus, within a tcsh, using the Up-Arrow key to recall the previous command after the '^w^e' command sequence does not show any trace of the '^w^e' action. Only the corrected, executed command is shown.

Another commonly used character is the '&' symbol, normally employed to control whether or not a process executed from within a shell is run in the foreground or background. As explained in the UNIX history, UNIX can run many processes at once. Processes employ a parental relationship whereby a process which creates a new process (eg. a shell running a program) is said to be creating a child process. The act of creating a new process is called forking. When running a program from within a shell, the prompt may not come back after the command is entered - this means the new process is running in the 'foreground', ie. the shell process is suspended until such time as the forked process terminates. In order to run the process in the background, which will allow the shell process to carry on as before and still be used, the '&' symbol must be included at the end of the command.

For example, the 'xman' command normally runs in the foreground: enter 'xman' in a shell and the prompt does not return; close the xman program, or type CTRL-C in the shell window, and the shell prompt returns. This effectively means the xman program is 'tied' to the process which forked it, in this case the shell. If one closes the shell completely (eg. using the top-left GUI button, or a kill command from a different shell) then the xman window vanishes too. However, if one enters:

xman &

then the xman program is run in the 'background', ie. the shell prompt returns immediately (note the space is optional, ie. 'xman&' is also valid). This means the xman session is now independent of the process which forked it (the shell) and will still exist even if the shell is closed.

Many programs run in the background by default, eg. swmgr (install system software). The 'fg' command can be used to bring any process into the foreground using the unique process ID number which every process has. With no arguments, fg will attempt to bring to the foreground the most recent process which was run in the background. Thus, after entering 'xman&', the 'fg' command on its own will make the shell prompt vanish, as if the '&' symbol had never been used.

A process currently running in the foreground can be deliberately 'suspended' using the CTRL-Z sequence. Try running xman in the foreground within a shell and then typing CTRL-Z - the phrase 'suspended' is displayed and the prompt returns, showing that the xman process has been temporarily halted. It still exists, but is frozen. Try using the xman program at this point: notice that the menus cannot be accessed and the window overlay/underlay actions are not dealt with anymore. Now go back to the shell and enter 'fg' - the xman program is brought back into the foreground and begins running once more. As a final example, try CTRL-Z once more, but this time enter 'bg'. Now the xman process is pushed fully into the background. Thus, if one intends to run a program in the background but forgets to include the '&' symbol, then one can use CTRL-Z followed by 'bg' to place the process in the background.

Note: it is worth mentioning at this point an example of how I once observed Linux to be operating incorrectly. This example, seen in 1997, probably wouldn't happen today, but at the time I was very surprised. Using a csh shell on a PC running Linux, I ran the xedit editor in the background using:

xedit&

Moments later, I had cause to shut down the relevant shell, but the xedit session terminated as well, which should not have happened since the xedit process was supposed to be running in the background. Exactly why this happened I do not know - presumably there was a bug in the way Linux handled process forking which I am sure has now been fixed. However, in terms of how UNIX is supposed to work, it's a bug which should not have been present.

Actually, since many shells such as tcsh allow one to recall previous commands using the arrow keys, and to edit such commands using Alt/CTRL key combinations and other keys, the need to use metacharacters such as '!' and '^' is lessened. However, they're useful to know in case one encounters a different type of shell, perhaps as a result of a telnet session to a remote site where one may not have any choice over which type of shell is used.

Standard Input (stdin), Standard Output (stdout), Standard Error (stderr). As stated earlier, everything in UNIX is basically treated as a file. This even applies to the concept of where output from a program goes to, and where the input to a program comes from. The relevant files, or text data streams, are called stdin and stdout (standard 'in', standard 'out'). Thus, whenever a command produces a visible output in a shell, what that command is actually doing is sending its output to the file handle known as stdout. In the case of the user typing commands in a shell, stdout is defined to be the display which the user sees.

Similarly, the input to a command comes from stdin which, by default, is the keyboard. This is why, if you enter some commands on their own, they will appear to do nothing at first, when in fact they are simply waiting for input from the stdin stream, ie. the keyboard. Enter 'cat' on its own and see what happens; nothing at first, but then enter any text sequence - what you enter is echoed back to the screen, just as it would be if cat was dumping the contents of a file to the screen. This stdin input stream can be temporarily redefined so that a command takes its input from somewhere other than the keyboard. This is known as 'redirection'. Similarly, the stdout stream can be redirected so that the output goes somewhere other than the display. The '' symbols are used for data redirection. For example: ps -ef > file

This runs the ps command, but sends the output into a file. That file could then be examined with cat, more, or loaded into an editor such as nedit or jot. Try: cat > file

You can then enter anything you like until such time as some kind of termination signal is sent, either CTRL-D which acts to end the text stream, or CTRL-C which stops the cat process. Type 'hello', press Enter, then press CTRL-D. Enter 'cat file' to see the file's contents. A slightly different form of output redirection is '>>' which appends a data stream to the end of an existing file, rather than completely overwriting its current contents. Enter: cat >> file

and type 'there!' followed by Enter and then CTRL-D. Now enter 'cat file' and you will see:

% cat file
hello
there!

By contrast, try the above again but with the second operation also using the single '>' operator. This time, the file's contents will only be 'there!'. And note that the following has the same effect as 'cat file' (why?):

cat < file

Anyone familiar with C++ programming will recognise this syntax as being similar to the way C++ programs display output. Input and output redirection is used extensively by system shell scripts. Users and administrators can use these operators as a quick and convenient way of managing program input and output. For example, the output from a find command could be redirected into a file for later examination. I often use 'cat > whatever' as a quick and easy way to create a short file without using an editor.

Error messages from programs and commands are also often sent to a different output stream called stderr - by default, stderr is also the relevant display window, or the Console Window if one exists on-screen. The numeric file handles associated with these three text streams are:

0 - stdin
1 - stdout
2 - stderr

These numbers can be placed before the < and > operators to select a particular stream to deal with. Examples of this are given in the notes on shell script programming (Day 2).
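
As a brief preview of what those notes cover, Bourne-shell syntax (sh/ksh) allows stderr to be redirected separately from stdout by prefixing the file handle number; the commands below are only a sketch (csh uses a different syntax for this):

find /tmp/gurps -name "*.gz" -print 2> errors        # stderr goes to the file 'errors'
find /tmp/gurps -name "*.gz" -print > results 2>&1   # both stdout and stderr go to 'results'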

The '&&' combination allows one to chain commands together so that each command is only executed if the preceding command was successful, eg.: run_my_prog_which_takes_hours > results && lp results

In this example, some arbitrary program is executed which is expected to take a long time. The program's output is redirected into a file called results. If and only if the program terminates successfully will the results file be sent to the default printer by the lp program. Note: any error messages which the program writes to its standard output will also end up in the results file; messages sent to stderr will still appear on the display unless stderr is redirected as well. One common use of the && sequence is for on-the-spot backups:

cd /home && tar cv . && eject

This sequence changes directory to the /home area, archives the contents of /home to DAT and ejects the DAT tape once the archive process has completed. Note that the eject command without any arguments will search for a default removable media device, so this example assumes there is only one such device, a DAT drive, attached to the system. Otherwise, one could use 'eject /dev/tape' to be more specific.

The semicolon can also be used to chain commands together, but in a manner which does not require each command to be successful in order for the next command to be executed, eg. one could run two successive find commands, searching for different types of file, like this (try executing this command in the directory /mapleson/public_html/sgi): find . -name "*.gz" -print; find . -name "*.mpg" -print

The output given is:

./origin/techreport/compcon97_dv.pdf.gz
./origin/techreport/origin_chap7.pdf.gz
./origin/techreport/origin_chap6.pdf.gz
./origin/techreport/origin_chap5.pdf.gz
./origin/techreport/origin_chap4.pdf.gz
./origin/techreport/origin_chap3.pdf.gz
./origin/techreport/origin_chap2.pdf.gz
./origin/techreport/origin_chap1.5.pdf.gz
./origin/techreport/origin_chap1.0.pdf.gz
./origin/techreport/compcon_paper.pdf.gz
./origin/techreport/origin_techrep.pdf.tar.gz
./origin/techreport/origin_chap1-7TOC.pdf.gz
./pchall/pchal.ps.gz
./o2/phase/phase6.mpg
./o2/phase/phase7.mpg
./o2/phase/phase4.mpg
./o2/phase/phase5.mpg
./o2/phase/phase2.mpg
./o2/phase/phase3.mpg
./o2/phase/phase1.mpg
./o2/phase/phase8.mpg
./o2/phase/phase9.mpg

If one changes the first find command so that it will give an error, the second find command still executes anyway:

% find /tmp/gurps -name "*.gz" -print ; find . -name "*.mpg" -print
cannot stat /tmp/gurps No such file or directory
./o2/phase/phase6.mpg
./o2/phase/phase7.mpg
./o2/phase/phase4.mpg
./o2/phase/phase5.mpg
./o2/phase/phase2.mpg
./o2/phase/phase3.mpg
./o2/phase/phase1.mpg
./o2/phase/phase8.mpg
./o2/phase/phase9.mpg

However, if one changes the ; to && and runs the sequence again, this time the second find command will not execute because the first find command produced an error:

% find /tmp/gurps -name "*.gz" -print && find . -name "*.mpg" -print
cannot stat /tmp/gurps No such file or directory

As a final example, enter the following: find /usr -name "*.htm*" -print & find /usr -name "*.rgb" -print &

This command runs two separate find processes, both in the background at the same time. Unlike the previous examples, the output from each command is displayed first from one, then from the other, and back again in a non-deterministic manner, as and when matching files are located by each process. This is clear evidence that both processes are running at the same time. To shut down the processes, either use 'killall find' or enter 'fg' followed by the use of CTRL-C twice (or one could use kill with the appropriate process IDs, identifiable using 'ps -ef | grep find').

When writing shell script files, the ; symbol is most useful when one can identify commands which do not depend on each other. This symbol, and the other symbols described here, are heavily used in the numerous shell script files which manage many aspects of any modern UNIX OS. Note: if non-dependent commands are present in a script file or program, this immediately allows one to imagine the idea of a multi-threaded OS, ie. an OS which can run many processes in parallel across multiple processors. A typical example use of such a feature would be batch processing scripts for image processing of medical data, or scripts that manage database systems, financial accounts, etc.


Day 1: Part 3:

File ownership and access permissions. Online help (man pages, etc.)

UNIX Fundamentals: File Ownership

UNIX has the concept of file 'ownership': every file has a unique owner, specified by a user ID number contained in /etc/passwd. When examining the ownership of a file with the ls command, one always sees the symbolic name for the owner, unless the corresponding ID number does not exist in the local /etc/passwd file and is not available by any system service such as NIS. Every user belongs to a particular group; in the case of the SGI system I run, every user belongs to either the 'staff' or 'students' group (note that a user can belong to more than one group, eg. my network has an extra group called 'projects'). Group names correspond to unique group IDs and are listed in the /etc/group file. When listing details of a file, usually the symbolic group name is shown, as long as the group ID exists in the /etc/group file, or is available via NIS, etc. For example, the command: ls -l /

shows the full details of all files in the root directory. Most of the files and directories are owned by the root user, and belong to the group called 'sys' (for system). An exception is my home account directory /mapleson which is owned by me. Another example command: ls -l /home/staff

shows that every staff member owns their particular home directory. The same applies to students, and to any user which has their own account. The root user owns the root account (ie. the root directory) by default. The existence of user groups offers greater flexibility in how files are managed and the way in which users can share their files with other users. Groups also offer the administrator a logical way of managing distinct types of user, eg. a large company might have several groups:

accounts
clerical
investors
management
security

The admin decides on the exact names. In reality though, a company might have several internal systems, perhaps in different buildings, each with their own admins and thus possibly different group names.
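
For reference, group membership is defined in the /etc/group file mentioned above; each line holds the group name, an (often unused) password field, the numeric group ID and a comma-separated list of members. The entries below are purely illustrative, not taken from a real system:

staff:*:10:mapleson,alex,sam
students:*:20:cmpd101,cmpd102
projects:*:30:mapleson,alex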

UNIX Fundamentals: Access Permissions

Every file also has a set of file 'permissions'; the file's owner can set these permissions to alter who can read, write or execute the file concerned. The permissions for any file can be examined using the ls command with the -l option, eg.:

% ls -l /etc/passwd
-rw-r--r--    1 root     sys     1306   Jan 31 17:07   /etc/passwd
 uuugggooo      owner    group   size   date mod       name

Each file has three sets of file access permissions (uuu, ggg, ooo), relating to:

- the file's owner, ie. the 'user' field
- the group which the file's owner belongs to
- the 'rest of the world' (useful for systems with more than one group)

This discussion refers to the above three fields as 'user', 'group' and 'others'. In the above example, the three sets of permissions are represented by the field shown above as uuugggooo (here rw- for user, r-- for group and r-- for others), ie. the main system password file can be read by any user that has access to the relevant host, but can only be modified by the root user. The first character of the permissions field is separate and is shown as a 'd' if the file is a directory, or 'l' if the file is a link to some other file or directory (many examples of this can be found in the root directory and in /etc).

Such a combination of options offers great flexibility, eg. one can have private email (user-only), or one can share documents only amongst one's group (eg. staff could share exam documents, or students could share files concerning a Student Union petition), or one can have files that are accessible by anyone (eg. web pages). The same applies to directories, eg. since a user's home directory is owned by that user, an easy way for a user to prevent anyone else from accessing their home directory is to remove all read and execute permissions for groups and others.

File ownership and file access permissions are a fundamental feature of every UNIX file, whether that file is an ordinary file, a directory, or some kind of special device file. As a result, UNIX as an OS has inherent built-in security for every file. This can lead to problems if the wrong permissions are set for a file by mistake, but assuming the correct permissions are in place, a file's security is effectively assured.

Note that no non-UNIX operating system for PCs yet offers this fundamental concept of file ownership at the very heart of the OS, a feature that is definitely required for proper security. This is largely why industrial-level companies, military, and government institutions do not use NT systems where security is important. In fact, only Cray's Unicos (UNIX) operating system passes all of the US DoD's security requirements.
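
To illustrate the earlier point about home directories: a user could lock out everyone else with a single command such as the one below (a minimal sketch; the directory name is just the example account used in these notes):

chmod go-rx /mapleson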

Relevant Commands:

chown - change file ownership
chgrp - change group status of a file
chmod - change access permissions for one or more files

For a user to alter the ownership and/or access permissions of a file, the user must own that file. Without the correct ownership, an error is given, eg. assuming I'm logged on using my ordinary 'mapleson' account:

% chown mapleson var
var - Operation not permitted
% chmod go+w /var
chmod() failed on /var: Operation not permitted
% chgrp staff /var
/var - Operation not permitted

All of these operations are attempting to access files owned by root, so they all fail.

Note: the root user can access any file, no matter what ownership or access permissions have been set (unless a file owned by root has had its read permission removed). As a result, most hacking attempts on UNIX systems revolve around trying to gain root privileges.

Most ordinary users will rarely use the chown or chgrp commands, but administrators may often use them when creating accounts, installing custom software, writing scripts, etc. For example, an admin might download some software for all users to use, installing it somewhere in /usr/local. The final steps might be to change the ownership of every newly installed file to ensure that it is owned by root, with the group set to sys, and then to use chmod to ensure any newly installed executable programs can be run by all users, and perhaps to restrict access to original source code.

Although chown is normally used to change the user ID of a file, and chgrp the group ID, chown can actually do both at once. For example, while acting as root:

yoda 1# echo hello > file
yoda 2# ls -l file
-rw-r--r--    1 root     sys           6 May  2 21:50 file
yoda 3# chgrp staff file
yoda 4# chown mapleson file
yoda 5# ls -l file
-rw-r--r--    1 mapleson staff         6 May  2 21:50 file
yoda 6# /bin/rm file
yoda 7# echo hello > file
yoda 8# ls -l file
-rw-r--r--    1 root     sys           6 May  2 21:51 file
yoda 9# chown mapleson.staff file
yoda 10# ls -l file
-rw-r--r--    1 mapleson staff         6 May  2 21:51 file

Figure 18. Using chown to change both user ID and group ID.

Changing File Permissions: Examples.

The general syntax of the chmod command is:

chmod [-R] mode filename(s)

where mode defines the new set of access permissions. The -R option is optional (denoted by square brackets []) and can be used to recursively change the permissions for the contents of a directory. mode can be defined in two ways: using Octal (base-8) numbers or by using a sequence of meaningful symbolic letters. This discussion covers the symbolic method since the numeric method (described in the man page for chmod) is less intuitive to use. I wouldn't recommend an admin use Octal notation until greater familiarity with how chmod works is attained. mode can be summarised as containing three parts:

U operator P

where U is one or more characters corresponding to user, group, or other; operator is +, -, or =, signifying assignment of permissions; and P is one or more characters corresponding to the permission mode. Some typical examples would be:

chmod go-r file     - remove read permission for groups and others
chmod ugo+rx file   - add read/execute permission for all
chmod ugo=r file    - set permission to read-only for all users

A useful abbreviation in place of 'ugo' is 'a' (for all), eg.:

chmod a+rx file     - give read and execute permission for all
chmod a=r file      - set to read-only for all

For convenience, if the U part is missing, the command automatically acts for all, eg.:

chmod -x file       - remove executable access from everyone
chmod =r file       - set to read-only for everyone

though if a change in write permission is included, said change only affects user, presumably for better security:

chmod +w file       - add write access only for user
chmod +rwx file     - add read/execute for all, add write only for user
chmod -rw file      - remove read from all, remove write from user

Note the difference between the +/- operators and the = operator: + and - add or take away from existing permissions, while = sets all the permissions to a particular state, eg. consider a file which has the following permissions as shown by ls -l:

-rw-------

The command 'chmod +rx' would change the permissions to:

-rwxr-xr-x

while the command 'chmod =rx' would change the permissions to:

-r-xr-xr-x

ie. the latter command has removed the write permission from the user field because the rx permissions were set for everyone rather than just added to an existing state. Further examples of possible permissions states can be found in the man page for ls.
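
For reference, the Octal notation mentioned earlier packs each set of permissions into one digit (read=4, write=2, execute=1, added together), so the following two commands should be equivalent:

chmod 644 file           - octal form
chmod u=rw,go=r file     - symbolic form; both give -rw-r--r--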

A clever use of file ownership and groups can be employed by anyone to 'hand over' ownership of a file to another user, or even to root. For example, suppose user alex arranges with user sam to leave a new version of a project file (eg. a C program called project.c) in the /var/tmp directory of a particular system at a certain time. User alex not only wants sam to be able to read the file, but also to remove it afterwards, eg. move the file to sam's home directory with mv. Thus, alex could perform the following sequence of commands:

cp project.c /var/tmp     - copy the file
cd /var/tmp               - change directory
chmod go-rwx project.c    - remove all access for everyone else
chown sam project.c       - change ownership to sam

Figure 19. Handing over file ownership using chown.

Fig 19 assumes alex and sam are members of the same group, though an extra chgrp command could be used before the chown if this wasn't the case, or a combinational chown command used to perform both changes at once. After the above commands, alex will not be able to read the project.c file, or remove it. Only sam has any kind of access to the file. I once used this technique to show students how they could 'hand-in' project documents to a lecturer in a way which would not allow students to read each others' submitted work.

Note: it can be easy for a user to 'forget' about the existence of hidden files and their associated permissions. For example, someone doing some confidential movie editing might forget or not even know that temporary hidden files are often created for intermediate processing. Thus, confidential tasks should always be performed by users inside a sub-directory in their home directory, rather than just in their home directory on its own. Experienced users make good use of file access permissions to control exactly who can access their files, and even who can change them. Experienced administrators develop a keen eye and can spot when a file has unusual or perhaps unintended permissions, eg.:

-rwxrwxrwx

If a user's home directory has permissions like this, it means anybody can read, write and execute files in that directory: this is insecure and was probably not intended by the user concerned. A typical example of setting appropriate access permissions is shown by my home directory:

ls -l /mapleson

Only those directories and files that I wish to be readable by anyone have the group and others permissions set to read and execute. Note: to aid security, in order for a user to access a particular directory, the execute permission must be set for that directory as well as read permission at the appropriate level (user, group, others). Also, only the owner of a file can change the permissions or ownership state for that file (this is why a chown/chgrp sequence must have the chgrp done first, or both at once via a combinational chown).

The Set-UID Flag.

This special flag appears as an 's' instead of 'x' in either the user or group fields of a file's permissions, eg.:

% ls -l /sbin/su
-rwsr-xr-x    1 root     sys        40180 Apr 10 22:12 /sbin/su*

The online book, "IRIX Admin: Backup, Security, and Accounting", states: "When a user runs an executable file that has either of these permissions, the system gives the user the permissions of the owner of the executable file."

An admin might use su to temporarily become root or another user without logging off. Ordinary users may decide to use it to enable colleagues to access their account, but this should be discouraged since using the normal read/write/execute permissions should be sufficient.
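
Because set-UID programs are a favourite target for attackers, an admin may wish to keep track of which ones exist on a system; a typical search (which can take some time on a large filesystem) might look like this:

find / -type f -perm -4000 -print     - list all files with the set-UID bit set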

Mandatory File Locking. If the 'l' flag is set in a file's group permissions field, then the file will be locked while another user from the same group is accessing the file. For example, file locking allows a user to gather data from multiple users in their own group via a group-writable file (eg. petition, questionnaire, etc.), but blocks simultaneous file-write access by multiple users - this prevents data loss which might otherwise occur via two users writing to a file at the same time with different versions of the file.

UNIX Fundamentals: Online Help

From the very early days of UNIX, online help information was available in the form of manual pages, or 'man' pages. These contain an extensive amount of information on system commands, program subroutines, system calls and various general reference pages on topics such as file systems, CPU hardware issues, etc. The 'man' command allows one to search the man page database using keywords, but this text-based interface is still somewhat restrictive in that it does not allow one to 'browse' through pages at will and does not offer any kind of direct hyperlinked reference system, although each man page always includes a 'SEE ALSO' section so that one will know what other man pages are worth consulting.

Thus, most modern UNIX systems include the 'xman' command: a GUI interface using X Window displays that allows one to browse through man pages at will and search them via keywords. System man pages are actually divided into sections, a fact which is not at all obvious to a novice user of the man command. By contrast, xman reveals immediately the existence of these different sections, making it much easier to browse through commands. Since xman uses the various X Windows fonts to display information, the displayed text can incorporate special font styling such as italics and bold text to aid clarity. A man page shown in a shell can use bright characters and inverted text, but data shown using xman is much easier to read, except where font spacing is important, eg. enter 'man ascii' in a shell and compare it to the output given by xman (use xman's search option to bring up the man page for ascii). xman doesn't include a genuine hypertext system, but the easy-to-access search option makes it much more convenient to move from one page to another based on the contents of a particular 'SEE ALSO' section.

Most UNIX systems also have some form of online book archive. SGIs use the 'Insight' library system which includes a great number of books in electronic form, all written using hypertext techniques. An ordinary user would be expected to begin their learning process by using the online books rather than the man pages since the key introductory books guide the user through the basics of using the system via the GUI interface rather than the shell interface.
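
For example, the man command itself can search by keyword and select a specific section when the same name appears in more than one (section numbering varies slightly between UNIX variants, so the examples below are only illustrative):

man -k password      (keyword search, assuming the whatis database has been built)
man 1 passwd         (the passwd command)
man 4 passwd         (the passwd file format; this lives in section 5 on some systems)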

SGIs also have online release notes for each installed software product. These can be accessed via the command 'grelnotes' which gives a GUI interface to the release notes archive, or one can use relnotes in a shell or terminal window. Other UNIX variants probably also have a similar information resource. Many newer software products also install local web pages as a means of providing online information, as do 3rd-party software distributions. Such web pages are usually installed somewhere in /usr/local, eg. /usr/local/doc. The URL format 'file:/file-path' is used to access such pages, though an admin can install file links with the ln command so that online pages outside of the normal file system web area (/var/www/htdocs on SGIs) are still accessible using a normal http format URL.

In recent years, there have been moves to incorporate web technologies into UNIX GUI systems. SGI began their changes in 1996 (a year before anyone else) with the release of the O2 workstation. IRIX 6.3 (used only with O2) included various GUI features to allow easy integration between the existing GUI and various web features, eg. direct iconic links to web sites, and using Netscape browser window interface technologies for system administration, online information access, etc. Most UNIX variants will likely have similar features; on SGIs with the latest OS version (IRIX 6.5), the relevant system service is called InfoSearch - for the first time, users have a single entry point to the entire online information structure, covering man pages, online books and release notes. Also, extra GUI information tools are available for consulting "Quick Answers" and "Hints and Shortcuts". These changes are all part of a general drive on UNIX systems to make them easier to use.

Unlike the xman resource, viewing man pages using InfoSearch does indeed hyperlink references to other commands and resources throughout each man page. This again enhances the ability of an administrator, user or application developer to locate relevant information.

Summary: UNIX systems have a great deal of online information. As the numerous UNIX variants have developed, vendors have attempted to improve the way in which users can access that information, ultimately resulting in highly evolved GUI-based tools that employ standard windowing technologies such as those offered by Netscape (so that references may include direct links to web sites, ftp sites, etc.), along with hypertext techniques and search mechanisms. Knowing how to make the best use of available documentation tools can often be the key to effective administration, ie. locating answers quickly as and when required.

Detailed Notes for Day 2 (Part 1)

UNIX Fundamentals: System Identity, IP Address, Domain Name, Subdomain.

Every UNIX system has its own unique name, which is the means by which that machine is referenced on local networks and beyond, eg. the Internet. The normal term for this name is the local 'host' name. Systems connected to the Internet employ naming structures that conform to existing structures already used on the Internet. A completely isolated network can use any naming scheme.

Under IRIX, the host name for a system is stored in the /etc/sys_id file. The name may be up to 64 alphanumeric characters in length and can include hyphens and periods. Period characters '.' are not part of the real name but instead are used to separate the sequence into a domain-name style structure (eg. www.futuretech.vuurwerk.nl). The SGI server's host name is yoda, the fully-qualified version of which is written as yoda.comp.uclan.ac.uk. The choice of host names is largely arbitrary, eg. the SGI network host names are drawn from my video library (I have chosen names designed to be short without being too uninteresting).

On bootup, a system's /etc/rc2.d/S20sysetup script reads its /etc/sys_id file to determine the local host name. From then onwards, various system commands and internal function calls will return that system name, eg. the 'hostname' and 'uname' commands (see the respective man pages for details).

Along with a unique identity in the form of a host name, a UNIX system has its own 32bit Internet Protocol (IP) address, split for convenience into four 8bit integers separated by periods, eg. yoda's IP address is 193.61.250.34, an address which is visible to any system anywhere on the Internet. IP is the network-level communications protocol used by Internet systems and services. Various extra options can be used with IP layer communications to create higher-level services such as TCP (Transmission Control Protocol). The entire Internet uses the TCP/IP protocols for communication.

A system which has more than one network interface (eg. multiple Ethernet ports) must have a unique IP address for each port. Special software may permit a system to have extra addresses, eg. 'IP Aliasing', a technique often used by an ISP to provide a more flexible service to its customers. Note: unlike predefined Ethernet addresses (every Ethernet card has its own unique address), a system's IP address is determined by the network design, admin personnel, and external authorities.

Conceptually speaking, an IP address consists of two numbers: one represents the network while the other represents the system. In order to more efficiently make use of the numerous possible address 'spaces', four classes of addresses exist, named A, B, C and D. The first few bits of an address determine its class:

Class    Initial Binary    No. of Bits for the    No. of Bits for the
         Bit Field         Network Number         Host Number

A        0                 7                      24
B        10                14                     16
C        110               21                     8
D        1110              [special 'multicast' addresses for internal network use]

Figure 20. IP Address Classes: bit field and width allocations.

This system allows the Internet to support a range of different network sizes with differing maximum limits on the number of systems for each type of network:

                        Class A      Class B     Class C     Class D

No. of networks:        128          16384       2097152     [multicast]
No. of systems each:    16777214     65534       254         [multicast]

Figure 21. IP Address Classes: supported network types and sizes.

The numbers 0 and 255 are never used for any host. These are reserved for special uses. Note that a network which will never be connected to the Internet can theoretically use any IP address and domain/subdomain configuration.

Which class of network an organisation uses depends on how many systems it expects to have within its network. Organisations are allocated IP address spaces by Internet Network Information Centers (InterNICs), or by their local ISP if that is how they are connected to the Internet. An organisation's domain name (eg. uclan.ac.uk) is also obtained from the local InterNIC or ISP. Once a domain name has been allocated, the organisation is free to set up its own network subdomains such as comp.uclan.ac.uk (comp = Computing Department), within which an individual host would be yoda.comp.uclan.ac.uk. A similar example is Heriot Watt University in Edinburgh (where I studied for my BSc) which has the domain hw.ac.uk, with its Department of Computer Science and Electrical Engineering using a subdomain called cee.hw.ac.uk, such that a particular host is www.cee.hw.ac.uk (see Appendix A for an example of what happens when this methodology is not followed correctly).

UCLAN uses Class C addresses, with example address spaces being 193.61.255 and 193.61.250. A small number of machines in the Computing Department use the 250 address space, namely the SGI server's external Ethernet port at 193.61.250.34, and the NT server at 193.61.250.35 which serves the NT network in Ve27. Yoda has two Ethernet ports; the remaining port is used to connect to the SGI Indys via a hub - this port has been defined to use a different address space, namely 193.61.252. The machines' IP addresses range from 193.61.252.1 for yoda, to 193.61.252.23 for the admin Indy; .20 to .22 are kept available for two HP systems which are occasionally connected to the network, and for a future plan to include Apple Macs on the network.

The IP addresses of the Indys using the 252 address space cannot be directly accessed outside the SGI network or, as the jargon goes, 'on the other side' of the server's Ethernet port which is being used for the internal network. This automatically imposes a degree of security at the physical level.

IP addresses and host names for systems on the local network are brought together in the file /etc/hosts. Each line in this file gives an IP address, an official hostname and then any name aliases which represent the same system, eg. yoda.comp.uclan.ac.uk is also known as www.comp.uclan.ac.uk, or just yoda, or www, etc. When a system is first booted, the ifconfig command uses the /etc/hosts file to assign addresses to the various available Ethernet network interfaces. Enter 'more /etc/hosts' or 'nedit /etc/hosts' to examine the host names file for the particular system you're using.

NB: due to the Internet's incredible expansion in recent years, the world is actually beginning to run out of available IP addresses and domain names; at best, existing top-level domains are being heavily overused (eg. .com, .org, etc.) and the number of allocatable network address spaces is rapidly diminishing, especially if one considers the possible expansion of the Internet into Russia, China, the Far East, Middle East, Africa, Asia and Latin America. Thus, there are moves afoot to change the Internet so that it uses 128bit instead of 32bit IP addresses. When this will happen is unknown, but such a change would solve the problem.

Special IP Addresses

Certain reserved IP addresses have special meanings, eg. the address 127.0.0.1 is known as the 'loopback' address (equivalent host name 'localhost') and always refers to the local system which one happens to be using at the time. If one never intends to connect a system to the Internet, there's no reason why this default IP address can't be left as it is with whatever default name assigned to it in the /etc/hosts file (SGIs always use the default name, "IRIS"), though most people do change their system's IP address and host name in case, for example, they have to connect their system to the network used at their place of work, or to provide a common naming scheme, group ID setup, etc.

If a system's IP address is changed from the default 127.0.0.1, the exact procedure is to add a new line to the /etc/hosts file such that the system name corresponds to the information in /etc/sys_id. One must never remove the 127.0.0.1 entry from the /etc/hosts file or the system will not work properly. The important lines of the /etc/hosts file used on the SGI network are shown in Fig 22 below (the appearance of '[etc]' in Fig 22 means some text has been clipped away to aid clarity).

# This entry must be present or the system will not work.
127.0.0.1       localhost

# SGI Server. Challenge S.
193.61.252.1    yoda.comp.uclan.ac.uk yoda www.comp.uclan.ac.uk www

[etc]

# Computing Services router box link.
193.61.250.34   gate-yoda.comp.uclan.ac.uk gate-yoda

# SGI Indys in Ve24, except milamber which is in Ve47.
193.61.252.2    akira.comp.uclan.ac.uk akira
193.61.252.3    ash.comp.uclan.ac.uk ash
193.61.252.4    cameron.comp.uclan.ac.uk cameron
193.61.252.5    chan.comp.uclan.ac.uk chan
193.61.252.6    conan.comp.uclan.ac.uk conan
193.61.252.7    gibson.comp.uclan.ac.uk gibson
193.61.252.8    indiana.comp.uclan.ac.uk indiana
193.61.252.9    leon.comp.uclan.ac.uk leon
193.61.252.10   merlin.comp.uclan.ac.uk merlin
193.61.252.11   nikita.comp.uclan.ac.uk nikita
193.61.252.12   ridley.comp.uclan.ac.uk ridley
193.61.252.13   sevrin.comp.uclan.ac.uk sevrin
193.61.252.14   solo.comp.uclan.ac.uk solo
193.61.252.15   spock.comp.uclan.ac.uk spock
193.61.252.16   stanley.comp.uclan.ac.uk stanley
193.61.252.17   warlock.comp.uclan.ac.uk warlock
193.61.252.18   wolfen.comp.uclan.ac.uk wolfen
193.61.252.19   woo.comp.uclan.ac.uk woo
193.61.252.23   milamber.comp.uclan.ac.uk milamber

[etc]

Figure 22. The contents of the /etc/hosts file used on the SGI network.

One example use of the localhost address is when a user accesses a system's local web page structure at: http://localhost/

On SGIs, such an address brings up a page about the machine the user is using. For the SGI network, the above URL always brings up a page for yoda since /var/www is NFS-mounted from yoda. The concept of a local web page structure for each machine is more relevant in company Intranet environments where each employee probably has her or his own machine, or where different machines have different locally stored web page information structures due to, for example, differences in available applications, etc.

The BIND Name Server (DNS).

If a site is to be connected to the Internet, then it should use a name server such as BIND (Berkeley Internet Name Domain) to provide an Internet Domain Name Service (DNS). DNS is an Internet-standard name service for translating hostnames into IP addresses and vice-versa. A client machine wishing to access a remote host executes a query which is answered by the DNS daemon, called 'named'. Yoda runs a DNS server and also a Proxy server, allowing the machines in Ve24 to access the Internet via Netscape (telnet, ftp, http, gopher and other services can be used).

Most of the relevant database configuration files for a DNS setup reside in /var/named. A set of example configuration files are provided in /var/named/Examples - these should be used as templates and modified to reflect the desired configuration. Setting up a DNS database can be a little confusing at first, thus the provision of the Examples directory. The files which must be configured to provide a functional DNS are:

/etc/named.boot
/var/named/root.cache
/var/named/named.hosts
/var/named/named.rev
/var/named/localhost.rev

If an admin wishes to use a configuration file other than /etc/named.boot, then its location should be specified by creating a file called /etc/config/named.options with the following contents (or added to named.options if it already exists): -b some-other-boot-file

After the files in /var/named have been correctly configured, the chkconfig command is used to set the appropriate variable file in /etc/config: chkconfig named on
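
Running chkconfig with no arguments lists the current on/off state of all such configuration flags, so the change can be verified before rebooting (output illustrative):

% chkconfig | grep named
        named               on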

The next reboot will activate the DNS service. Once started, named reads initial configuration information from the file /etc/named.boot, such as what kind of server it should be, where the DNS database files are located, etc. Yoda's named.boot file looks like this:

;
; Named boot file for yoda.comp.uclan.ac.uk.
;

directory   /var/named

cache       .                           root.cache
primary     comp.uclan.ac.uk            named.hosts
primary     0.0.127.IN-ADDR.ARPA        localhost.rev
primary     252.61.193.IN-ADDR.ARPA     named.rev
primary     250.61.193.IN-ADDR.ARPA     250.rev
forwarders  193.61.255.3 193.61.255.4

Figure 23. Yoda's /etc/named.boot file.

Looking at the contents of the example named.boot file in /var/named/Examples, the differences are not that great:

;
; boot file for authoritative master name server for Berkeley.EDU
; Note that there should be one primary entry for each SOA record.
;
; sortlist 10.0.0.0

directory   /var/named

; type      domain                      source host/file        backup file

cache       .                           root.cache
primary     Berkeley.EDU                named.hosts
primary     32.128.IN-ADDR.ARPA         named.rev
primary     0.0.127.IN-ADDR.ARPA        localhost.rev

Figure 24. The example named.boot file in /var/named/Examples.

Yoda's file has an extra line for the /var/named/250.rev file; this was an experimental attempt to make Yoda's subdomain accessible outside UCLAN, which failed because of the particular configuration of a router box elsewhere in the communications chain (the intention was to enable students and staff to access the SGI network using telnet from a remote host).

For full details on how to configure a typical DNS, see Chapter 6 of the online book, "IRIX Admin: Networking and Mail". A copy of this chapter has been provided for reference. As an example of how similar DNS configuration is across UNIX systems, see the issue of Network Week [10] which has an article on configuring a typical DNS. Also, a copy of each of Yoda's DNS files which I had to configure is included for reference. Together, these references should serve as an adequate guide to configuring a DNS; as with many aspects of managing a UNIX system, learning how someone else solved a problem and then modifying copies of what they did can be very effective.

Note: it is not always wise to use a GUI tool for configuring a service such as BIND [11]. It's too easy for ill-tested grandiose software management tools to make poor assumptions about how an admin wishes to configure a service/network/system. Services such as BIND come with their own example configuration files anyway; following these files as a guide may be considerably easier than using a GUI tool which itself can cause problems created by whoever wrote the GUI tool, rather than the service itself (in this case BIND).
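
Once named is up and running, a quick way to check that it is answering queries is the nslookup command, which performs forward and reverse lookups against the name server (the host name and address here are simply those of the SGI network used throughout these notes):

% nslookup yoda.comp.uclan.ac.uk
% nslookup 193.61.252.1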

Proxy Servers

A Proxy server acts as a go-between to the outside world, answering client requests for data from the Internet, calling the DNS system to obtain IP addresses based on domain names, opening connections to the Internet perhaps via yet another Proxy server elsewhere (the Ve24 system uses Pipex as the next link in the communications chain), and retrieving data from remote hosts for transmission back to clients.

Proxy servers are a useful way of providing Internet access to client systems at the same time as imposing a level of security against the outside world, ie. the internal structure of a network is hidden from the outside world due to the operational methods employed by a Proxy server, rather like the way in which a representative at an auction can act for an anonymous client via a mobile phone during the bidding. Although there are more than a dozen systems in Ve24, no matter which machine a user decides to access the Internet from, the access will always appear to a remote host to be coming from the IP address of the closest proxy server, eg. the University web server would see Yoda as the accessing client. Similarly, I have noticed that when I access my own web site in Holland, the site concerned sees my access as if it had come from the proxy server at Pipex, ie. the Dutch system cannot see 'past' the Pipex Proxy server.

There are various proxy server software solutions available. A typical package which is easy to install and configure is the Netscape Proxy Server. Yoda uses this particular system.

Network Information Service (NIS)

It is reasonably easy to ensure that all systems on a small network have consistent /etc/hosts files using commands such as rcp. However, medium-sized networks consisting of dozens to hundreds of machines may present problems for administrators, especially if the overall setup consists of several distinct networks, perhaps in different buildings and run by different people. For such environments, a Network Information Service (NIS) can be useful.

NIS uses a single system on the network to act as the sole trusted source of name service information - this system is known as the NIS master. Slave servers may be used to which copies of the database on the NIS master are periodically sent, providing backup services should the NIS master system fail.

Client systems locate a name server when required, requesting data based on a domain name and other relevant information.

Unified Name Service Daemon (UNS, or more commonly NSD).

Extremely recently, the DNS and NIS systems have been superseded by a new system called the Unified Name Service Daemon, or NSD for short. NSD handles requests for domain information in a considerably more efficient manner, involving fewer system calls, replacing multiple files for older services with a single file (eg. many of the DNS files in /var/named are replaced by a single database file under NSD), and allowing for much larger numbers of entries in data files, etc. However, NSD is so new that even I have not yet had an opportunity to examine properly how it works, or the way in which it correlates to the older DNS and NIS services.

As a result, this course does not describe DNS, NIS or NSD in any great detail. This is because, given the rapid advance of modern UNIX OSs, explaining the workings of DNS or NIS would likely be a pointless task since any admin beginning her or his career now is more likely to encounter the newer NSD system which I am not yet comfortable with. Nevertheless, administrators should be aware of the older style services as they may have to deal with them, especially on legacy systems. Thus, though not discussed in these lectures, some notes on a typical DNS setup are provided for further reading [10]. Feel free to login to the SGI server yourself with:

rlogin yoda

and examine the DNS and NIS configuration files at your leisure; these may be found in the /var/named and /var/yp directories. Consult the online administration books for further details.
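
If NIS is actually running on the network being examined, a couple of standard commands give a quick view of what it is serving (a minimal sketch):

% ypwhich          (report which NIS server this client is bound to)
% ypcat hosts      (dump the NIS hosts map to the screen)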

UNIX Fundamentals: UNIX Software Features

Software found on UNIX systems can be classified into several types:

- System software: items provided by the vendor as standard.
- Commercial software: items purchased either from the same vendor which supplied the OS, or from some other commercial 3rd-party.
- Shareware software: items either supplied with the OS, or downloaded from the Internet, or obtained from some other source such as a cover magazine CD.
- Freeware software: items supplied in the same manner as shareware, but under more open 'conditions of use'.
- User software: items created by users of a system, whether that user is an admin or an ordinary user.

System Software

Any OS for any system today is normally supplied on a set of CDs. As the amount of data for an OS installation increases, perhaps the day is not far away when vendors will begin using DVDs instead. Whether or not an original copy of OS CDs can be installed on a system depends very much on the particular vendor, OS and system concerned. Any version of IRIX can be installed on an SGI system which supports that particular version of IRIX - this ability to install the OS whether or not one has a legal right to use the software is simply a practice SGI has adopted over the years. SGI could have chosen to make OS installation more difficult by requiring license codes and other details at installation time, but instead SGI chose a different route. What is described here applies only to SGI's IRIX OS.

SGI decided some time ago to adopt a strategy of official software and hardware management which makes it extremely difficult to make use of 'pirated' software. The means by which this is achieved is explained in the System Hardware section below, but the end result is a policy where any version of IRIX older than the 'current' version is free by default. Thus, since the current release of IRIX is 6.5, one could install IRIX 6.4, 6.3, 6.2 (or any older version) on any appropriate SGI system (eg. installing IRIX 6.2 on a 2nd-hand Indy) without having to worry about legal issues. There's nothing to stop one physically installing 6.5 if one had the appropriate CDs (ie. the software installation tools and CDs do not include any form of installation protection or copy protection), but other factors might make for trouble later on if the user concerned did not apply for a license at a later date, eg. attempting to purchase commercial software and licenses for the latest OS release.

It is highly likely that in future years, UNIX vendors will also make their current OSs completely free, probably as a means of combating WindowsNT and other rivals. As an educational site operating under an educational license agreement, UCLAN's Computing Department is entitled to install IRIX 6.5 on any of the SGI systems owned by the Computing Department, though at present most systems use the older IRIX 6.2 release for reasons connected with system resources on each machine (RAM, disk space, CPU power). Thus, the idea of a license can have two meanings for SGIs:



- A theoretical 'legal' license requirement which applies, for example, to the current release of IRIX, namely IRIX 6.5 - this is a legal matter and doesn't physically affect the use of IRIX 6.5 OS CDs.
- A real license requirement for particular items of software using license codes, obtainable either from SGI or from whatever 3rd-party the software in question was purchased from.

Another example of the first type is the GNU licensing system, explained in the 'Freeware Software' section below (what the GNU license is and how it works is fascinatingly unique).

Due to a very early top-down approach to managing system software, IRIX employs a high-level software installation structure which ensures that:  



- It is extremely easy to add, remove, or update software, especially using the GUI software tool called Software Manager (swmgr is the text command name which can be entered in a shell).
- Changes to system software are handled correctly with very few, if any, errors most of the time; 'most' could be defined as 'rarely, if ever, but not never'. As a real-world example, I have installed SGI software elements thousands of times and rarely if ever encountered problems, though I have had to deal with some issues on occasion.
- Software 'patches' (modificational updates to existing software already installed) are handled in such a way as to allow the later removal of said patches if desired, leaving the system in exactly its original state as if the patch had never been installed.

As an example of software installation reliability, my own 2nd-hand Indigo2 at home has been in use since March 1998. It was originally installed with IRIX 6.2, updated with patches several times, added to with extra software over the first few months of ownership (mid-1998), then upgraded to IRIX 6.5, added to with large amounts of freeware software, then upgraded to IRIX 6.5.1, then 6.5.2, then 6.5.3, and all without a single software installation error of any kind. In fact, my Indigo2 hasn't crashed or given a single error since I first purchased it. As is typical of any UNIX system which is/was widely used in various industries, most if not all of the problems ever encountered on the Indigo2 system have been resolved by now, producing an incredibly stable platform. In general, the newer the system and/or the newer the software, the greater the number of problems there will be to deal with, at least initially.

Thankfully, OS revisions largely build upon existing code and knowledge. Plus, since so many UNIX vendors have military, government and other important customers, there is incredible pressure to be very careful when planning changes to system or application software. Intensive testing is done before any new version is released into the marketplace (this contrasts completely with Microsoft which deliberately allows the public to test Beta versions of its OS revisions as a means of locating bugs before final release - a very lazy way to handle system testing by any measure).

Because patches often deal with release versions of software subsystems, and many software subsystems may have dependencies on other subsystems, the issue of patch installation is the most common area which can cause problems, usually due to unforeseen conflicts between individual versions of specific files. However, rigorous testing and a top-down approach to tracking release versions minimises such problems, especially since all UNIX systems come supplied with source code version/revision tracking tools as standard, eg. SCCS. The latest 'patch CD' can usually be installed automatically without causing any problems, though it is wise for an administrator to check what changes are going to be made before commencing any such installation, just in case.

The key to such a high-level software management system is the concept of a software 'subsystem'. SGI has developed a standard means by which a software suite and related files (manual pages, release notes, data, help documents, etc.) are packaged together in a form suitable for installation by the usual software installation tools such as inst and swmgr. Once this mechanism was carefully defined many years ago, insisting that all subsequent official software releases comply with the same standard ensures that the opportunity for error is greatly minimised, if not eliminated. Sometimes, certain 3rd-party applications such as Netscape can display apparent errors upon installation or update, but these errors are usually explained in accompanying documentation and can always be ignored.

Each software subsystem is usually split into several sub-units so that only relevant components need be installed as desired. The sub-units can then be examined to see the individual files which would be installed, and where. When making updates to software subsystems, selecting a newer version of a subsystem automatically selects only the relevant sub-units based on which sub-units have already been installed, ie. new items will not automatically be selected. For ease of use, an admin can always choose to execute an automatic installation or removal (as desired), though I often select a custom installation just so that I can see what's going on and learn more about the system as a result. In practice, I rarely need to alter the default behaviour anyway.

The software installation tools automatically take care not to overwrite existing configuration files when, for example, installing new versions (ie. upgrades) of software subsystems which have already been installed (eg. Netscape). In such cases, both the old and new configuration files are kept and the user (or admin) informed that there may be a need to decide which of the two files to keep, or perhaps to copy key data from the old file to the new file, deleting the old file afterwards.
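
As a brief illustration of the tools involved, the following commands are typical on an SGI system (the CD path shown is the usual default and is only illustrative):

versions | more          - list the software subsystems currently installed
inst -f /CDROM/dist      - run the installation tool against a CD distribution
swmgr                    - the GUI equivalent of inst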

Commercial Software

A 3rd-party commercial software package may or may not come supplied in a form which complies with any standards normally used by the hardware system vendor. UNIX has a long history of providing a generic means of packaging software and files in an archive which can be downloaded, uncompressed, dearchived, compiled and installed automatically, namely the 'tar.gz' archive format (see the man pages for tar and gzip). Many commercial software suppliers may decide to sell software in this format. This is ok, but it does mean one may not be able to use the usual software management tools (inst/swmgr in the case of SGIs) to later remove the software if desired. One would have to rely on the supplier being kind enough to either provide a script which can be used to remove the software, or at the very least a list of which files get installed where.

Thankfully, it is likely that most 3rd-parties will at least try to use the appropriate distribution format for a particular vendor's OS. However, unlike the source vendor, one cannot be sure that the 3rd-party has taken the same degree of care and attention to ensure they have used the distribution format correctly, eg. checking for conflicts with other software subsystems, providing product release notes, etc. Commercial software for SGIs may or may not use the particular hardware feature of SGIs which SGI uses to prevent piracy, perhaps because exactly how it works is probably itself a licensed product from SGI. Details of this mechanism are given in the System Hardware section below.
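
Returning to the generic 'tar.gz' format mentioned above, a typical manual installation sequence might look something like this (the package name and build steps are purely illustrative; always read the supplier's own instructions):

gunzip somepackage.tar.gz       - uncompress the archive
tar xvf somepackage.tar         - extract the files
cd somepackage                  - read any README/INSTALL files, then build/install as directed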

Shareware Software

The concept of shareware is simple: release a product containing many useful features, but which has more advanced features and perhaps essential features limited, restricted, or locked out entirely, eg. being able to save files, or working on files over a particular size. A user can download the shareware version of the software for free. They can test out the software and, if they like it, 'register' the software in order to obtain either the 'full' (ie. complete) version, or some kind of encrypted key or license code that will unlock the remaining features not accessible or present in the shareware version. Registration usually involves sending a small fee, eg. $30, to the author or company which created the software. Commonly, registration results in the author(s) sending the user proper printed and bound documentation, plus regular updates to the registered version, news releases on new features, access to dedicated mailing lists, etc.

The concept of shareware has changed over the years, partly due to the influence of the computer game 'Doom' which, although released as shareware in name, actually effectively gave away an entire third of the complete game for free. This was a ground-breaking move which proved to be an enormous success, earning the company which made the game (id Software, Dallas, Texas, USA) over eight million US dollars and a great deal of respect and loyalty from gaming fans. Never before had a company released shareware software in a form which did not involve deliberately 'restricting' key aspects of the shareware version. As stated above, shareware software is often altered so that, for example, one could load files, work on them, make changes, test out a range of features, but (crucially) not save the results. Such shareware software is effectively not of any practical use on its own, ie. it serves only as a kind of hands-on advertisement for the full version. Doom was not like this at all. One could play an entire third of the game, including over a network against other players.

Today, other creative software designers have adopted a similar approach, perhaps the most famous recent example of which is 'Blender' [1], a free 3D rendering and animation program for UNIX and (as of very soon) WindowsNT systems. In its as-supplied form, Blender can be used to do a great deal of work, creating 3D scenes, renderings and animations easily on a par with 3D Studio Max, even though some features in Blender are indeed locked out in the shareware version. However, unlike traditional shareware concepts, Blender does allow one to save files and so can be used for useful work. It has spread very rapidly in the last few months amongst students in educational sites worldwide, proving to be of particular interest to artists and animators who almost certainly could not normally afford a commercial package which might cost hundreds or perhaps thousands of pounds. Even small companies have begun using Blender.

However, supplied documentation for Blender is limited. As a 'professional level' system, it is unrealistic to expect to be able to get the best out of it without much more information on how it works and how to use it. Thus, the creators of Blender, a company called NaN based in Holland, make most of their revenue by offering a very detailed 350 page printed and bound manual for about $50 US, plus a sequence of software keys which make available the advanced features in Blender.

Freeware Software

Unlike shareware software, freeware software is exactly that: completely free. There is no concept of registration, restricted features, etc. at all. Until recently, even I was not aware of the vast amount of free software available for SGIs and UNIX systems in general.

There always has been free software for UNIX systems, but in keeping with other changes by UNIX vendors over the past few years, SGI altered its application development support policy in 1997 to make it much easier for users to make use of freeware on SGI systems. Prior to that time, SGI did not make the system 'header' files (normally kept in /usr/include) publicly available. Without these header files, one could not compile any new programs even if one had a free compiler. So, SGI adopted a new stance whereby the header files, libraries, example source code and other resources are provided free, but its own advanced compiler technologies (the MIPS Pro Compilers) remain commercial products. Immediately, anyone could then write their own applications for SGI systems using the supplied CDs (copies of which are available from SGI's ftp site) in conjunction with free compilation tools such as the GNU compilers. As a result, the 2nd-hand market for SGI systems in the USA has skyrocketed, with extremely good systems available at very low cost (systems which cost 37500 pounds new can now be bought for as little as 500 pounds, even though they can still be better than modern PCs in many respects).

It is highly likely that other vendors have adopted similar strategies in recent years (most of my knowledge concerns SGIs). Sun Microsystems made its SunOS free for students some years ago (perhaps Solaris too); my guess is that a similar compiler/development situation applies to systems using SunOS and Solaris as well - one can write applications using free software and

tools. This concept probably also applies to HP systems, Digital UNIX systems, and other flavours of UNIX. Linux is a perfect example of how the ideas of freeware development can determine an OS' future direction. Linux was meant to be a free OS from its very inception - Linus Torvalds, its creator, loathes the idea of an OS supplier charging for the very platform upon which essential software is executed. Although Linux is receiving considerable industry support these days, Linus is wary of the possibility of Linux becoming more commercial, especially as vendors such as Red Hat and Caldera offer versions of Linux with added features which must be paid for. Whether or not the Linux development community can counter these commercial pressures in order to retain some degree of freeware status and control remains to be seen. Note: I'm not sure of the degree to which completely free development environments on a quality-par with GNU are available for MS Windows-based systems (whether that involves Win95, Win98, WinNT or even older versions such as Win3.1).
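As a trivial sketch of what this kind of policy change makes possible (the file name and program here are invented for the example, and gcc is assumed to be installed from the GNU tools), anyone with the free headers and a free compiler can build and run their own program:

% cat > hello.c
#include <stdio.h>
int main(void) { printf("Built with a free toolchain\n"); return 0; }
[press CTRL-D]
% gcc -o hello hello.c
% ./hello
Built with a free toolchain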

The GNU Licensing System

The GNU system is, without doubt, thoroughly unique in the modern era of copyright, trademarks, law suits and court battles. It can be easily summarised as a vast collection of free software tools, but the detail reveals a much deeper philosophy of software development, best explained by the following extract from the main GNU license file that accompanies any GNU-based program [3]:

"The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all."

Reading the above extract, it is clear that those responsible for the GNU licensing system had to spend a considerable amount of time actually working out how to make something free! Free in a legal sense, that is. So many standard legal matters are designed to restrict activities that the work put into the GNU Free Software Foundation makes the license document read like some kind of software engineer's nirvana. It's a serious issue though, and the existence of GNU is very important in terms of the unimaginable amount of creative work going on around the world which would not otherwise exist (without GNU, Linux would probably not exist).

SGI, and other UNIX vendors I expect, ships its latest OS (IRIX 6.5) with a CD entitled 'Freeware', which not only contains a vast number of freeware programs in general (everything from spreadsheets and data plotting to games, audio/midi programming and molecular modeling), but also a complete, pre-compiled inst-format distribution of the entire GNU archive: compilers, debugging tools, GNU versions of shells and associated utilities, calculators, enhanced versions of UNIX commands and tools, even higher-level tools such as a GUI-based file manager and shell tool, and an absolutely superb Photoshop-style image editing tool called GIMP [4] (GNU Image Manipulation Program) which is extendable by the user. The individual software subsystems from the Freeware CD can also be downloaded in precompiled form from SGI's web site [5]. The February 1999 edition of SGI's Freeware CD contains 173 different software subsystems, 29 of which are based on the GNU licensing system (many others are likely available from elsewhere on the Internet, along with further freeware items). A printed copy of the contents of the Feb99 Freeware CD is included with the course notes for further reading.

Other important freeware programs which are supplied separately from such freeware CD distributions (an author may wish to distribute just from a web site) include the Blue Moon Rendering Tools (BMRT) [6], a suite of advanced 3D ray-tracing and radiosity tools written by one of the chief architects at Pixar animation studios - the company which created "Toy Story", "Small Soldiers" and "A Bug's Life". Blender can output files in Inventor format, which can then be converted to RIB format for use by BMRT.

So why are shareware and freeware important? Well, these types of software matter because, today, it is perfectly possible for a business to operate using only shareware and/or freeware software. An increasingly common situation one comes across is an entrepreneurial multimedia firm using Blender, XV, GIMP, BMRT and various GNU tools to manage its entire business,

often running on 2nd-hand equipment using free versions of UNIX such as Linux, SunOS or IRIX 6.2! I know of one such company in the USA which uses decade-old 8-CPU SGI servers and old SGI workstations such as Crimson RealityEngine and IRIS Indigo. The hardware was acquired 2nd-hand in less than a year. Whether or not a company decides to use shareware or freeware software depends on many factors, especially the degree to which a company feels it must have proper, official support. Some sectors such as government, medical and military have no choice: they must have proper, fully guaranteeable hardware and software support because of the nature of the work they do, so using shareware or freeware software is almost certainly out of the question. However, for medium-sized or smaller companies, and especially home users or students, the existence of shareware and freeware software, combined with the modern approaches to these forms of software by today's UNIX vendors, offers whole new avenues of application development and business ideas which have never existed before as commercially viable possibilities.
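As an aside, because the subsystems on the Freeware CD mentioned earlier are supplied in inst format, they are added with the same tools used for SGI's own software. A sketch only - the distribution path below is an assumption, since it depends on where the CD or downloaded images happen to be mounted:

inst -f /CDROM/dist      # point inst at the software distribution location
                         # (or run the GUI tool, swmgr, and select the same location)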

System Hardware

The hardware platforms supplied by the various UNIX vendors are, like UNIX itself today, also designed and managed with a top-down approach. The world of PCs has always been a bottom-up process of putting together a mish-mash of different components from a wide variety of sources. Motherboards, video cards, graphics cards and other components are available in a plethora of types of varying degrees of quality. This bottom-up approach to systems design means it's perfectly possible to have a PC with a good CPU, good graphics card, good video card, but an awful motherboard. If the hardware is suspect, problems faced by the user may appear to be OS-related when in fact they could be down to poor quality hardware. It's often difficult or impossible to ascertain the real cause of a problem - sometimes system components just don't work even though they should, or a system suddenly stops recognising the presence of a device; these problems are most common with peripherals such as CDROM, DVD, ZIP, sound cards, etc.

Dealing only with hardware systems designed specifically to run a particular vendor's UNIX variant, the situation is very different. The vendor maintains a high degree of control over the design of the hardware platform. Hence, there is opportunity to focus on the unique requirements of target markets, quality, reliability, etc. rather than always focusing on absolute minimum cost, which inevitably means cutting corners and making tradeoffs. This is one reason why even very old UNIX systems, eg. multi-processor systems from 1991 with (say) eight 33MHz CPUs, are still often found in perfect working order. The initial focus on quality results in a much lower risk of component failure. Combined with generous hardware and software support policies, hardware platforms for traditional UNIX systems are far more reliable than PCs.

My personal experience is with hardware systems designed by SGI, about which I know a great deal. Their philosophy of design is typical of most UNIX hardware vendors (others would be

Sun, HP, IBM, DEC, etc.) and can be contrasted very easily with the way PCs are designed and constructed:

UNIX:
  low-end:    "What can we give the customer for 5000?"
  mid-range:  "What can we give the customer for 15000?"
  high-end:   "What can we give the customer for 65000+?"

PC:
  "How cheap can we make a machine which offers a particular feature set and level of ability?"

Since the real driving force behind PC development is the home market, especially games, the philosophy has always been to decide what features a typical 'home' or 'office' PC ought to have and then try and design the cheapest possible system to offer those features. This approach has eventually led to incredibly cut-throat competition, creating new concepts such as the 'sub-$1000' PC, and even today's distinctly dubious 'free PC', but in reality the price paid by consumers is the use of poor quality components which do not integrate well, especially components from different suppliers. Hardware problems in PCs are common, and now unavoidable. In Edinburgh, I know of a high-street PC store which always has a long queue of customers waiting to have their particular problem dealt with. By contrast, most traditional UNIX vendors design their own systems with a top-down approach which focuses on quality. Since the vendor usually has complete control, they can ensure a much greater coherence of design and degree of integration. System components work well with each other because all parts of the system were designed with all the other parts in mind. Another important factor is that a top-down approach allows vendors to innovate and develop new architectural designs, creating fundamentally new hardware techniques such as SMP and S2MP processing, highly scalable systems, advanced graphics architectures, and perhaps most importantly of all from a customer's point of view: much more advanced CPU designs (Alpha, MIPS, SPARC, PA-RISC, POWER series, etc.) Such innovations and changes in design concept are impossible in the mainstream PC market: there is too much to lose by shifting from the status-quo. Everything follows the lowest common denominator. The most obvious indication of these two different approaches is that UNIX hardware platforms have always been more expensive than PCs, but that is something which should be expected given that most UNIX platforms are deliberately designed to offer a much greater feature set, better quality components, better integration, etc. A good example is the SGI Indy. With respect to absolute cost, the Indy was very expensive when it was first released in 1993, but because of what it offered in terms of hardware and software features it was actually a very cheap system compared to trying to put together a PC with a similar feature set. In fact, Indy offered features such as hardware-accelerated 3D graphics at high resolution (1280x1024) and 24bit colour at a time when such features did not exist at all for PCs. PCW magazine said in its original review [7] that to give a PC the same standard features and abilities, such as ISDN, 4-channel 16bit stereo sound with multiple stereo I/O sockets, S-

Video/Composite/Digital video inputs, NTSC-resolution CCD digital camera, integrated SCSI, etc. would have cost twice as much as an Indy. SGI set out to design a system which would include all these features as-standard, so the end result was bound to cost several thousand pounds, but that was still half the cost of trying to cobble together a collection of mis-matched components from a dozen different companies to produce something which still would not have been anywhere near as good. As PCW put it, the Indy - for its time - was a great machine offering superb value if one was the kind of customer which needed its features and would be able to make good use of them.

Sun Microsystems adopted a similar approach to its recent Ultra5, Ultra10 and other systems: provide the user with an integrated design with a specific feature set that Sun knew its customers wanted. SGI did it again with their O2 system, released in October 1996. O2 has such a vast range of features (highly advanced for its time) that few ordinary customers would find themselves using most or all of them. However, for the intended target markets (ranging from CAD, design, animation, film/video special effects, video editing to medical imaging, etc.) the O2 was an excellent system. Like most UNIX hardware systems, O2 today is not competitive in certain areas such as basic 3D graphics performance (there are exceptions to this), but certain advanced and unique architectural features mean it's still purchased by customers who require those features.

This, then, is the key: UNIX hardware platforms which offer a great many features and high-quality components are only a good choice if one:

- is the kind of customer which definitely needs those features;
- values the ramifications of using a better quality system that has been designed top-down: reliability, quality, long-term value, ease of maintenance, etc.

One often observes people used to PCs asking why systems like O2, HP's Visualize series, SGI's Octane, Sun's Ultra60, etc. cost so much compared to PCs. The reason for the confusion is that the world of PCs focuses heavily on the abilities of the main CPU, whereas all UNIX vendors have, for many years, made systems which include as much dedicated acceleration hardware as possible, easing the burden on the main CPU. For the home market, systems like the Amiga pioneered this approach; unfortunately, the company responsible for the Amiga doomed itself to failure as a result of various marketing blunders.

From an admin's point of view, the practical side effect of having to administer and run a UNIX hardware platform is that there is far, far less effort needed in terms of configuring systems at the hardware level, or having to worry about different system hardware components operating correctly with one another. Combined with the way most UNIX variants deal with hardware devices (ie. automatically and transparently most of the time), a UNIX admin can swap hardware components between different systems from the same vendor without any need to alter system software, ie. any changes in system hardware configuration are dealt with automatically. Further, many UNIX vendors use certain system components that are identical (usually memory, disks and backup devices), so admins can often swap generic items such as disks between different vendor platforms without having to reconfigure those components (in the case of disks)

or worry about damaging either system. SCSI disks are a good example: they are supplied preformatted, so an admin should never have to reformat a SCSI disk. Swapping a SCSI disk between different vendor platforms may require repartitioning of the disk, but never a reformat. In the 6 years I've been using SGIs, I've never had to format a SCSI disk. Examining a typical UNIX hardware system such as Indy, one notices several very obvious differences compared to PCs:   

- There are far fewer cables in view.
- Components are positioned in such a way as to greatly ease access to all parts of the system.
- The overall design is highly integrated so that system maintenance and repairs/replacements are much easier to carry out.

Thus, problems that are solvable by the admin can be dealt with quickly, while problems requiring vendor hardware support assistance can be fixed in a short space of time by a visiting technician, which obviously reduces costs for the vendor responsible by enabling their engineers to deal with a larger number of queries in the same amount of time.

Just as with the approaches taken to hardware and software design, the way in which support contracts for UNIX systems operate also follows a top-down approach. Support costs can be high, but the ethos is similar: you get what you pay for - fast no-nonsense support when it's needed. I can only speak from experience of dealing with SGIs, but I'm sure the same is true of other UNIX vendors. Essentially, if I encounter a hardware problem of some kind, the support service always errs on the side of caution in dealing with the problem, ie. I don't have to jump through hoops in order to convince them that there is a problem - they accept what I say and organise a visiting technician to help straight away (one can usually choose between a range of response times from 1 hour to 5 days). Typically, unless the technician can fix the problem on-site in a matter of minutes, then some, most, or even all of the system components will be replaced if necessary to get the system in working order once more.

For example, when I was once encountering SCSI bus errors, the visiting engineer was almost at the point of replacing the motherboard, video card and even the main CPU (several thousand pounds' worth of hardware in terms of new-component replacement value at the time) before some further tests revealed that it was in fact my own personal disk which was causing the problem (I had an important jumper clip missing from the jumper block). In other words, UNIX vendor hardware support contracts tend to place much less emphasis on the customer having to prove they have a genuine problem. I should imagine this approach exists because many UNIX vendors have to deal with extremely important clients such as government, military, medical, industrial and other sectors (eg. safety critical systems). These are customers with big budgets who don't want to waste time messing around with details while their faulty system is losing them money - they expect the vendor to help them get their system working again as soon as possible.

Note: assuming a component is replaced (eg. motherboard), even if the vendor's later tests show the component to be working correctly, it is not returned to the customer, ie. the customer keeps the new component. Instead, most vendors have their own dedicated testing laboratories which pull apart every faulty component returned to them, looking for causes of problems so that the vendor can take corrective action if necessary at the production stage, and learn any lessons to aid in future designs. To summarise the above:  

- A top-down approach to hardware design means a better feature set, better quality, reliability, ease of use and maintenance, etc.
- As a result, UNIX hardware systems can be costly. One should only purchase such a system if one can make good use of the supplied features, and if one values the implications of better quality, etc., despite the extra cost.

However, a blurred middle-ground between the top-down approach to UNIX hardware platforms and the bottom-up approach to the supply of PCs is the so-called 'vendor-badged' NT workstation market. In general, this is where UNIX vendors create PC-style hardware systems that are still based on off-the-shelf components, but occasionally include certain modifications to improve performance, etc. beyond what one normally sees in a typical PC. The most common example is where vendors such as Compaq supply systems which have two 64bit PCI busses to increase available system bandwidth.

All these systems are targeted at the 'NT workstation' market. Cynics say that such systems are just a clever means of placing a 'quality' brand name on ordinary PC hardware. However, such systems do tend to offer a better level of quality and integration than ordinary PCs (even expensive ordinary PCs), but an inevitable ironic side effect is that these vendor-badged systems do cost more. Just as with traditional UNIX hardware systems, whether or not that cost is worth it depends on customers' priorities. Companies such as movie studios regard stability and reliability as absolutely critical, which is why most studios do not use NT [8]. Those that do, especially smaller studios (perhaps because of limited budgets), will always go for vendor-badged NT workstations rather than purchasing systems from PC magazines and attempting to cobble together a reliable platform. The extra cost is worth it.

There is an important caveat to the UNIX hardware design approach: purchasing what can be a very good UNIX hardware system is a step that can easily be ruined by not equipping that system in the first instance with sufficient essential system resources such as memory capacity, disk space, CPU power and (if relevant) graphics/image/video processing power. Sometimes, situations like this occur because of budget constraints, but the end result may be a system which cannot handle the tasks for which it was purchased. If such mis-matched purchases are made, it's usually a good sign that the company concerned is using a bottom-up approach to making decisions about whether or not to buy a hardware platform that has been built using a top-down approach. The irony is plain to see. Since admins often have to advise on hardware purchases or upgrades, a familiarity with these issues is essential.

Conclusion: decide what is needed to solve the problem. Evaluate which systems offer appropriate solutions. If no such system is affordable, do not compromise on essentials such as

memory or disk as a means of lowering cost - choose a different platform instead, such as a good quality NT system, or a system with lower costs such as an Intel machine running Linux, etc. Similarly, it makes no sense to have a good quality UNIX system, only to then adopt a strategy of buying future peripherals (eg. extra disks, memory, printers, etc.) that are of poor quality. In fact, some UNIX vendors may not offer or permit hardware support contracts unless the customer sticks to using approved 3rd-party hardware sources.

Summary: UNIX hardware platforms are designed top-down, offer better quality components, etc., but tend to be more expensive as a result. Today, in an era when even SGI has started to sell systems that support WindowsNT, the philosophy is still the same: design top-down to give quality hardware, etc. Thus, SGI's WindowsNT systems start at around 2500 pounds - a lot by the standards of any home user, but cheap when considering the market in general. The same caveat applies though: such a system with a slow CPU is wasting the capabilities of the machine.

UNIX Characteristics. Integration:

A top-down approach results in an integrated design. Systems tend to be supplied 'complete', ie. everything one requires is usually supplied as-standard. Components work well together since the designers are familiar with all aspects of the system.

Stability and Reliability:

The use of quality components, driven by the demands of the markets which most UNIX vendors aim for, results in systems that experience far fewer component failures compared to PCs. As a result of a top-down and integrated approach, the chances of a system experiencing hardware-level conflicts are much lower compared to PCs.

Security:

It is easy for system designers to incorporate hardware security features such as metal hoops that are part of the main moulded chassis, for attaching to security cables. On the software side, and as an aid to preventing crime (as well as making it easier to solve crime in terms of tracing components, etc.), systems such as SGIs often incorporate unique hardware features. The following applies to SGIs but is also probably true of hardware from other UNIX vendors in some equivalent form.

Every SGI has a PROM chip on the motherboard, without which the system will not boot. This PROM chip is responsible for initiating the system bootup sequence at the very lowest hardware level. However, the chip also contains an ID number which is unique to that particular machine. One can display this ID number with the following command:

sysinfo -s

Alternatively, the number can be displayed in hexadecimal format by using the sysinfo command on its own (one notes the first 4 groups of two hex digits). A typical output might look like this:

% sysinfo -s
1762299020

% sysinfo
System ID:
69 0a 8c 8c 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The important part of the output from the second command is the beginning sequence consisting of '690A8C8C'. The ID number is not only used by SGI when dealing with system hardware and software support contracts, it is also the means by which license codes are supplied for SGI's commercial software packages. If one wishes to use a particular commercial package, eg. the VRML editor called CosmoWorlds, SGI uses the ID number of the machine to create a license code which will be recognised by the program concerned as being valid only for that particular machine. The 20-digit hexadecimal license code is created using a special form of encryption, presumably combining the ID number with some kind of internal database of codes for SGI's various applications which only SGI has access to. In the case of the O2 I use at home, the license code for CosmoWorlds is 4CD4FB82A67B0CEB26B7 (ie. different software packages on the same system need different license codes). This code will not work for any other software package on any other SGI anywhere in the world.

There are two different license management systems in use by SGIs: the NetLS environment on older platforms, and the FlexLM environment on newer platforms. FlexLM is being widely adopted by many UNIX vendors. NetLS licenses are stored in the /var/netls directory, while FlexLM licenses are kept in /var/flexlm. To the best of my knowledge, SGI's latest version of IRIX (6.5) doesn't use NetLS licenses anymore, though it's possible that 3rd-party software suppliers still do.

As stated in the software section, the use of the ID number system at the hardware level means it is effectively impossible to pirate commercial software. More accurately, anyone can copy any SGI software CD, and indeed install the software, but that software will not run without the license code which is unique to each system, so there's no point in copying commercial software CDs or installing copied commercial software in the first place.
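A quick way to see which licensing scheme(s) a given machine is using is simply to list the two directories mentioned above (the contents will obviously vary from system to system):

ls -l /var/flexlm
ls -l /var/netls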

Of course, one could always try to reverse-engineer the object code of a commercial package to try and get round the section which makes the application require the correct license code, but this would be very difficult. The important point is that, to the best of my knowledge, SGI's license code schema has never been broken at the hardware level.

Note: from the point of view of an admin maintaining an SGI system, if a machine completely fails, eg. damage by fire and water, the admin should always retain the PROM chip if possible, ie. a completely new system could be obtained, but only the installation of the original PROM chip will make the new system effectively the same as the old one. For PCs, the most important system component in terms of system identity is the system disk (more accurately, its contents); but for machines such as SGIs, the PROM chip is just as important as - if not more important than - the contents of the system disk when it comes to a system having a unique identity.

Scalability.

Because a top-down hardware design approach has been used by all UNIX hardware vendors over the years, most UNIX vendors offer hardware solutions that scale to a large number of processors. Sun, IBM, SGI, HP and other vendors all offer systems that scale to 64 CPUs. Currently, one cannot obtain a reliable PC/NT platform that scales to even 8 CPUs (Intel won't begin shipping 8-way chip sets until Q3 1999).

Along with the basic support for a larger number of processors, UNIX vendors have spent a great deal of time researching advanced ways of properly supporting many CPUs. There are complex issues concerning how such systems handle shared memory, the movement of data, communications links, efficient use of other hardware such as graphics and video subsystems, maximised use of storage systems (eg. RAID), and so on. The result is that most UNIX vendors offer large system solutions which can tackle extremely complex problems. Since these systems are obviously designed to the very highest quality standards with a top-down approach to integration, etc., they are widely used by companies and institutions which need such systems for solving the toughest of tasks, from processing massive databases to dealing with huge seismic data sets, large satellite images, complex medical data and intensive numerical processing (eg. weather modeling).

One very beneficial side-effect of this kind of development is that the technology which comes out of such high-quality designs slowly filters down to the desktop systems, enabling customers to eventually utilise extremely advanced and powerful computing systems. A particularly good example of this is SGI's Octane system [9] - it uses the same components and basic technology as SGI's high-end Origin server system. As a result, the user benefits from many advanced features, eg.

- Octane has no inherent maximum memory limit. Memory is situated on a 'node board' along with the 1 or 2 main CPUs, rather than housed on a backplane. As CPU designs improve, so memory capacity on the node board can be increased by using a different node board design, ie. without changing the base system at all. For example, Octane systems using the R10000 CPU can have up to 2GB RAM, while Octane systems using the R12000 CPU can have up to 4GB RAM. Future CPUs (R14K, R16K, etc.) will change this limit again to 8GB, 16GB, etc.

- The speed at which all internal links operate is directly synchronised to the clock speed of the main CPU. As a result, internal data pathways can always supply data to both main CPUs faster than they can theoretically cope with, ie. one can get the absolute maximum performance out of a CPU (this is fundamentally not possible with any PC design). As CPU clock speeds increase, so does the rate at which the system can move data around internally. An Octane using 195MHz R10000s offers three separate internal data pathways each operating at 1560MB/sec (10X faster than a typical PCI bus). An Octane using 300MHz R12000s runs the same pathways at the faster rate of 2400MB/sec per link, ie. system bandwidth and memory bandwidth increase to match CPU speed.

The above is not a complete list of advanced features.

SGI's high-end servers are currently the most scalable in the world, offering up to 256 CPUs for a commercially available system, though some sites with advance copies of future OS changes have systems with 512 and 720 CPUs. As stated elsewhere, one system has 6144 CPUs. The quality of design required to create technologies like this, along with software and OS concepts that run them properly, is quite incredible. These features are passed on down to desktop systems and eventually into consumer markets. But it means that, at any one time, midrange systems based on such advanced technologies can be quite expensive (Octanes generally start at around 7000 pounds). Since much of the push behind these developments comes from military and government clients, again there is great emphasis on quality, reliability, security, etc. Cray Research, which is owned by SGI, holds the world record for the most stable and reliable system: a supercomputer with 2048 CPUs which ran for 2.5 years without any of the processors exhibiting a single system-critical error.

Sun, HP, IBM, DEC, etc. all operate similar design approaches, though SGI/Cray happens to have the most advanced and scalable server and graphics system designs at the present time, mainly because they have traditionally targeted high-end markets, especially US government contracts. The history of UNIX vendor CPU design follows a similar legacy: typical customers have always been willing to pay 3X as much as an Intel CPU in order to gain access to 2X the performance. Ironically, as a result, Intel have always produced the world's slowest CPUs, even though they are the cheapest. CPUs at much lower clock speeds from other vendors (HP, IBM, Sun, SGI, etc.) can easily be 2X to 5X faster than Intel's current best. As stated above though, these CPUs are much more expensive - even so, it's an extra cost which the relevant clients say they will always bear in order to obtain the fastest available performance.

The exception today is the NT workstation market where systems from UNIX vendors utilise Intel CPUs and WindowsNT (and/or Linux), offering a means of gaining access to better quality graphics and video hardware while sacrificing the use of more powerful CPUs and the more sophisticated UNIX OSs, resulting in lower cost. Even so, typical high-end NT systems still cost around 3000 to 15000 pounds.

So far, no UNIX vendor makes any product that is targeted at the home market, though some vendors create technologies that are used in the mass consumer market (eg. the R3000 CPU which runs the Sony PlayStation is designed by SGI and was used in their older workstations in the late 1980s and early 1990s; all of the Nintendo64's custom processors were designed by SGI). In terms of computer systems, it is unlikely this situation will ever change because to do so would mean a vendor would have to adopt a bottom-up design approach in order to minimise cost above all else - such a change wouldn't be acceptable to customers and would contradict the way in which the high-end systems are developed. Vendors which do have a presence in the consumer market normally use subsidiaries as a means of avoiding internal conflicts in design ethos, eg. SGI's MIPS subsidiary (soon to be sold off).

References:

1. Blender Animation and Rendering Program: http://www.blender.nl/

2. XV Image Viewer: http://www.trilon.com/xv/xv.html

3. Extract taken from GNU GENERAL PUBLIC LICENSE, Version 2, June 1991, Copyright (C) 1989, 1991 Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

4. GIMP (GNU Image Manipulation Program): http://www.gimp.org/

5. SGI Freeware Sites (identical):
   http://freeware.sgi.com/
   http://toolbox.sgi.com/TasteOfDT/public/freeware/

6. Pixar's Blue Moon Rendering Tools (BMRT): http://www.bmrt.org/

7. Silicon Graphics Indy, PCW, September 1993: http://www.futuretech.vuurwerk.nl/pcw9-93indy.html

15. "LA conferential", CGI Magazine, Vol4, Issue 1, Jan/Feb 1999, pp. 21, by Richard Spohrer.

Interview from the 'Digital Content and Creation' conference and exhibition: '"No major production facilities rely on commercial software, everyone has to customise applications in order to get the most out of them," said Hughes. "We run Unix on SGI as we need a stable environment which allows fast networking. NT is not a professional solution and was never designed to handle high-end network environments," he added. "Windows NT is the antithesis of what the entertainment industry needs. If we were to move from Irix, we would use Linux over NT."'

- John Hughes, president/CEO of Rhythm & Hues, and Scott Squires, visual effects supervisor at ILM and CEO of Puffin Design.

9. Octane Information Index: http://www.futuretech.vuurwerk.nl/octane/

17. "How to set up the BIND domain name server", Network Week, Vol4 No. 29, 14th April 1999, pp. 17, by David Cartwright. 18. A letter from a reader in response to [10]: "Out of a BIND", Network Week, Vol4 No. 31, 28th April 1999, pp. 6:

"A couple of weeks ago, I had a problem. I was attempting to configure NT4's DNS Server for use on a completely private network, but it just wasn't working properly. The WindowsNT 'help' - and I use that term loosely - assumed my network was connected to the Internet, so the examples it gave were largely useless. Then I noticed David Cartwright's article about setting up DNS servers. (Network Week, 14th April). The light began to dawn. Even better, the article used BIND's configuration files as examples. This meant that I could dump NT's obtuse GUI DNS Manager application and hand-hack the configuration files myself. A few minor problems later (most of which were caused by Microsoft's example DNS config files being a bit... um... optimistic) and the DNS server finally lurched into life. Thank you Network Week. The more Q&A and how-to type information you print, the better." - Matthew Bell, Fluke UK.

General References:

Anonymous SGI FTP Site List:   http://reality.sgi.com/billh/anonftp/
Origin2000 Information Index:  http://www.futuretech.vuurwerk.nl/origin/
Onyx2 Information Index:       http://www.futuretech.vuurwerk.nl/onyx2/
SGI:                           http://www.sgi.com/
Hewlett Packard:               http://www.hp.com/
Sun Microsystems:              http://www.sun.com/
IBM:                           http://www.ibm.com/
Compaq/Digital:                http://www.digital.com/
SCO:                           http://www.sco.com/
Linux:                         http://www.linux.org/

Appendix A: Case Study.

For unknown and unchangeable reasons, UCLAN's central admin system has a DNS setup which, incorrectly, does not recognise comp.uclan.ac.uk as a subdomain. Instead, the central DNS lists comp as a host name, ie. comp.uclan.ac.uk is listed as a direct reference to Yoda's external IP address, 193.61.250.34; in terms of the intended use of

the word 'comp', this is rather like referring to a house on a street by using just the street name. As a result, the SGI network's fully qualified host names, such as yoda.comp.uclan.ac.uk, are not recognised outside UCLAN, and neither is comp.uclan.ac.uk since all the machines on the SGI network treat comp as a subdomain. Thus, external users can access Yoda's IP address directly by referring to 193.61.250.34 (so ftp is possible), but they cannot access Yoda as a web server, or access individual systems in Ve24 such as sevrin.comp.uclan.ac.uk, or send email to the SGI network. Also, services such as USENET cannot be setup, so internal users must use web sites to access newsgroups. This example serves as a warning: organisations should thoroughly clarify what their individual department's network structures are going to be, through a proper consultation and discussion process, before allowing departments to setup internal networks. Otherwise, confusion and disagreement can occur. In the case of the SGI network, its internal structure is completely correct (as confirmed by SGI themselves), but the way it is connected to the Internet is incorrect. Only the use of a Proxy server allows clients to access the Internet, but some strange side-effects remain; for example, email can be sent from the SGI network to anywhere on the Internet (from Yoda to Yahoo in less than 10 seconds!), but not vice-versa because incoming data is blocked by the incorrectly configured central DNS. Email from the SGI network can reach the outside world because of the way the email system works: the default settings installed along with the standard Berkeley Sendmail software (/usr/lib/sendmail) are sufficient to forward email from the SGI network to the Internet via routers further along the communications chain, which then send the data to JANET at Manchester, and from there to the final destination (which could include a UCLAN student or staff member). The situation is rather like posting a letter without a sender's address, or including an address which gives everything as far as the street name but not the house number - the letter will be correctly delivered, but the recipient will not be able to reply to the sender.
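The effect described above can be seen with nslookup (a sketch only - the host names and address are those quoted in this appendix). The first query resolves to 193.61.250.34, ie. comp is treated as a host; the second is not recognised from outside UCLAN because comp is not delegated as a subdomain:

% nslookup comp.uclan.ac.uk
% nslookup yoda.comp.uclan.ac.uk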

Detailed Notes for Day 2 (Part 2)

UNIX Fundamentals: Shell scripts.

It is an inevitable consequence of using a command interface such as shells that one would wish to be able to run a whole sequence of commands to perform more complex tasks, or perhaps the same task many times on multiple systems. Shells allow one to do this by creating files containing sequences of commands. Such a file, referred to as a shell script, can be executed just like any other program, though one must ensure the execute permissions on the file are set appropriately in order for the script to be executable. Large parts of all modern UNIX variants use shell scripts to organise system management and behaviour.

Programming in shell script can include more complicated structures such as if/then statements, case statements, for loops, while loops, functions, etc. Combined with other features such as metacharacters and the various text-processing utilities (perl, awk, sed, grep, etc.), one can create extremely sophisticated shell scripts to perform practically any system administration task, ie. one is able to write programs which can use any available application or existing command as part of the code in the script. Since shell syntax (especially that of the C shell) closely resembles C, shell programming effectively combines the flexibility of C-style programming with the ability to utilise other programs and resources within the shell script code.

Looking at typical system shell script files, eg. the bootup scripts contained in /etc/init.d, one can see that most system scripts make extensive use of if/then expressions and case statements. However, a typical admin will find it mostly unnecessary to use even these features. In fact, many administration tasks one might choose to do can be performed by a single command or sequence of commands on a single line (made possible via the various metacharacters). An admin might put such mini-scripts into a file and execute that file when required; even though the file's contents may not appear to be particularly complex, one can perform a wide range of tasks using just a few commands. A hash symbol '#' at the beginning of a line in a script file denotes a comment.

One of the most commonly used commands in UNIX is 'find', which allows one to search for files, directories, files belonging to a particular user or group, files of a special type (eg. a link to another file), files modified before or after a certain time, and so on (there are many options). Most admins tend to use the find command to select certain files upon which to perform some other operation, to locate files for information gathering purposes, etc. The find command uses a Boolean expression which defines the type of file the command is to search for. The name of any file matching the Boolean expression is returned.
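As an aside before the find examples, here is a minimal sketch of the control structures mentioned above, written in Bourne shell (the dialect used by the /etc/init.d scripts); the subsystem names are invented purely for illustration:

#!/bin/sh
# Step through a few (invented) subsystems using a for loop,
# a case statement and an if/then test.
for sub in network mail web
do
    case "$sub" in
        network) echo "checking network interfaces..." ;;
        *)       echo "checking $sub..." ;;
    esac
    if [ "$sub" = "web" ]; then
        echo "web is the last subsystem on the list"
    fi
done

Most day-to-day administration scripts, however, are far simpler, as the find-based examples which follow show.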

For example (see the 'find' man page for full details):

find /home/students -name "capture.mv" -print

Figure 25. A typical find command.

This command searches all students' directories, looking for any file called 'capture.mv'. On Indy systems, users often capture movie files when first using the digital camera, but usually never delete them, wasting disk space. Thus, an admin might have a site policy that, at regular intervals, all files called capture.mv are erased - users would be notified that if they captured a video sequence which they wished to keep, they should either set the name to use as something else, or rename the file afterwards.

One could place the above command into an executable file called 'loc', running that file when one so desired. This can be done easily by the following sequence of actions (only one line is entered in this example, but one could easily enter many more):

% cat > loc
find /home/students -name "capture.mv" -print
[press CTRL-D]
% chmod u+x loc
% ls -lF loc
-rwxr--r--    1 mapleson staff       46 May  3 13:20 loc*

Figure 26. Using cat to quickly create a simple shell script.

Using ls -lF to examine the file, one would see the file has the execute permission set for user, and a '*' has been appended after the file name, both indicating the file is now executable. Thus, one could run that file just as if it were a program. One might imagine this is similar to .BAT files in DOS, but the features and functionality of shell scripts are very different (much more flexible and powerful, eg. the use of pipes).

There's no reason why one couldn't use an editor to create the file, but experienced admins know that it's faster to use shortcuts such as employing cat in the above way, especially compared to using GUI actions which require one to take hold of the mouse, move it, double-click on an icon, etc. Novice users of UNIX systems don't realise until later that very simple actions can take longer to accomplish with GUI methods.

Creating a file by redirecting the input from cat to a file is a technique I often use for typing out files with little content. cat receives its input from stdin (the keyboard by default), so using 'cat > filename' means anything one types is redirected to the named file instead of stdout; one must press CTRL-D to end the input stream and close the file. An even lazier way of creating the file, if just one line was required, is to use echo:

% echo 'find /home/students -name "capture.mv" -print' > loc
% chmod u+x loc
% ls -lF loc
-rwxr--r--    1 mapleson staff       46 May  3 13:36 loc*

% cat loc
find /home/students -name "capture.mv" -print

Figure 27. Using echo to create a simple one-line shell script.

This time, there is no need to press CTRL-D, ie. the prompt returns immediately and the file has been created. This happens because, unlike cat which requires an 'end of file' action to terminate the input, echo's input terminates when it receives an end-of-line character instead (this behaviour can be overridden with the '-n' option). The man page for echo says, "echo is useful for producing diagnostics in command files and for sending known data into a pipe."

For the example shown in Fig 27, single quote marks surrounding the find command were required. This is because, without the quotes, the double quotes enclosing capture.mv are not included in the output stream which is redirected into the file. When contained in a shell script file, find doesn't need double quotes around the file name to search for, but it's wise to include them because other characters such as * have special meaning to a shell. For example, without the single quote marks, the script file created with echo works just fine (this example searches for any file beginning with the word 'capture' in my own account):

% echo find /mapleson -name "capture.*" -print > loc
% chmod u+x loc
% ls -lF loc
-rwxr--r--    1 mapleson staff       38 May  3 14:05 loc*
% cat loc
find /mapleson -name capture.* -print
% loc
/mapleson/work/capture.rgb

Figure 28. An echo sequence without quote marks.

Notice the loc file has no double quotes. But if the contents of loc are entered directly at the prompt:

% find /mapleson -name capture.* -print
find: No match.

Figure 29. The command fails due to * being treated as a metacharacter by the shell.

Even though the command looks the same as the contents of the loc file, entering it directly at the prompt produces an error. This happens because the * character is interpreted by the shell before the find command, ie. the shell tries to evaluate the capture.* expression for the current directory, instead of leaving the * to be part of the find command. Thus, when entering commands at the shell prompt, it's wise to either use double quotes where appropriate, or use the backslash \ character to tell the shell not to treat the character as a shell metacharacter, eg.:

% find /mapleson -name capture.\* -print
/mapleson/work/capture.rgb

Figure 30. Using a backslash to avoid confusing the shell.

The -exec option can be used with the find command to enable further actions to be taken on each result found, eg. the example in Fig 25 could be enhanced by making the find operation execute a further command to remove each capture.mv file as it is found:

find /home/students -name "capture.mv" -print -exec /bin/rm {} \;

Figure 31. Using find with the -exec option to execute rm.

Any name returned by the search is passed on to the rm command. find substitutes each file name result into the {} symbols as it is returned. The \; grouping at the end serves to terminate the find expression as a whole (the ; character is normally used to terminate a command, but a backslash is needed to prevent it being interpreted by the shell as a metacharacter). Alternatively, one could use this type of command sequence to perform other tasks, eg. suppose I just wanted to know how large each movie file was:

find /home/students -name "capture.mv" -print -exec /bin/ls -l {} \;

Figure 32. Using find with the -exec option to execute ls.

This works, but two entries will be printed for each file found: one is from the -print option, the other is the output from the ls command. To see just the ls output, one can omit the -print option. Consider this version:

find /home/students -name "*.mov" -exec /bin/ls -l {} \; > results

Figure 33. Redirecting the output from find to a file.

This searches for any .mov movie file (usually QuickTime movies), with the output redirected into a file. One can then perform further operations on the results file, eg. one could search the data for any movie that contains the word 'star' in its name:

grep star results

A final change might be to send the results of the grep operation to the printer for later reading:

grep star results | lp

Thus, the completed script looks like this:

find /home/students -name "*.mov" -exec /bin/ls -l {} \; > results
grep star results | lp

Figure 34. A simple script with two lines.

Only two lines, but this is now a handy script for locating any movies on the file system that are likely to be related to the Star Wars or Star Trek sagas and thus probably wasting valuable disk space! For the network I run, I could then use the results to send each user a message saying the Star Wars trailer is already available in /home/pub/movies/misc, so they've no need to download extra copies to their home directory. It's a trivial example, but in terms of the content of the commands and the way extra commands are added, it's typical of the level of complexity of most scripts which admins have to create. Further examples of the use of 'find' are in the relevant man page; an example file which contains several different variations is: /var/spool/cron/crontabs/root

This file lists the various administration tasks which are executed by the system automatically on a regular basis. The cron system itself is discussed in a later lecture.
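For instance, a hypothetical crontab-style entry (invented for illustration - the format itself is explained in the cron lecture) might run the capture.mv clean-up automatically every Sunday at 2am:

# min hour day month weekday  command
0   2    *   *     0          find /home/students -name "capture.mv" -exec /bin/rm {} \;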

WARNING. The Dangers of the Find Command and Wildcards.

Although UNIX is an advanced OS with powerful features, sometimes one encounters an aspect of its operation which catches one completely off-guard, though this is much less the case after just a little experience.

A long time ago (January 1996), I realised that many students who used the Capture program to record movies from the Digital Camera were not aware that using this program or other movie-related programs could leave unwanted hidden directories containing temporary movie files in their home directory, created during capture, editing or conversion operations (I think it happens when an application is killed off suddenly, eg. with CTRL-C, which doesn't give it an opportunity to erase temporary files). These directories, which are always located in a user's home directory, are named '.capture.mv.tmpXXXXX' where XXXXX is some 5-digit string such as '000Hb', and can easily take up many megabytes of space each.

So, I decided to write a script to automatically remove such directories on a regular basis. Note that I was logged on as root at this point, on my office Indy. In order to test that a find command would work on hidden files (I'd never used the find command to look for hidden files before), I created some test directories in the /tmp directory, whose contents would be given by 'ls -AR' as something like this:

% ls -AR

.b/   .c/   a/    d/

./.b:

./.c:
.b   a

./a:

./d:
a

ie. a simple range of hidden and non-hidden directories with or without any content:

- Ordinary directories with or without hidden/non-hidden files inside,
- Hidden directories with or without hidden/non-hidden files inside,
- Directories with ordinary files, etc.

The actual files such as .c/a and .c/.b didn't contain anything. Only the names were important for the test.

So, to test that find would work ok, I executed the following command from within the /tmp directory:

find . -name ".*" -exec /bin/rm -r {} \;

(NB: the -r option for rm means do a recursive removal, and note that there was no -i option used with the rm here) What do you think this find command would do? Would it remove the hidden directories .b and .c and their contents? If not, why not? Might it do anything else as well?

Nothing happened at first, but the command did seem to be taking far too long to return the shell prompt. So, after a few seconds, I decided something must have gone wrong; I typed CTRL-C to stop the find process (NB: it was fortunate I was not distracted by a phone call or something at this point). Using the ls command showed the test files I'd created still existed, which seemed odd. Trying some further commands, eg. changing directories, using the 'ps' command to see if there was something causing system slowdown, etc., produced strange errors which I didn't understand at the time (this was after only 1 or 2 months' admin experience), so I decided to reboot the system. The result was disaster: the system refused to boot properly, complaining about swap file errors and things relating to device files. Why did this happen? Consider the following command sequence by way of demonstration:

cd /tmp
mkdir xyz
cd xyz
/bin/ls -al

The output given will look something like this:

drwxr-xr-x    2 root     sys            9 Apr 21 13:28 ./
drwxrwxrwt    6 sys      sys          512 Apr 21 13:28 ../

Surely the directory xyz should be empty? What are these two entries? Well, not quite empty. In UNIX, as stated in a previous lecture, virtually everything is treated as a file. Thus, for example, the command so commonly performed even on the DOS operating system:

cd ..

is actually doing something rather special on UNIX systems. 'cd ..' is not an entire command in itself. Instead, every directory on a UNIX file system contains two hidden directories which are in reality special types of file:

./    - this refers to the current directory.
../   - this is effectively a link to the directory above in the file system.

So typing 'cd ..' actually means 'change directory to ..' (logical since cd does mean 'change directory to') and since '..' is treated as a link to the directory above, then the shell changes the current working directory to the next level up. [by contrast, 'cd ..' in DOS is treated as a distinct command in its own right - DOS recognises the presence of '..' and if possible changes directory accordingly; this is why DOS users can type 'cd..' instead if desired] But this can have an unfortunate side effect if one isn't careful, as is probably becoming clear by now. The ".*" search pattern in the find command will also find these special './' and '../' entries in the /tmp directory, ie.:    



- The first thing the find command locates is './'.
- './' is inserted into the search string ".*" to give "../*".
- find changes directory to / (root directory). Uh oh...
- find locates the ./ entry in / and substitutes this string into ".*" to give "../*". Since the current directory cannot be any higher, the search continues in the current directory; ../ is found next and is treated the same way.
- The -exec option with 'rm' causes find to begin erasing hidden files and directories such as .Sgiresources, eventually moving onto non-hidden files: first the /bin link to /usr/bin, then the /debug link, then all of /dev, /dumpster, /etc and so on.

By the time I realised something was wrong, the find command had gone as far as deleting most of /etc. Although important files in /etc were erased which I could have replaced with a backup tape or reinstall,
the real damage was the erasure of the /dev directory. Without important entries such as /dev/dsk, /dev/rdsk, /dev/swap and /dev/tty*, the system cannot mount disks, configure the swap partition on bootup, connect to keyboard input devices (tty terminals), and accomplish other important tasks.

In other words, disaster. And I'd made it worse by rebooting the system. Almost a complete repair could have been done simply by copying the /dev and /etc directories from another machine as a temporary fix, but the reboot made everything go haywire. I was partly fooled by the fact that the files in /tmp were still present after I'd stopped the command with CTRL-C. This led me to at first think that nothing had gone awry. Consulting an SGI software support engineer for help, it was decided the only sensible solution was to reinstall the OS, a procedure which was a lot simpler than trying to repair the damage I'd done.

So, the lessons learned:

Always read up about a command before using it. If I'd searched the online books with the expression 'find command', I would have discovered the following paragraph in Chapter 2 ("Making the Most of IRIX") of the 'IRIX Admin: System Configuration and Operation' manual: "Note that using recursive options to commands can be very dangerous in that the command automatically makes changes to your files and file system without prompting you in each case. The chgrp command can also recursively operate up the file system tree as well as down. Unless you are sure that each and every case where the recursive command will perform an action is desired, it is better to perform the actions individually. Similarly, it is good practice to avoid the use of metacharacters (described in "Using Regular Expressions and Metacharacters") in combination with recursive commands."

I had certainly broken the rule suggested by the last sentence in the above paragraph. I also did not know what the command would do before I ran it. 

Never run programs or scripts with as-yet unknown effects as root.

ie. when testing something like removing hidden directories, I should have logged on as some ordinary user, eg. a 'testuser' account, so that if the command went wrong it would not have been able to change or remove any files owned by root, or files owned by anyone else for that matter, including my own in /mapleson. If I had done this, the command I used would have given an immediate error and halted when the find string tried to remove the very first file found in the root directory (probably some minor hidden file such as .Sgiresources).

Worrying thought: if I hadn't CTRL-C'd the find command when I did, after enough time, the command would have erased the entire file system (including /home), or at least tried to. I seem to recall that, in reality (tested once on a standalone system deliberately), one can get about as far as most of /lib before the system actually goes wrong and stops the current command anyway, ie. the find command
sequence eventually ends up failing to locate key libraries needed for the execution of 'rm' (or perhaps the 'find' itself) at some point.

The only positive aspects of the experience were that, a) I'd learned a lot about the subtleties of the find command and the nature of files very quickly; b) I discovered after searching the Net that I was not alone in making this kind of mistake - there was an entire web site dedicated to the comical mess-ups possible on various operating systems that can so easily be caused by even experienced admins, though more usually as a result of inexperience or simple errors, eg. I've had at least one user so far who has erased their home directory by mistake with 'rm -r *' (he'd thought his current working directory was /tmp when in fact it wasn't). A backup tape restored his files.

Most UNIX courses explain how to use the various available commands, but it's also important to show how not to use certain commands, mainly because of what can go wrong when the root user makes a mistake. Hence, I've described my own experience of making an error in some detail, especially since 'find' is such a commonly used command.

As stated in an earlier lecture, to a large extent UNIX systems run themselves automatically. Thus, if an admin finds that she/he has some spare time, I recommend using that time to simply read up on random parts of the various administration manuals - look for hints & tips sections, short-cuts, sections covering daily advice, guidance notes for beginners, etc. Also read man pages: follow them from page to page using xman, rather like the way one can become engrossed in an encyclopedia, looking up reference after reference to learn more.
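Returning to the original goal - removing the leftover capture directories - a safer version of the intended cleanup script might look like the minimal sketch below. It assumes the directories always sit under /home/students and match the name pattern described above; because that pattern cannot match '.' or '..' and the search starts from an explicit absolute path, the runaway behaviour described earlier cannot occur. Even so, it should be tested with -print alone (and as an ordinary user) before the -exec rm is added.

#!/bin/sh
# Remove leftover Capture temporary directories from student home areas.
# Target: hidden directories named .capture.mv.tmpXXXXX.
find /home/students -type d -name ".capture.mv.tmp*" -print -exec /bin/rm -r {} \;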

A Simple Example Shell Script.

I have a script file called 'rebootlab' which contains the following:

rsh akira init 6&
rsh ash init 6&
rsh cameron init 6&
rsh chan init 6&
rsh conan init 6&
rsh gibson init 6&
rsh indiana init 6&
rsh leon init 6&
rsh merlin init 6&
rsh nikita init 6&
rsh ridley init 6&
rsh sevrin init 6&
rsh solo init 6&
#rsh spock init 6&
rsh stanley init 6&
rsh warlock init 6&
rsh wolfen init 6&
rsh woo init 6&

Figure 35. The simple rebootlab script.

The rsh command means 'remote shell'. rsh allows one to execute commands on a remote system by establishing a connection, creating a shell on that system using one's own user ID information, and then executing the supplied command sequence. The init program is used for process control initialisation (see the man page for details). A typical use for init is to shut down the system or reboot the system into a particular state, defined by a number from 0 to 6 (0 = full shutdown, 6 = full reboot) or certain other special possibilities. As explained in a previous lecture, the '&' runs a process in the background. Thus, each line in the file executes a remote shell on a system, instructing that system to reboot. The init command in each case is run in the background so that the rsh command can immediately return control to the rebootlab script in order to execute the next rsh command. The end result? With a single command, I can reboot the entire SGI lab without ever leaving the office.

Note: the line for the machine 'spock' is commented out. This is because the Indy called spock is currently in the technician's office, ie. not in service. This is a good example of where I could make the script more efficient by using a for loop, something along the lines of: for each name in this list of names, do the relevant command.

As should be obvious, the rebootlab script makes no attempt to check if anybody is logged into the system. So in practice I use the rusers command to make sure nobody is logged on before executing the script. This is where the script could definitely be improved: the command sent by rsh to each system could be modified with some extra commands so that each system is only rebooted if nobody is logged in at the time (the 'who' command could probably be used for this, eg. 'who | grep -v root' would give no output if nobody was logged on). A sketch of such a loop is given below, after the next script has been described.

The following script, called 'remountmapleson', is one I use when I go home in the evening, or perhaps at lunchtime to do some work on the SGI I use at home.

rsh yoda umount /mapleson && mount /mapleson &
rsh akira umount /mapleson && mount /mapleson &
rsh ash umount /mapleson && mount /mapleson &
rsh cameron umount /mapleson && mount /mapleson &
rsh chan umount /mapleson && mount /mapleson &
rsh conan umount /mapleson && mount /mapleson &
rsh gibson umount /mapleson && mount /mapleson &
rsh indiana umount /mapleson && mount /mapleson &
rsh leon umount /mapleson && mount /mapleson &
rsh merlin umount /mapleson && mount /mapleson &
rsh nikita umount /mapleson && mount /mapleson &
rsh ridley umount /mapleson && mount /mapleson &
rsh sevrin umount /mapleson && mount /mapleson &
rsh solo umount /mapleson && mount /mapleson &
#rsh spock umount /mapleson && mount /mapleson &
rsh stanley umount /mapleson && mount /mapleson &
rsh warlock umount /mapleson && mount /mapleson &
rsh wolfen umount /mapleson && mount /mapleson &
rsh woo umount /mapleson && mount /mapleson &

Figure 36. The simple remountmapleson script.

When I leave for home each day, my own external disk (where my own personal user files reside) goes with me, but this means the mount status of the /mapleson directory for every SGI in Ve24 is now out-of-date, ie. each system still has the directory mounted even though the file system which was physically mounted from the remote system (called milamber) is no longer present. As a result, any attempt to access the /mapleson directory would give an error: "Stale NFS file handle." Even listing the contents of the root directory would show the usual files but also the error as well. To solve this problem, the script makes every system unmount the /mapleson directory and, if that was successfully done, remount the directory once more. Without my disk present on milamber, its /mapleson directory simply contains a file called 'README' whose contents state: Sorry, /mapleson data not available - my external disk has been temporarily removed. I've probably gone home to work for a while. If you need to contact me, please call .

As soon as my disk is connected again and the script run once more, milamber's local /mapleson contents are hidden by my own files, so users can access my home directory once again. Thus, I'm able to add or remove my own personal disk and alter what users can see and access at a global level without users ever noticing the change. Note: the server still regards my home directory as /mapleson on milamber, so in order to ensure that I can always logon to milamber as mapleson even if my disk is not present, milamber's /mapleson directory also contains basic .cshrc, .login and .profile files. Yet again, a simple script is created to solve a particular problem.
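Both scripts above repeat the same rsh line once per host. As suggested earlier, a for loop over a host list tightens this considerably, and the 'who | grep -v root' test can make the reboot conditional on nobody being logged in. The following is only a minimal Bourne shell sketch of that idea (the in-line host list and messages are illustrative), not the script actually in use:

#!/bin/sh
# Reboot each lab Indy, but only if nobody other than root is logged on.
# spock is omitted, as in the original rebootlab script.
for host in akira ash cameron chan conan gibson indiana leon merlin \
            nikita ridley sevrin solo stanley warlock wolfen woo
do
    if [ -z "`rsh $host who | grep -v root`" ]; then
        rsh $host init 6 &
    else
        echo "$host is in use - skipping"
    fi
done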

Command Arguments.

When a command or program is executed, the name of the command and any parameters are passed to the program as arguments. In shell scripts, these arguments can be referenced via the '$' symbol. Argument 0 is always the name of the command, then argument 1 is the first parameter, argument 2 is the second parameter, etc. Thus, the following script called (say) 'go':

echo $0
echo $1
echo $2

would give this output upon execution:

% go somewhere nice
go
somewhere
nice

Including extra echo commands such as 'echo $3' merely produces blank lines after the supplied parameters are displayed. If one examines any typical system shell script, this technique of passing parameters and referencing arguments is used frequently. As an example, I once used the technique to aid in the processing of a large number of image files for a movie editing task. The script I wrote is also typical of the general complexity of code which most admins have to deal with; called 'go', it contained:

subimg $1 a.rgb 6 633 6 209
gammawarp a.rgb m.rgb 0.01
mult a.rgb a.rgb n.rgb
mult n.rgb m.rgb f.rgb
addborder f.rgb b.rgb x.rgb
subimg x.rgb ../tmp2/$1 0 767 300 875

(the commands used in this script are various image processing commands that are supplied as part of the Graphics Library Image Tools software subsystem. Consult the relevant man pages for details) The important feature is the use of the $1 symbol in the first line. The script expects a single parameter, ie. the name of the file to be processed. By eventually using this same argument at the end of an alternative directory reference, a processed image file with the same name is saved elsewhere after all the intermediate processing steps have finished. Each step uses temporary files created by previous steps. When I used the script, I had a directory containing 449 image files, each with a different name: i000.rgb i001.rgb i002.rgb . . . i448.rgb

To process all the frames in one go, I simply entered this command:

find . -name "i*.rgb" -print -exec go {} \;

As each file is located by the find command, its name is passed as a parameter to the go script. The use of the -print option displays the name of each file before the go script begins processing the file's contents. It's a simple way to execute multiple operations on a large number of files.
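For such a uniformly named set of files, a shell loop would do the same job; a minimal sh sketch, assuming the 'go' script sits in the same directory as the frames:

#!/bin/sh
# Process every frame in the current directory with the 'go' script;
# equivalent in effect to the find command above for this simple case.
for f in i*.rgb
do
    echo $f
    ./go $f
done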

Secure/Restricted Shell Scripts. It is common practice to include the following line at the start of a shell script: #!/bin/sh

This tells any shell what to use to interpret the script if the script is simply executed, as opposed to sourcing the script within the shell. The 'sh' shell is a lower level shell than csh or tcsh, ie. it's more restricted in what it can do and does not have all the added features of csh and tcsh. However, this means a better level of security, so many scripts (especially as-standard system scripts) include the above line in order to make sure that security is maximised. Also, by starting a new shell to run the script in, one ensures that the commands are always performed in the same way, ie. a script without the above line may work slightly differently when executed from within different shells (csh, tcsh, etc.), perhaps because of any aliases present in the current shell environment, or a customised path definition, etc.
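As an illustration, the two-line movie-locating script from Figure 34 would normally be written with such a header, so that it always runs under sh regardless of the invoking user's shell:

#!/bin/sh
# Locate movie files under /home/students and print any containing 'star'.
find /home/students -name "*.mv" -exec /bin/ls -l {} \; > results
grep star results | lp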

Detailed Notes for Day 2 (Part 3)

UNIX Fundamentals: System Monitoring Tools.

Running a UNIX system always involves monitoring how a system is behaving on a daily basis. Admins must keep an eye on such things as:

- disk space usage
- system performance and statistics, eg. CPU usage, disk I/O, memory, etc.
- network performance and statistics
- system status, user status
- service availability, eg. Internet access
- system hardware failures and related maintenance
- suspicious/illegal activity

Figure 37. The daily tasks of an admin.

This section explains the various system monitoring tools, commands and techniques which an admin can use to monitor the areas listed above. Typical example administration tasks are discussed in a later lecture. The focus here is on available tools and what they offer, not on how to use them as part of an admin strategy.

Disk Space Usage.

The df command reports current disk space usage. Run on its own, the output is expressed in terms of numbers of blocks used/free, eg.:

yoda # df
Filesystem            Type    blocks      use     avail  %use  Mounted on
/dev/root             xfs    8615368  6116384   2498984    71  /
/dev/dsk/dks4d5s7     xfs    8874746  4435093   4439653    50  /home
milamber:/mapleson    nfs    4225568  3906624    318944    93  /mapleson

Figure 38. Using df without options.

A block is 512 bytes. But most people tend to think in terms of kilobytes, megabytes and gigabytes, not multiples of 512 bytes. Thus, the -k option can be used to show the output in K:

yoda # df -k
Filesystem            Type    kbytes      use     avail  %use  Mounted on
/dev/root             xfs    4307684  3058192   1249492    71  /
/dev/dsk/dks4d5s7     xfs    4437373  2217547   2219826    50  /home
milamber:/mapleson    nfs    2112784  1953312    159472    93  /mapleson

Figure 39. The -k option with df to show data in K.

The df command can be forced to report data only for the file system housing the current directory by adding a period:

yoda # cd /home && df -k .
Filesystem            Type    kbytes      use     avail  %use  Mounted on
/dev/dsk/dks4d5s7     xfs    4437373  2217547   2219826    50  /home

Figure 40. Using df to report usage for the file system holding the current directory.

The du command can be used to show the amount of space used by a particular directory or file, or series of directories and files. The -k option can be used to show usage in K instead of 512-byte blocks just as with df. du's default behaviour is to report a usage amount recursively for every sub-directory, giving a total at the end, eg.:

yoda # du -k /usr/share/data/models
436     /usr/share/data/models/sgi
160     /usr/share/data/models/food
340     /usr/share/data/models/toys
336     /usr/share/data/models/buildings
412     /usr/share/data/models/household
864     /usr/share/data/models/scenes
132     /usr/share/data/models/chess
1044    /usr/share/data/models/geography
352     /usr/share/data/models/CyberHeads
256     /usr/share/data/models/machines
1532    /usr/share/data/models/vehicles
88      /usr/share/data/models/simple
428     /usr/share/data/models/furniture
688     /usr/share/data/models/robots
7760    /usr/share/data/models

Figure 41. Using du to report usage for several directories/files.

The -s option can be used to restrict the output to just an overall total for the specified directory:

yoda # du -k -s /usr/share/data/models
7760    /usr/share/data/models

Figure 42. Restricting du to a single directory.

By default, du does not follow symbolic links, though the -L option can be used to force links to be followed if desired. However, du does examine NFS-mounted file systems by default. The -l and -m options can be used to restrict this behaviour, eg.:

ASH # cd /
ASH # du -k -s -l
0       CDROM
0       bin
0       debug
68      dev
0       disk2
2       diskcopy
0       dumpster
299     etc
0       home
2421    lib
2579    lib32
0       opt
0       proc
1       root.home
4391    sbin
565     stand
65      tmp
3927    unix
397570  usr
6346    var

Figure 43. Forcing du to ignore symbolic links.

The output in Fig 43 shows that the /home directory has been ignored. Another example: a user can find out how much disk space their account currently uses by entering:

du -k -s ~/

Swap space (ie. virtual memory on disk) can be monitored using the swap command with the -l option. For full details on these commands, see the relevant man pages. Commands relating to file system quotas are dealt with in a later lecture.
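A simple way to keep an eye on disk consumption automatically is to wrap these commands in a small script run by cron each night; a hypothetical sketch (the report path is invented for illustration):

#!/bin/sh
# Nightly disk report: overall file system usage plus a per-user breakdown.
( date
  df -k
  du -k -s /home/students/* ) > /var/adm/local/diskreport 2>&1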

System Performance.

This includes processor loading, disk loading, etc. The most common command used by admins/users to observe CPU usage is ps, which displays a list of currently running processes along with associated information, including the percentage of CPU time currently being consumed by each process, eg.:

ASH 6# ps -ef
     UID   PID  PPID  C    STIME TTY     TIME CMD
    root     0     0  0 08:00:41 ?       0:01 sched
    root     1     0  0 08:00:41 ?       0:01 /etc/init
    root     2     0  0 08:00:41 ?       0:00 vhand
    root     3     0  0 08:00:41 ?       0:03 bdflush
    root     4     0  0 08:00:41 ?       0:00 munldd
    root     5     0  0 08:00:41 ?       0:02 vfs_sync
    root     7     0  0 08:00:41 ?       0:00 shaked
    root     8     0  0 08:00:41 ?       0:00 xfsd
    root   900   895  0 08:03:27 ?       1:25 /usr/bin/X11/Xsgi -bs
[etc. - several dozen daemon entries (nfsd, biod, inetd, sendmail, etc.) omitted]
      lp   460     1  0 08:01:17 ?       0:00 /usr/lib/lpsched
    root   488     1  0 08:01:19 ?       0:01 /sbin/cron
    root  1537  1536  1 16:06:04 pts/0   0:01 -tcsh
    root  1556  1537 28 16:10:56 pts/0   0:01 find /usr -name *.txt -print
    root  1557  1537  9 16:10:58 pts/0   0:00 ps -ef

Figure 44. Typical output from the ps command.

Before obtaining the output shown in Fig 44, I ran a find command in the background. The output shows that the find command was utilising 28% of available CPU resources; tasks such as find are often limited by the speed and bandwidth capacity of the disk, not the speed of the main CPU.

The ps command has a variety of options to show or not show various information. Most of the time though, 'ps -ef' is adequate to display the kind of information required. Note that other UNIX variants use slightly different options, eg. the equivalent command on SunOS would be 'ps -aux'. One can use grep to only report data for a particular process, eg.:

ASH 5# ps -ef | grep lp
      lp   460     1  0 08:01:17 ?       0:00 /usr/lib/lpsched

Figure 45. Filtering ps output with grep.

This only reports data for the lp printer scheduler.

However, ps only gives a snapshot of the current system state. Often of more interest is a system's dynamic behaviour. A more suitable command for monitoring system performance over time is 'top', a typical output of which looks like this:

IRIX ASH 6.2 03131015 IP22    Load[0.22,0.12,0.01]    16:17:47    166 procs
   user    pid   pgrp   %cpu  proc  pri  size   rss  time  command
   root   1576   1576  24.44     *   20   386    84  0:02  find
   root   1577   1577   0.98     0   65   432   100  0:00  top
   root   1513   1509   0.18     *   60  4322  1756  0:07  clogin
   root    900    900   0.12     *   60  2858   884  1:25  Xsgi
   root    146    146   0.05     *   60   351    77  0:00  portmap
   root    158      0   0.05     *   60   350    81  0:00  ypbind
   root   1567   1567   0.02     *   60   349    49  0:00  rlogind
   root      3      0   0.01     *  +39     0     0  0:03  bdflush
   root    172      0   0.00     *   61     0     0  0:00  nfsd
   root    173      0   0.00     *   61     0     0  0:00  nfsd
   root    174      0   0.00     *   61     0     0  0:00  nfsd
   root    175      0   0.00     *   61     0     0  0:00  nfsd

Figure 46. top shows a continuously updated output.

From the man page for top: "Two header lines are displayed. The first gives the machine name, the release and build date information, the processor type, the 1, 5, and 15 minute load average, the current time and the number of active processes. The next line is a header containing the name of each field highlighted."

The display is constantly updated at regular intervals, the duration of which can be altered with the -i option (default duration is 5 seconds). top shows the following data for each process:

"user name, process ID, process group ID, CPU usage, processor currently executing the process (if process not currently running), process priority, process size (in pages), resident set size (in pages), amount of CPU time used by the process, and the process name."

Just as with the ps command, top shows the ID number for each process. These IDs can be used with the kill command (and others) to control running processes, eg. shut them down, suspend them, etc. There is a GUI version of top called gr_top.

Note that IRIX 6.5 contains a newer version of top which gives even more information, eg.:

IRIX WOLFEN 6.5 IP22          load averages: 0.06 0.01 0.00        17:29:44
58 processes:  56 sleeping, 1 zombie, 1 running
CPU: 93.5% idle, 0.5% usr, 5.6% ker, 0.0% wait, 0.0% xbrk, 0.5% intr
Memory: 128M max, 116M avail, 88M free, 128M swap, 128M free swap

  PID  PGRP  USERNAME  PRI   SIZE    RES  STATE   TIME  WCPU%  CPU%  COMMAND
 1372  1372  root       20  2204K  1008K  run/0   0:00   0.2   3.22  top
  153   153  root       20  2516K  1516K  sleep   0:05   0.1   1.42  nsd
 1364  1364  root       20  1740K   580K  sleep   0:00   0.0   0.24  rlogind

Figure 47. The IRIX 6.5 version of top, giving extra information.
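As noted above, the PIDs reported by ps and top can be passed to the kill command; a minimal illustration, using the find process from Figure 46 (PID 1576) as the example:

kill 1576       # ask the process to terminate cleanly (SIGTERM)
kill -9 1576    # force termination if the process ignores the first signal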

A program which offers much greater detail than top is osview. Like top, osview constantly updates a whole range of system performance statistics. Unlike top though, so much information is available from osview that it offers several different 'pages' of data. The number keys are used to switch between pages. Here is a typical output for each of the five pages:

Page 1 (system information):

Osview 2.1 : One Second Average         WOLFEN 17:32:13 04/21/99 #5
int=5s

Load Average
 1 Min       0.000
 5 Min       0.000
 15 Min      0.000
CPU Usage
 %user       0.20
 %sys        0.00
 %intr       0.00
 %gfxc       0.00
 %gfxf       0.00
 %sxbrk      0.00
 %idle       99.80
System Activity
 syscall     19
 read        1
 write       0
 fork        0
 exec        0
 readch      19
 writech     38
 iget        0
System Memory
 Phys        128.0M
 kernel      10.1M
 heap        3.9M
 mbufs       96.0K
 stream      40.0K
 ptbl        1.2M
 fs ctl      2.0M
 fs data     7.7M
 delwri      0
 free        87.5M
  data       26.0M
  empty      61.4M
 userdata    20.7M
 reserved    0
 pgallocs    2
Scheduler
 runq        0
 swapq       0
 switch      4
 kswitch     95
 preempt     1
Wait Ratio
 %IO         1.2
 %Swap       0.0
 %Physio     0.0

Figure 48. System information from osview.

Page 2 (CPU information):

Osview 2.1 : One Second Average         WOLFEN 17:36:27 04/21/99 #1
int=5s

CPU Usage
 %user       0.00
 %sys        100.00
 %intr       0.00
 %gfxc       0.00
 %gfxf       0.00
 %sxbrk      0.00
 %idle       0.00

Figure 49. CPU information from osview.

Page 3 (memory information):

Osview 2.1 : One Second Average         WOLFEN 17:36:56 04/21/99 #1
int=5s

System Memory
 Phys        128.0M
 kernel      10.5M
 heap        4.2M
 mbufs       100.0K
 stream      48.0K
 ptbl        1.3M
 fs ctl      1.5M
 fs data     8.2M
 delwri      0
 free        77.1M
  data       28.8M
  empty      48.3M
 userdata    30.7M
 reserved    0
 pgallocs    450
Memory Faults
 vfault      1.7K
 protection  225
 demand      375
 cw          25
 steal       375
 onswap      0
 oncache     1.4K
 onfile      0
 freed       0
 unmodswap   0
 unmodfile   0
 iclean      0
*Swap
*System VM
*Heap
*TLB Actions
*Large page stats

Figure 50. Memory information from osview.

Page 4 (network information):

Osview 2.1 : One Second Average         WOLFEN 17:38:15 04/21/99 #1
int=5s

TCP
 acc. conns  0
 sndtotal    33
 rcvtotal    0
 sndbyte     366
 rexmtbyte   0
 rcvbyte     0
UDP
 ipackets    0
 opackets    0
 dropped     0
 errors      0
IP
 ipackets    0
 opackets    33
 forward     0
 dropped     0
 errors      0
NetIF[ec0]
 Ipackets    0
 Opackets    33
 Ierrors     0
 Oerrors     0
 collisions  0
NetIF[lo0]

Figure 51. Network information from osview.

Page 5 (miscellaneous):

Osview 2.1 : One Second Average         WOLFEN 17:38:43 04/21/99 #1
int=5s

Block Devices
 lread       37.5K
 bread       0
 %rcache     100.0
 lwrite      0
 bwrite      0
 wcancel     0
 %wcache     0.0
 phread      0
 phwrite     0
Graphics
 griioctl    0
 gintr       75
 swapbuf     0
 switch      0
 fifowait    0
 fifonwait   0
Video
 vidioctl    0
 vidintr     0
 drop_add    0
*Interrupts
*PathName Cache
*EfsAct
*XfsAct
*Getblk
*Vnodes

Figure 51. Miscellaneous information from osview.

osview clearly offers a vast amount of information for monitoring system and network activity. There is a GUI version of osview called gr_osview. Various options exist to determine which parameters are displayed with gr_osview, the most commonly used being -a to display as much data as possible. Programs such as top and osview may be SGI-specific (I'm not sure). If they are, other versions of UNIX are bound to have equivalent programs to these.

Example use: although I do virtually all the administration of the server remotely using the office Indy (either by command line or GUI tools), there is also a VT-style terminal in my office connected to the server's serial port via a lengthy cable (the Challenge S server itself is in a small ante room). The VT display offers a simple text-only interface to the server; thus, most of the time, I leave osview running on the VT display so that I can observe system activity whenever I need to. The VT also offers an extra communications link for remote administration should the network go down, ie. if the network links fail (eg. broken hub) the admin Indy cannot be used to communicate with the server, but the VT still can.

Another tool for monitoring memory usage is gmemusage, a GUI program which displays a graphical split-bar chart view of current memory consumption. gmemusage can also display a breakdown of the regions within a program's memory space, eg. text, data, shared memory, etc.

Much lower-level tools exist too, such as sar (system activity reporter). In fact, osview works by using sar. Experienced admins may use tools like sar, but most admins will prefer to use higher-level tools such as top, osview and gmemusage. However, since sar gives a text output, one can use it in script files for automated system information gathering, eg. a system activity report produced by a script, executed every hour by the cron job-scheduling system (sar-based information gathering scripts are included in the cron job schedule as standard). sar can be given options to report only on selected items, eg. the number of processes in memory waiting for CPU resource time. sar can be told to monitor some system feature for a certain period, saving the data gathered during that period to a file. sar is a very flexible program.
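A hypothetical example of such a script (the log path is invented; sar's standard 'interval count' arguments are used to take ten one-second CPU samples):

#!/bin/sh
# Hourly snapshot run from cron: append CPU utilisation samples to a log.
( echo "==== `date`"
  sar -u 1 10 ) >> /var/adm/local/cpu.log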

Network Performance and Statistics. osview can be used to monitor certain network statistics, but another useful program is ttcp. The online book, "IRIX Admin: Networking and Mail", says: "The ttcp tool measures network throughput. It provides a realistic measurement of network performance between two stations because it allows measurements to be taken at both the local and remote ends of the transmission."

To run a test with ttcp, enter the following on one system, eg. sevrin:

ttcp -r -s

Then enter the following on another system, eg. akira:

ttcp -t -s sevrin

After a delay of roughly 20 seconds for a 10Mbit network, results are reported by both systems, which will look something like this:

SEVRIN # ttcp -r -s
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp
ttcp-r: socket
ttcp-r: accept from 193.61.252.2
ttcp-r: 16777216 bytes in 18.84 real seconds = 869.70 KB/sec +++
ttcp-r: 3191 I/O calls, msec/call = 6.05, calls/sec = 169.39
ttcp-r: 0.1user 3.0sys 0:18real 16% 118maxrss 0+0pf 1170+1csw

AKIRA # ttcp -t -s sevrin
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp  -> sevrin
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 18.74 real seconds = 874.19 KB/sec +++
ttcp-t: 2048 I/O calls, msec/call = 9.37, calls/sec = 109.27
ttcp-t: 0.0user 2.3sys 0:18real 12% 408maxrss 0+0pf 426+4csw

Figure 52. Results from ttcp between two hosts on a 10Mbit network.

Full details of the output are in the ttcp man page, but one can immediately see that the observed network throughput (around 870KB/sec) is at a healthy level.

Another program for gathering network performance information is netstat. The online book, "IRIX Admin: Networking and Mail", says: "The netstat tool displays various network-related data structures that are useful for monitoring and troubleshooting a network. Detailed statistics about network collisions can be captured with the netstat tool."

netstat is commonly used with the -i option to list basic local network information, eg.:

yoda # netstat -i
Name  Mtu   Network      Address            Ipkts    Ierrs   Opkts    Oerrs  Coll
ec0   1500  193.61.252   yoda.comp.uclan    3906956  3       2945002  0      553847
ec3   1500  193.61.250   gate-yoda.comp.    560206   2       329366   0      16460
lo0   8304  loopback     localhost          476884   0       476884   0      0

Figure 53. The output from netstat.

Here, the packet collision rate on ec0 has averaged 18.8% (553847 collisions for 2945002 output packets). This is within acceptable limits [1].

Another useful command is 'ping'. This program sends packets of data to a remote system requesting an acknowledgement response for each packet sent. Options can be used to send a specific number of packets, or send as many packets as fast as they are returned, send a packet every so often (user-definable duration), etc. For example:

MILAMBER # ping yoda
PING yoda.comp.uclan.ac.uk (193.61.252.1): 56 data bytes
64 bytes from 193.61.252.1: icmp_seq=0 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=1 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=2 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=3 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=4 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=5 ttl=255 time=1 ms
64 bytes from 193.61.252.1: icmp_seq=6 ttl=255 time=1 ms

----yoda.comp.uclan.ac.uk PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip min/avg/max = 1/1/1 ms

Figure 54. Example use of the ping command.

I pressed CTRL-C after the 7th packet was sent. ping is a quick and easy way to see if a host is active and if so how responsive the connection is. If a ping test produces significant packet loss on a local network, then it is highly likely there exists a problem of some kind. Normally, one would rarely see a non-zero packet loss on a local network from a direct machine-to-machine ping test. A fascinating use of ping I once observed was at The Moving Picture Company (MPC) [2]. The admin had written a script which made every host on the network send a ping test to every other host. The results were displayed as a table with host names shown down the left hand side as well as along the top. By looking for horizontal or diagonal lines of unusually large ping times, the admin could immediately see if there was a problem with a single host, or with a larger part of the network. Because of the need for a high system availability rate, the script allows the admin to spot problems almost as soon as they occur, eg. by running the script once every ten seconds. When the admin showed me the script in use, one column had rather high ping times (around 20ms). Logging into the host with rlogin, ps showed everything was ok: a complex process was merely consuming alot of CPU time, giving a slower network response.

System Status and User Status.

The rup command offers an immediate overview of current system states, eg.:

yoda # rup
yoda.comp.uclan.ac.u     up  6 days,  8:25,  load average: 0.33, 0.36, 0.35
gate-yoda.comp.uclan     up  6 days,  8:25,  load average: 0.33, 0.36, 0.35
wolfen.comp.uclan.ac     up        11:28,    load average: 0.00, 0.00, 0.00
conan.comp.uclan.ac.     up        11:28,    load average: 0.06, 0.01, 0.00
akira.comp.uclan.ac.     up        11:28,    load average: 0.01, 0.00, 0.00
nikita.comp.uclan.ac     up        11:28,    load average: 0.03, 0.00, 0.00
gibson.comp.uclan.ac     up        11:28,    load average: 0.00, 0.00, 0.00
woo.comp.uclan.ac.uk     up        11:28,    load average: 0.01, 0.00, 0.00
solo.comp.uclan.ac.u     up        11:28,    load average: 0.00, 0.00, 0.00
cameron.comp.uclan.a     up        11:28,    load average: 0.02, 0.00, 0.00
sevrin.comp.uclan.ac     up        11:28,    load average: 0.69, 0.46, 0.50
ash.comp.uclan.ac.uk     up        11:28,    load average: 0.00, 0.00, 0.00
ridley.comp.uclan.ac     up        11:28,    load average: 0.00, 0.00, 0.00
leon.comp.uclan.ac.u     up        11:28,    load average: 0.00, 0.00, 0.00
warlock.comp.uclan.a     up         1:57,    load average: 0.08, 0.13, 0.11
milamber.comp.uclan.     up         9:52,    load average: 0.11, 0.07, 0.00
merlin.comp.uclan.ac     up        11:28,    load average: 0.01, 0.00, 0.00
indiana.comp.uclan.a     up        11:28,    load average: 0.00, 0.00, 0.02
stanley.comp.uclan.a     up         1:56,    load average: 0.00, 0.00, 0.00

Figure 55. The output from rup.

The load averages for a single machine can be ascertained by running 'uptime' on that machine, eg.:

MILAMBER 84# uptime
  8:05pm  up 10:28,  6 users,  load average: 0.07, 0.06, 0.25

MILAMBER 85# rsh yoda uptime
  8:05pm  up 6 days, 9:02,  2 users,  load average: 0.47, 0.49, 0.42

Figure 56. The output from uptime.

The w command displays current system activity, including what each user is doing. The man page says, "The heading line shows the current time of day, how long the system has been up, the number of users logged into the system, and the load averages." For example:

yoda # w
  8:10pm  up 6 days, 9:07,  2 users,  load average: 0.51, 0.50, 0.41
User      tty     from             login@   idle   JCPU   PCPU   what
root      q0      milamber.comp.   7:02pm              8         w
cmprj     ftp     UNKNOWN@ns5ip.   7:29pm                        -

Figure 57. The output from w showing current user activity.

With the -W option, w shows the 'from' information on a separate line, allowing one to see the full domain address of ftp connections, etc.:

yoda # w -W
  8:11pm  up 6 days, 9:08,  2 users,  load average: 0.43, 0.48, 0.40
User      tty         login@   idle   JCPU   PCPU   what
root      ttyq0       7:02pm              8         w -W
    milamber.comp.uclan.ac.uk
cmprj     ftp22918    7:29pm
    [email protected]

Figure 58. Obtaining full domain addresses from w with the -W option.

The rusers command broadcasts to all machines on the local network, gathering data about who is logged on and where, eg.:

yoda # rusers
yoda.comp.uclan.ac.uk         root guest guest
wolfen.comp.uclan.ac.uk       root
gate-yoda.comp.uclan.ac.uk    root
milamber.comp.uclan.ac.uk     root root root mapleson mapleson
warlock.comp.uclan.ac.uk      sensjv sensjv

Figure 59. The output from rusers, showing who is logged on where.

The multiple entries for certain users indicate that more than one shell is active for that user. As usual, my login data shows I'm doing several things at once. rusers can be modified with options to:

- report for all machines, whether users are logged in or not (-a),
- probe a specific machine (supply host name(s) as arguments),
- display the information sorted alphabetically by:
  - host name (-h),
  - idle time (-i),
  - number of users (-u),
- give a more detailed output in the same style as the who command (-l).

Service Availability.

The most obvious way to check if a service is available for use by users is to try and use the service, eg. ftp or telnet to a test location, run up a Netscape session and enter a familiar URL, send an email to a local or remote account, etc. The ps command can be used to make sure the relevant background process is running for a service too, eg. 'nfsd' for the NFS system.

However, if a service is experiencing problems, simply attempting to use the service will not reveal what may be wrong. For example, if one cannot ftp, it could be because of anything from a loose cable connection to some remote server that's gone down. The ping command is useful for an immediate check of network-related services such as ftp, telnet, WWW, etc. One pings each host in the communication chain to see if the hosts respond. If a host somewhere in the chain does not respond, then that host may be preventing any data from getting through (eg. a remote proxy server is down).

A useful command one can use to aid in such detective work is traceroute. This command sends test packets in a similar way to ping, but it also reports how the test packets reached the target site at each stage of the communication chain, showing response times in milliseconds for each step, eg.:

yoda # traceroute www.cee.hw.ac.uk
traceroute to osiris.cee.hw.ac.uk (137.195.52.12), 30 hops max, 40 byte packets
 1  193.61.250.33 (193.61.250.33)  6 ms (ttl=30!)  3 ms (ttl=30!)  4 ms (ttl=30!)
 2  193.61.250.65 (193.61.250.65)  5 ms (ttl=29!)  5 ms (ttl=29!)  5 ms (ttl=29!)
 3  gw-mcc.netnw.net.uk (194.66.24.1)  9 ms (ttl=28!)  8 ms (ttl=28!)  10 ms (ttl=28!)
 4  manchester-core.ja.net (146.97.253.133)  12 ms  11 ms  9 ms
 5  scot-pop.ja.net (146.97.253.42)  15 ms  13 ms  14 ms
 6  146.97.253.34 (146.97.253.34)  20 ms  15 ms  17 ms
 7  gw1.hw.eastman.net.uk (194.81.56.110)  20 ms (ttl=248!)  18 ms  14 ms
 8  cee-gw.hw.ac.uk (137.195.166.101)  17 ms (ttl=23!)  31 ms (ttl=23!)  18 ms (ttl=23!)
 9  osiris.cee.hw.ac.uk (137.195.52.12)  14 ms (ttl=56!)  26 ms (ttl=56!)  30 ms (ttl=56!)

If a particular step shows a sudden jump in response time, then there may be a communications problem at that step, eg. the host in question may be overloaded with requests, suffering from lack of communications bandwidth, CPU processing power, etc.

At a lower level, system services often depend on background system processes, or daemons. If these daemons are not running, or have shut down for some reason, then the service will not be available. On the SGI Indys, one example is the GUI service which handles the use of on-screen icons. The daemon responsible is called objectserver. Older versions of this particular daemon can occasionally shut down if an illegal iconic operation is performed, or if the file manager daemon experiences an error. With no objectserver running, the on-screen icons disappear. Thus, a typical task might be to periodically check to make sure the objectserver daemon is running on all relevant machines. If it isn't, then the command sequence:

/etc/init.d/cadmin stop
/etc/init.d/cadmin start

restarts the objectserver. Once running, the on-screen icons return. A common cause of objectserver shutting down is when a user's desktop layout configuration files (contained in .desktop- directories) become corrupted in some way, eg. edited by hand in an incorrect manner, or mangled by some other operation (eg. a faulty Java script from a home made web page). One solution is to erase the user's desktop layout configuration directory, then login as the user and create a fresh .desktop- directory.

objectserver is another example of UNIX GUI evolution. In 1996 SGI decided to replace the objectserver system entirely in IRIX 6.3 (and later) with a new service that was much more
reliable, less likely to be affected by errors made in other applications, and fully capable of supporting new 'webified' iconic services such as on-screen icons that are direct links to ftp, telnet or WWW sites. In general, checking the availability of a service requires one to check that the relevant daemons are running, that the appropriate configuration files are in place, accessible and have the correct settings, that the relevant daemon is aware of any changes which may have been made (perhaps the service needs to be stopped and restarted?) and to investigate via online information what may have caused services to fail as and when incidents occur. For every service one can use, the online information explains how to setup, admin and troubleshoot the service. The key is to know where to find that information when it is needed. A useful source of constantly updated status information is the /var/adm/SYSLOG file. This file is where any important system events are logged. One can configure all the various services and daemons to log different degrees of detailed information in the SYSLOG file. Note: logging too much detail can cause the log file to grow very quickly, in which case one would also have to ensure that it did not consume valuable disk space. The SYSLOG file records user logins, connections via ftp, telnet, etc., messages logged at system bootup/shutdown time, and many other things.
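As an illustration of the kind of periodic daemon check described above, here is a minimal sh sketch; the ps/grep test is generic, and the restart commands are the cadmin ones given earlier:

#!/bin/sh
# If no objectserver process is running, restart the cadmin services.
if [ -z "`ps -ef | grep objectserver | grep -v grep`" ]; then
    /etc/init.d/cadmin stop
    /etc/init.d/cadmin start
fi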

Vendor Information Updates.

Most UNIX vendors send out periodic information booklets containing in-depth articles on various system administration issues. SGI's bulletin is called Pipeline. Such information guides are usually supplied as part of a software support contract, though the vendor will often choose to include copies on the company web site. An admin should read any relevant articles from these guides - they can often be unexpectedly enlightening.

System Hardware Failures.

When problems occur on a system, what might at first appear to be a software problem may in fact be a hardware fault. Has a disk failed? The fx program can be used to check disk status (block read tests, disk label checks, etc.) Has a network cable failed? Are all the cable connections firmly in place in the hub? Has a plug come loose?

In late 1998, the Ve24 network stopped operating quite unexpectedly one morning. The errors made it appear that there was a problem with the NFS service or perhaps the main user files disk connected to the server; in fact, the fault lay with the Ve24 hub. The online guides have a great deal of advice on how to spot possible hardware failures. My advice is to check basic things first and move onto the more complex possibilities later. In the above example, I wasted a great deal of time investigating whether the NFS service was responsible, or the external user files disk, when in fact I should have checked the hub connections first. As it happens, the loose connection was such that the hub indicator light was on even though the connection was not fully working (thus, a visual check would not have revealed the problem) - perhaps the fault was caused by a single loose wire out of the 8 running through the cable, or even an internal fault in the hub (more likely). Either way, the hub was eventually replaced.

Other things that can go wrong include memory faults. Most memory errors are not hardware errors though, eg. applications with bugs can cause errors by trying to access some protected area of memory. Hardware memory errors will show up in the system log file /var/adm/SYSLOG as messages saying something like 'Hardware ECC Memory Error in SIMM slot 4'. By swapping the memory SIMMs around between the slots, one can identify which SIMM is definitely at fault (assuming there is only one causing the problem).

The most common hardware component to go wrong on a system, even a non-PC system, is the disk drive. When configuring systems, or carrying out upgrades/expansions, it is wise to stick with models recommended by the source vendor concerned, eg. SGI always uses high-quality Seagate, IBM or Quantum disk drives for their systems; thus, using (for example) a Seagate drive is a good way to ensure a high degree of reliability and compatibility with the system concerned.

Sometimes an admin can be the cause of the problem. For example, when swapping disks around or performing disk tasks such as disk cloning, it is possible to incorrectly set the SCSI ID of the disk. SGI systems expect the system disk to be on SCSI ID 1 (though this is a configurable setting); if the internal disk is on the wrong SCSI ID, then under certain circumstances it can appear to the system as if there are multiple disks present, one on each possible ID. If hardware errors are observed on bootup (the system diagnostics checks), then the first thing to do is to reboot and enter the low-level 'Command Monitor' (an equivalent access method will exist for all UNIX systems): the Command Monitor has a small set of commands available, some of which can be used to perform system status checks, eg. the hinv command. For the problem described above, hinv would show multiple instances of the same disk on all SCSI IDs from 1 to 7 - the solution is to power down and check the SCSI jumpers carefully.

Other problems can occasionally be internal, eg. a buildup of dust blocking air vents (leading to overheating), or a fan failure, followed by overheating and eventually an automatic system shutdown (most UNIX systems' power supplies include circuitry to monitor system temperature, automatically shutting down if the system gets too hot). This leads on to questions of system maintenance which will be dealt with on Day 3.

After disk failures, the other most common failure is the power supply. It can sometimes be difficult to spot because a failure overnight or when one isn't around can mean the system shuts down, cools off and is thus rebootable again the next morning. All the admin sees is a system that's off for no readily apparent reason the next morning. The solution is to, for example, move the system somewhere close at hand so that it can be monitored, or write a script which tests
whether the system is active every few seconds, logging the time of each successful test - if the system goes down, the admin is notified in some way (eg. audio sound file played) and the admin can then quickly check the machine - if the power supply area feels overly hot, then that is the likely suspect, especially if an off/on mains switch toggle doesn't turn the system back on (power supplies often have circuitry which will not allow power-on if the unit is still too hot). If the admin wasn't available at the time, then the logged results can show when the system failed. All SGIs (and UNIX systems in general) include a suite of hardware and software diagnostics tests as part of the OS. IRIX contains a set of tests for checking the mouse, keyboard, monitor, audio ports, digital camera and other basic hardware features. Thankfully, for just about any hardware failure, hardware support contracts cover repairs and/or replacements very effectively for UNIX systems. It's worth noting that although the Computing Department has a 5-day support contract with SGI, all problems I've encountered so far have been dealt either on the same day or early next morning by a visiting support engineer (ie. they arrived much earlier than they legally had to). Since November 1995 when I took charge of the Ve24 network, the hardware problems I've encountered have been:         

- 2 failed disks
- 1 replaced power supply
- 2 failed memory SIMMs (1 failed SIMM from two different machines)
- 1 replaced keyboard (user damage)
- 1 failed monitor
- 1 suspect motherboard (replaced just in case)
- 1 suspect video card (replaced just in case)
- 1 problematic 3rd-party disk (incorrect firmware, returned to supplier and corrected with up-to-date firmware; now operating ok)
- 1 suspect hub (unknown problem; replaced just in case)

Given that the atmosphere in Ve24 is unfiltered, often humid air, and the fact that many of the components in the Indys in Ve24 have been repeatedly swapped around to create different configurations at different times, such a small number of failures is an excellent record after nearly 4 years of use.

It is likely that dirty air (dust, humidity, corrosive gases) was largely responsible for the disk, power supply and memory failures - perhaps some of the others too. A build up of dust can combine with airborne moisture to produce corrosive chemicals which can short-circuit delicate components. To put the above list another way: 14 out of the 18 Indys have been running non-stop for 3.5 years without a single hardware failure of any kind, despite being housed in an area without filtered air or temperature control. This is very impressive and is quite typical of UNIX hardware platforms.

Installing systems with proper air filtering and temperature control can be costly, but the benefit may be a much reduced chance of hardware failure - this could be important for sites with many more systems and a greater level of overall system usage (eg. 9 to 5 for most machines). Some companies go to great lengths to minimise the possibility of hardware failure. For example, MPC [2] has an Origin200 render farm for rendering movie animation frames. The render farm consists of 50 Origin200 servers, each with 2 R10000 CPUs, ie. 100 CPUs in total. The system is housed in a dedicated room with properly filtered air and temperature control. Almost certainly as a result of this high-quality setup, MPC has never had a single hardware failure of any kind in nearly 3 years of operation. Further, MPC has not experienced a single OS failure over the same period either, even though the system operates 24hours/day. This kind of setup is common amongst companies which have time-critical tasks to perform, eg. oil companies with computational models that can take six months to complete - such organisations cannot afford to have failures (the problem would likely have to be restarted from scratch, or at least delayed), so it's worth spending money on air filters, etc. If one does not have filtered air, then the very least one should do is keep the systems clean inside and out, performing system cleaning on a regular basis. At present, my current policy is to thoroughly clean the Indys twice a year: every machine is stripped right down to the bare chassis; every component is individually cleaned with appropriate cleaning solutions, cloths, air-dusters, etc. (this includes removing every single key from all the keyboards and mass-cleaning them with a bucket of hot water and detergent! And cleaning the keyboard bases inside and out too). Aside from these major bi-annual cleanings, simple regular cleaning is performed on a weekly or monthly basis: removing dirt from the mice (inside especially), screen, chassis/monitor surface, cables and so on; cleaning the desks; opening each system and blowing away internal dust using a can of compressed filtered air, etc. Without a doubt, this process greatly lengthens the life-span of the systems' hardware components, and users benefit too from a cleaner working environment - many new students each autumn often think the machines must be new because they look so clean. Hardware failures do and will occur on any system whether it's a UNIX platform or not. An admin can use information from online sources, combined with a knowledge of relevant system test tools such as fx and ping, to determine the nature of hardware failures and take corrective action (contacting vendor support if necessary); such a strategy may include setting up automated hardware tests using regularly-executed scripts. Another obvious source of extensive information about any UNIX platform is the Internet. Hundreds of existing users, including company employees, write web pages [3] or USENET posts describing their admin experiences and how to deal with typical problems.

Suspicious/Illegal Activity.

Users inevitably get up to mischief on occasion, or external agencies may attempt to hack the system. Types of activity could include:

- users downloading illegal or prohibited material, either with respect to national/local laws or internal company policy,
- accessing of prohibited sites, eg. warez software piracy sites,
- mail spamming and other abuses of Internet services,
- attacks by hackers,
- misuse/abuse of system services internally.

There are other possibilities, but these are the main areas. This lecture is an introduction to security and monitoring issues. A more in-depth discussion is given in the last lecture.

As an admin who is given the task of supposedly preventing and/or detecting illegal activities, the first thing which comes to mind is the use of various file-searching methods to locate suspect material, eg. searching every user's netscape bookmarks file for particular keywords. However, this approach can pose legal problems. Some countries have data protection and/or privacy laws [4] which may prohibit one from arbitrarily searching users' files. Searches of this type are the equivalent of a police force tapping all the phones in an entire street and recording every single conversation just on the off-chance that they might record something interesting; such methods are sometimes referred to as 'fishing' and could be against the law. So, for example, the following command might be illegal:

find /home/students -name "*" -print > list
grep sex list > suspected
grep warez list >> suspected
grep xxx list >> suspected
grep pics list >> suspected
grep mpg list >> suspected
grep jpg list >> suspected
grep gif list >> suspected
grep sites list >> suspected

As a means of finding possible policy violations, the above script would be very effective, but it's definitely a form of fishing (even the very first line). Now consider the following:

find /home/students -name "bookmarks.html" -print -exec grep playboy {} \;

This command will effectively locate any Netscape bookmarks file which contains a possible link to the PlayBoy web site. Such a command is clearly looking for fairly specific content in a very specific file in each user's .netscape directory; further, it is probably accessing a user's account space without her or his permission (this opens the debate on whether 'root' even needs a user's permission since root actually owns all files anyway - more on this below). The whole topic of computer file access is a grey area. For example, might the following command also be illegal?

find . -name "*.jpg" -print > results && grep sex results

A user's lawyer could argue that it's clearly looking for any JPEG image file that is likely to be of an explicit nature. On the other hand, an admin's lawyer could claim the search was actually looking for any images relating to tourism in Sussex county, or musical sextets, or adverts for local unisex hair salons, and just accidentally happened to be in a directory above /home/students when the command was executed (the find would eventually reach /home/students). Obviously a setting for a messy court-room battle. But even ignoring actions taken by an admin using commands like find, what about data backups? An extremely common practice on any kind of computer system is to back up user data to media such as DAT on a regular basis - but isn't this accessing user files without permission? But hang on: on UNIX systems, the root user is effectively the absolute owner of any file. For example, suppose a file called 'database' in /tmp, owned by an ordinary user, contained some confidential data; if the admin (logged in as root) then did this:

cat /tmp/database

the contents of the database file would indeed be displayed. Thus, since root basically owns all files anyway by default, surely a backup procedure is just the root user archiving files it already owns? If so, does one instead have to create some abstract concept of ownership in order to offer users a concrete concept of what data privacy actually is? Who decides? Nations which run their legal systems using case-law will find these issues very difficult to clarify, eg. the UK's Data Protection Act is known to be 'weak'. Until such arguments are settled and better laws created, it is best for an admin to err on the side of caution. For example, if an admin wishes to have some kind of regular search conducted, the existence of the search should be included as part of stated company policy, and enshrined into any legal documents which users must sign before they begin using the system, ie. if a user signs the policy document, then the user has agreed to the actions described in that document. Even then, such clauses may not be legally binding. An admin could also set up some form of login script which would require users to agree to a system usage policy before they were fully logged in. However, these problems won't go away, partly because of the specifics of how some modern Internet services such as the web are implemented. For example, a user could access a site which automatically forces the pop-up of a Netscape window which is directed to access a prohibited site; inline images from the new site will then be present in the user's Netscape cache directory in their home account area even though they haven't specifically tried to download anything. Are they legally liable? Do such files even count as personal data? And if the site has its own proxy

server, then the images will also be in the server's proxy cache - are those responsible for the server also liable? Nobody knows. Legal arguments on the nature of cache directories and other file system details have not yet been resolved. Clearly, there is a limit to how far one can go in terms of prevention simply because of the way computing technologies work. Thus, the best thing to do is to focus efforts on information that does not reside inside user accounts. The most obvious place is the system log file, /var/adm/SYSLOG. This file will show all the ftp and telnet sites which users have been accessing; if one of these sites is a prohibited place, then that is sufficient evidence to take action. The next most useful data resource to keep an eye on is the web server log(s). The web logs record every single access by all users to the WWW. Users have their own record of their accesses in the form of a history file, hidden in their home directory inside the .netscape directory (or other browser); but the web logs are outside their accounts and so can probably be freely examined, searched, processed, etc. by an admin without having to worry about legal issues. Even here though, there may be legal issues, eg. log data often includes user IDs which can be used to identify specific individuals and their actions - does a user have a legal right to have such data kept private? Only a professional lawyer in the field would know the correct answer.

Note: the amount of detail placed into a log file can be changed to suit the type of logging required. If a service offers different levels of logging, then the appropriate online documentation will explain how to alter the settings.
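As a simple illustration of the kind of log checking described above, the commands below search the system log and a web server access log for references to a particular prohibited site. The host name 'badsite.example.com' and the web log path are assumptions for this sketch only - real log locations depend on the web server in use:

# Look for entries in the system log which mention the prohibited
# site (host name is an example only).
grep badsite.example.com /var/adm/SYSLOG

# Similarly for a web server access log (path is an example only;
# Netscape, CERN and Apache servers each use their own locations).
grep badsite.example.com /usr/local/webserver/logs/access_log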

Blocking Sites. If an admin does not want users to be able to access a particular site, then that site can be added to a list of 'blocked' sites by using the appropriate option in the web server software concerned, eg. Netscape Enterprise Server, CERN web server, Apache web server, etc. Even this may pose legal problems if a country has any form of freedom-of-speech legislation though (non-existent in the UK at present, so blocking sites should be legally OK in the UK). However, blocking sites can become somewhat cumbersome because there are thousands of web sites which an admin could theoretically have to deal with - once the list becomes quite large, web server performance decreases as every access has to have its target URL checked against the list of banned sites. So, if an admin does choose to use such a policy, it is best to only add sites when necessary, and to construct some kind of checking system so that if no attempt is made to access a blocked site after a duration of, say, two weeks (whatever), then that site is removed from the list of blocked sites. In the long term, such a policy should help to keep the list to a reasonably manageable size. Even so, just the act of checking the web logs and adding sites to the list could become a costly time-consuming process (time = money = wages). One can also use packet filtering systems such as hardware routers or software daemons like ipfilterd which can accept, reject, or reject-and-log incoming packets based on source/destination IP address, host name, network interface, port number, or any combination of these. Note that

daemons such as ipfilterd may require the presence of a fast CPU if the overhead from a busy site is to be properly supported. The ipfilterd system is discussed in detail on Day 3.
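Relating to the suggestion above of removing blocked sites which nobody has tried to access for a while, the following is a minimal sketch of how such a check might be scripted; the file names are assumptions for this example, and the exact matching would depend on the web server's log format:

#!/bin/sh
# For each site in a 'blocked' list (one host name per line), report
# whether it appears anywhere in the recent web server log. Sites
# which are never reported could be candidates for removal from the
# list. File locations are examples only.
BLOCKED=/usr/local/admin/blocked_sites
WEBLOG=/usr/local/admin/recent_access_log

for site in `cat $BLOCKED`
do
  if grep $site $WEBLOG > /dev/null; then
    echo "$site: still being requested - keep it blocked"
  else
    echo "$site: no recent requests - consider removing it from the list"
  fi
done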

System Temporary Directories. An admin should keep a regular eye on the contents of temporary directories on all systems, ie. /tmp and /var/tmp. Users may download material and leave the material lying around for anyone to see. Thus, a suspicious file can theoretically be traced to its owner via the user ID and group ID of the file. I say theoretically because, as explained elsewhere, it is possible for a user X to download a file (eg. by ftp so as to avoid the web logs, or by telnet using a shell on a remote system) and then 'hand over' ownership of the file to someone else (say user Y) using the chgrp and chown commands, making it look as though a different user is responsible for the file. In that sense, files found outside a user's home directory could not normally be used as evidence, though they would at least alert the admin to the fact that suspect activities may be occurring, permitting a refocusing of monitoring efforts, etc. However, one way in which it could be possible to reinforce such evidence is by being able to show that user Y was not logged onto the system at the time when the file in question was created (this information can be gleaned from the system's own local /var/adm/SYSLOG file, and the file's creation time and date). Unfortunately, both users could have been logged onto the same system at the time of the file's creation. Thus, though a possibility, the extra information may not help. Except in one case: video evidence. If one can show by security camera recordings that user X did indeed login 'on console' (ie. at the actual physical keyboard) then that can be tied in with SYSLOG data plus file creation times, irrespective of what user Y was doing at the time. Certainly, if someone wished to frame a user, it would not be difficult to cause a considerable amount of trouble for that user with just a little thought on how to access files, where to put them, changing ownership, etc. In reality, many admins probably just do what they like in terms of searching for files, examining users' areas, etc. This is because there is no way to prove someone has attempted to search a particular part of the file system - UNIX doesn't keep any permanent record of executed commands. Ironically, the IRIX GUI environment does keep a record of any file-related actions taken with the GUI system (icons, file manager windows, directory views, etc.) but the log file with this information is kept inside the user's .desktop- directory and thus may be legally out of bounds.
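A minimal sketch of how the temporary directories might be checked is shown below; it simply lists any regular files under /tmp and /var/tmp which are not owned by root, using a long listing so that the owner and group of each file (as discussed above) are visible:

# List files in the temporary directories which are not owned by root,
# showing owner and group so that suspect files can be traced.
find /tmp /var/tmp -type f ! -user root -exec ls -l {} \;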

File Access Permissions. Recall the concept of file access permissions for files. If a user has a directory or file with its permissions set so that another ordinary user can read it (ie. not just root, who can access

anything by default anyway), does the fact that the file is globally readable mean the user has by default given permission for anyone else to read the file? If one says no, then that would mean it is illegal to read any user's own public_html web area! If one says yes, and a legal body confirmed this for the admin, then that would at least enable the admin to examine any directory or file that had the groups and others permissions set to a minimum of read-only (read and executable for directories). The find command has an option called -perm which allows one to search for files with permissions matching a given mode. If nothing else, such an ability would catch out careless users since most users are not aware that their account has hidden directories such as .netscape. An admin ought to make users aware of security issues beforehand though.
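For instance, a minimal sketch of using the -perm option (under the interpretation that world-readable material is fair game, which as discussed above is far from legally certain) might look like this; the octal modes test for the 'read by others' bit on files and the 'read and search by others' bits on directories:

# Regular files under /home/students which others can read.
find /home/students -type f -perm -004 -print

# Directories under /home/students which others can read and enter.
find /home/students -type d -perm -005 -print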

Backup Media. Can an admin search the data residing on backup media? (DAT, CDR, ZIP, DLT, etc.) After all, the data is no longer inside the normal home account area. In my opinion yes, since root owns all files anyway (though I've never done such a search), but others might disagree. For that matter, consider the tar commands commonly used to perform backups: a full backup accesses every file on the file system by default (ie. including all users' files, whatever the permissions may be), so are backups a problem area? Yet again, one can easily see how legal grey areas emerge concerning the use of computing technologies.
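For reference, a full backup of the kind mentioned above might be performed with something like the following (a sketch only - /dev/tape is a common default tape device name on IRIX but varies between systems, and a real backup scheme would normally involve incremental backups, verification, media rotation, etc.):

# Archive the whole of /home to the default tape device; run as root,
# this reads every user's files regardless of their permissions.
tar cvf /dev/tape /home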

Conclusion.

Until the law is made clearer and brought up-to-date (unlikely), the best an admin can do is consult any available internal legal team, deciding policy based on any advice given.

References:

1. "Ethernet Collisions on Silicon Graphics Systems", SGI Pipeline magazine (support info bulletin), July/August 1998 (NB: URL not accessible to those without a software support contract):
   http://support.sgi.com/surfzone/content/pipeline/html/19980404EthernetCollisions.html

2. The Moving Picture Company, Soho Square, London. Responsible for some or all of the special effects in Daylight, The English Patient, Goldeneye, The Borrowers, and many other feature films, music videos, adverts, etc. Hardware used: several dozen Octane workstations, many Onyx2 graphics supercomputers, a 6.4TB Ampex disk rack with real-time Philips cinescan film-to-digital-video converter (cinema resolution 70mm uncompressed video converter; 250K's worth), Challenge L / Discrete Logic video server, a number of O2s, various older SGI models such as Onyx RealityEngine2, Indigo2, Personal IRIS, etc., some high-end Apple Macs and a great deal of dedicated video editing systems and VCRs, supported by a multi-gigabit network. I saw one NT system which the admin said nobody used.

3. The SGI Tech/Advice Centre and its worldwide mirror sites:
   Holland #1: http://www.futuretech.vuurwerk.nl/
   Holland #2: http://sgi.webguide.nl/
   Holland #3: http://sgi.webcity.nl/
   USA: http://sgi.cartsys.net/
   Canada: http://sgi-tech.unixology.com/

4. The Data Protection Act 1984, 1998. Her Majesty's Stationery Office (HMSO):
   http://www.hmso.gov.uk/acts/acts1984/1984035.htm

Detailed Notes for Day 2 (Part 4)

UNIX Fundamentals: Further Shell scripts.

for/do Loops.

The rebootlab script shown earlier could be rewritten using a for/do loop, a control structure which allows one to execute a series of commands many times. Rewriting the rebootlab script using a for/do loop doesn't make much difference to the complexity of this particular script, but using more sophisticated shell code is worthwhile when one is dealing with a large number of systems. Other benefits arise too; a suitable summary is given at the end of this discussion. The new version could be rewritten like this:

#!/bin/sh
for machine in akira ash cameron chan conan gibson indiana leon merlin \
    nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo $machine
  rsh $machine init 6&
done

The '\' symbol is used to continue a line onto the next line. The 'echo' line displays a comment as each machine is dealt with. This version is certainly shorter, but whether or not it's easier to use in terms of having to modify the list of host names is open to argument, as opposed to merely commenting out the relevant lines in the original version. Even so, if one happened to be writing a script that was fairly lengthy, eg. 20 commands to run on every system, then the above format is obviously much more efficient. Similarly, the remountmapleson script could be rewritten as follows:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo $machine
  rsh $machine "umount /mapleson && mount /mapleson"
done

Note that in this particular case, the command to be executed must be enclosed within quotes in order for it to be correctly sent by rsh to the remote system. Quotes like this are normally not needed; it's only because rsh is being used in this example that quotes are required.

Also note that the '&' symbol is not used this time. This is because the rebootlab procedure is asynchronous, whereas I want the remountdir script to output its messages just one action at a time. In other words, for the rebootlab script, I don't care in what order the machines reboot, so each rsh call is executed as a background process on the remote system, thus the rebootlab script doesn't wait for each rsh call to return before progressing. By contrast, the lack of a '&' symbol in remountdir's rsh command means the rsh call must finish before the script can continue. As a result, if an unexpected problem occurs, any error message will be easily noticed just by watching the output as it appears.

Sometimes a little forward thinking can be beneficial; suppose one might have reason to want to do exactly the same action on some other NFS-mounted area, eg. /home, or /var/mail, then the script could be modified to include the target directory as a single argument supplied on the command line. The new script looks like this:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo $machine
  rsh $machine "umount $1 && mount $1"
done

The script would probably be renamed to remountdir (whatever) and run with:

remountdir /mapleson

or perhaps:

remountdir /home

if/then/else constructs.

But wait a minute, couldn't one use the whole concept of arguments to solve the problem of communicating to the script exactly which hosts to deal with? Well, a rather useful feature of any program is that it will always return a result of some kind. Whatever the output actually is, a command always returns a result which is defined to be true or false in some way. Consider the following command:

grep target database

If grep doesn't find 'target' in the file 'database', then no output is given. However, as a program that has been called, grep has also passed back a value of 'FALSE' - the fact that grep does this is simply invisible during normal usage of the command. One can exploit this behaviour to create a much more elegant script for the remountdir command. Firstly, imagine that I as an admin keep a list of currently active hosts in a file called 'live' (in my case, I'd probably keep this file in /mapleson/Admin/Machines). So, at the present time, the file would contain the following:

yoda akira ash cameron chan conan gibson indiana leon merlin
nikita ridley sevrin solo stanley warlock wolfen woo

ie. the host called spock is not listed. The remountdir script can now be rewritten using an if/then construct:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    spock nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo Checking $machine...
  if grep $machine /mapleson/Admin/Machines/live; then
    echo Remounting $1 on $machine...
    rsh $machine "umount $1 && mount $1"
  fi
done

This time, the complete list of hosts is always used in the script, ie. once the script is rewritten, it doesn't need to be altered again. For each machine, the grep command searches the 'live' file for the target name; if it finds the name, then the result is some output to the screen from grep, but also a 'TRUE' condition, so the echo and rsh commands are executed. If grep doesn't find the target host name in the live file then that host is ignored.

The result is a much more elegant and powerful script. For example, suppose some generous agency decided to give the department a large amount of money for an extra 20 systems: the only changes required are to add the names of the new hosts to remountdir's initial list, and to add the names of any extra active hosts to the live file. Along similar lines, when spock finally is returned to the lab, its name would be added to the live file, causing remountdir to deal with it in the future.

Even better, each system could be set up so that, as long as it is active, the system tells the server every so often that all is well (a simple script could achieve this - a sketch is given below). The server brings the results together on a regular basis, constantly keeping the live file up-to-date. Of course, the server includes its own name in the live file. A typical interval would be to update the live file every minute or so. If an extra program was written which used the contents of the live file to create some kind of visual display, then an admin would know in less than a minute when a system had gone down. Naturally, commercial companies write professional packages which offer these kinds of services and more, with full GUI-based monitoring, but at least it is possible for an admin to create home-made scripts which would do the job just as well.
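A minimal sketch of the 'all is well' idea mentioned above is shown below. It assumes each client can write to an agreed directory on the server (eg. an NFS-mounted area); the directory name is an assumption for this example, and a real setup would also need the server to gather these marker files periodically, discard stale ones and rebuild the live file:

#!/bin/sh
# Run regularly from cron on each client: leave a time-stamped marker
# file in a directory which the server can see. The directory name
# is an example only.
STATUSDIR=/mapleson/Admin/Machines/status

hostname > $STATUSDIR/`hostname`
date >> $STATUSDIR/`hostname`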

/dev/null.

There is still an annoying feature of the script though: if grep finds a target name in the live file, the output from grep is visible on the screen, which we don't really want to see. Plus, the umount command will return a message if /mapleson wasn't mounted anyway. These messages clutter up the main 'trace' messages. To hide the messages, one of UNIX's special device files can be used. Amongst the various device files in the /dev directory, one particularly interesting file is called /dev/null. This device is known as a 'special' file; any data sent to the device is discarded, and the device always returns zero bytes. Conceptually, /dev/null can be regarded as an infinite sponge - anything sent to it is just ignored. Thus, for dealing with the unwanted grep output, one can simply redirect grep's output to /dev/null. The vast majority of system script files use this technique, often many times even in a single script.

Note: descriptions of all the special device files in /dev are given in Appendix C of the online book, "IRIX Admin: System Configuration and Operation".

Since grep returns nothing if a host name is not in the live file, a further enhancement is to include an 'else' clause as part of the if construct so that a separate message is given for hosts that are currently not active. Now the final version of the script looks like this:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    spock nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo Checking $machine...
  if grep $machine /mapleson/Admin/Machines/live > /dev/null; then
    echo Remounting $1 on $machine...
    rsh $machine "umount $1 && mount $1"
  else
    echo $machine is not active.
  fi
done

Running the above script with 'remountdir /mapleson' gives the following output:

Checking yoda...
Remounting /mapleson on yoda...
Checking akira...
Remounting /mapleson on akira...
Checking ash...
Remounting /mapleson on ash...
Checking cameron...
Remounting /mapleson on cameron...
Checking chan...
Remounting /mapleson on chan...
Checking conan...
Remounting /mapleson on conan...
Checking gibson...
Remounting /mapleson on gibson...
Checking indiana...
Remounting /mapleson on indiana...
Checking leon...
Remounting /mapleson on leon...
Checking merlin...
Remounting /mapleson on merlin...
Checking spock...
spock is not active.
Checking nikita...
Remounting /mapleson on nikita...
Checking ridley...
Remounting /mapleson on ridley...
Checking sevrin...
Remounting /mapleson on sevrin...
Checking solo...
Remounting /mapleson on solo...
Checking stanley...
Remounting /mapleson on stanley...
Checking warlock...
Remounting /mapleson on warlock...
Checking wolfen...
Remounting /mapleson on wolfen...
Checking woo...
Remounting /mapleson on woo...

Notice the output from grep is not shown, and the different response given when the script deals with the host called spock. Scripts such as this typically take around a minute or so to execute, depending on how quickly each host responds.

The rebootlab script can also be rewritten along similar lines to take advantage of the new 'live' file mechanism, but with an extra if/then structure to exclude yoda (the rebootlab script is only meant to reboot the lab machines, not the server). The extra if/then construct uses the 'test' command to compare the current target host name with the word 'yoda' - the rsh command is only executed if the names do not match; otherwise, a message is given stating that yoda has been excluded. Here is the new rebootlab script:

#!/bin/sh
for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
    spock nikita ridley sevrin solo stanley warlock wolfen woo
do
  echo Checking $machine...
  if grep $machine /mapleson/Admin/Machines/live > /dev/null; then
    if test $machine != yoda; then
      echo Rebooting $machine...
      rsh $machine init 6&
    else
      echo Yoda excluded.
    fi
  else
    echo $machine is not active.
  fi
done

Of course, an alternative way would be to simply exclude 'yoda' from the opening 'for' line. However, one might prefer to always use the same host name list in order to minimise the amount of customisation between scripts, ie. to create a new script just copy an existing one and modify the content after the for/do structure.

Notes:

 All standard shell commands and other system commands, programs, etc. can be used in shell scripts, eg. one could use 'cd' to change the current working directory between commands.

 An easy way to ensure that a particular command is used with the default or specifically desired behaviour is to reference the command using an absolute path description, eg. /bin/rm instead of just rm. This method is frequently found in system shell scripts. It also ensures that the scripts are not confused by any aliases which may be present in the executing shell.

 Instead of including a raw list of hosts at the beginning of the script, one could use other commands such as grep, awk, sed, perl and cut to obtain relevant host names from the /etc/hosts file, one at a time. There are many possibilities.

Typically, as an admin learns the existence of new commands, better ways of performing tasks are thought of. This is perhaps one reason why UNIX is such a well-understood OS: the process of improving on what has been done before has been going on for 30 years, largely because much of the way UNIX works can be examined by the user (system script files, configuration files, etc.) One can imagine the hive of activity at BTL and Berkeley in the early days, with suggestions for improvements, additions, etc. pouring in from enthusiastic testers and volunteers. Today, after so much evolution, most basic system scripts and other files are probably as good as they're going to be, so efforts now focus on other aspects such as system service improvements, new technology (eg. Internet developments, NSD), security enhancements, etc. Linux evolved in a very similar way. I learned shell programming techniques mostly by looking at existing system scripts and reading the relevant manual pages. An admin's shell programming experience usually begins with simple sequential scripts that do not include if/then structures, for loops, etc. Later on, a desire to be more efficient gives one cause to learn new techniques, rewriting earlier work as better ideas are formed.

Simple scripts can be used to perform a wide variety of tasks, and one doesn't have to make them sophisticated or clever to get the job done - but with some insightful design, and a little knowledge of how the more useful aspects of UNIX work, one can create extremely flexible scripts that can include error checking, control constructs, progress messages, etc. written in a way which does not require them to be modified, ie. external ideas, such as system data files, can be used to control script behaviour; other programs and scripts can be used to extract information from other parts of the system, eg. standard configuration files. A knowledge of the C programming language is clearly helpful in writing shell scripts since the syntax for shell programming is so similar. An excellent book for this is "C Programming in a UNIX Environment", by Judy Kay & Bob Kummerfeld (Addison Wesley Publishing, 1989. ISBN: 0 201 12912 4).

Other Useful Commands. A command found in many of the numerous scripts used by any UNIX OS is 'test'; typically used to evaluate logical expressions within 'if' clauses, test can determine the existence of files, status of access permissions, type of file (eg. ordinary file, directory, symbolic link, pipe, etc.), whether or not a file is empty (zero size), compare strings and integers, and other possibilities. See the test man page for full details. For example, the test command could be used to include an error check in the rebootlab script, to ascertain whether the live file is accessible:

#!/bin/sh
if test -r /mapleson/Admin/Machines/live; then
  for machine in yoda akira ash cameron chan conan gibson indiana leon merlin \
      spock nikita ridley sevrin solo stanley warlock wolfen woo
  do
    echo Checking $machine...
    if grep $machine /mapleson/Admin/Machines/live > /dev/null; then
      if test $machine != yoda; then
        echo Rebooting $machine...
        rsh $machine init 6&
      else
        echo Yoda excluded.
      fi
    else
      echo $machine is not active.
    fi
  done
else
  echo Error: could not access live file, or file is not readable.
fi
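For reference, a few further examples of the kinds of checks test can perform are sketched below; the file and variable names are purely illustrative:

# File, string and integer tests of the kind described above.
if test -d /var/tmp; then echo "/var/tmp is a directory"; fi
if test -s /var/adm/SYSLOG; then echo "SYSLOG exists and is not empty"; fi

name=spock
if test $name != yoda; then echo "$name is not the server"; fi
if test 5 -gt 3; then echo "integer comparison works"; fi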

NOTE: Given that 'test' is a system command...

% which test
/sbin/test

...any user who creates a program called test, or an admin who writes a script called test, will be unable to execute the file unless one of the following is done:

 Use a complete pathname for the file, eg. /home/students/cmpdw/test
 Insert './' before the file name
 Alter the path definition ($PATH) so that the current directory is searched before /sbin (dangerous! The root user should definitely not do this).

In my early days of learning C, I once worked on a C program whose source file I'd called simply test.c - it took me an hour to realise why nothing happened when I ran the program (obviously, I was actually running the system command 'test', which does nothing when given no arguments except return an invisible 'false' exit status).

Problem Question 1.

Write a script which will locate all .capture.mv.* directories under /home and remove them safely. You will not be expected to test this for real, but feel free to create 'mini' test directories if required by using mkdir. Modify the script so that it searches a directory supplied as a single argument ($1).

Relevant commands: find, rm

Tips:

 Research the other possible options for rm which might be useful.
 Don't use your home directory to test out ideas. Use /tmp or /var/tmp.

Problem Question 2.

This is quite a complicated question. Don't feel you ought to be able to come up with an answer after just one hour. I want to be able to keep an eye on the amount of free disk space on all the lab machines. How could this be done? If a machine is running out of space, I want to be able to remove particular files which I know can be erased without fear of adverse side effects, including:

 Unwanted user files left in /tmp and /var/tmp, ie. files such as movie files, image files, sound files, but in general any file that isn't owned by root.
 System crash report files left in /var/adm/crash, in the form of unix.K and vmcore.K.comp, where K is some digit.
 Unwanted old system log information in the file /var/adm/SYSLOG. Normally, the file is moved to oSYSLOG minus the last 10 or 20 lines, and a new empty SYSLOG created containing the aforementioned most recent 10 or 20 lines.

a. Write a script which will probe each system for information, showing disk space usage.
b. Modify the script (if necessary) so that it only reports data for the local system disk.
c. Add a means for saving the output to some sort of results file or files.
d. Add extra features to perform space-saving operations such as those described above.

Advanced:

e. Modify the script so that files not owned by root are only removed if the relevant user is not logged onto the target system.

Relevant commands: grep, df, find, rm, tail, cd, etc.

UNIX Fundamentals: Application Development Tools.

A wide variety of commands, programs, tools and applications exist for application development work on UNIX systems, just as for any system. Some come supplied with a UNIX OS as standard, some are free or shareware, while others are commercial packages. An admin who has to manage a system which offers these services needs to be aware of their existence because there are implications for system administration, especially with respect to installed software. This section does not explain how to use these tools (even though an admin would probably find many of them useful for writing scripts, etc.) The focus here is on explaining what tools are available and may exist on a system, where they are usually located (or should be installed if an admin has to install non-standard tools), and how they might affect administration tasks and/or system policy. There tend to be several types of software tools:

1. Software executed usually via the command line and written using simple editors, eg. basic compilers such as cc, development systems such as the Sun JDK for Java; also libraries for application development, eg. OpenGL, X11, Motif, Digital Media Libraries - such library resources will include example source code and programs, eg. X11 Demo Programs. In both cases, online help documents are always included: man pages, online books, hints & tips, local web pages either in /usr/share or somewhere else such as /usr/local/html.

2. Higher-level toolkits providing an easier way of programming with various libraries, eg. Open Inventor. These are often just extra library files somewhere in /usr/lib and so don't involve executables, though example programs may be supplied (eg. SceneViewer, gview, ivview). Any example programs may be in custom directories, eg. SceneViewer is in /usr/demos/Inventor, ie. users would have to add this directory to their path in order to be able to run the program. These kinds of details are in the release notes and online books. Other example programs may be in /usr/sbin (eg. ivview).

3. GUI-based application development systems for all manner of fields, eg. WorkShop Pro CASE tools for C, C++, Ada, etc., CosmoWorlds for VRML, CosmoCreate for HTML, CosmoCode for Java, RapidApp for rapid prototyping, etc. Executables are usually still accessible by default (eg. cvd appears to be in /usr/sbin) but the actual programs are normally stored in application-specific directories, eg. /usr/WorkShop, /usr/CosmoCode, etc. (/usr/sbin/cvd is a link to /usr/WorkShop/usr/sbin/cvd). Supplied online help documents are in the usual locations (/usr/share, etc.)

4. Shareware/Freeware programs, eg. GNU, Blender, XV, GIMP, XMorph, BMRT. Sometimes such software comes supplied in a form that means one can install it anywhere (eg. Blender) - it's up to the admin to decide where (/usr/local is the usual place). Other types of software install automatically to a particular location, usually

/usr/freeware or /usr/local (eg. GIMP). If the admin has to decide where to install the software, it's best to follow accepted conventions, ie. place such software in /usr/local (ie. executables in /usr/local/bin, libraries in /usr/local/lib, header files in /usr/local/include, help documents in /usr/local/docs or /usr/local/html, source code in /usr/local/src). In all cases, it's the admin's responsibility to inform users of any new software, how to use it, etc.

The key to managing these different types of tools is consistency; don't put one shareware program in /usr/local and then another in /usr/SomeCustomName. Users looking for online source code, help docs, etc. will become confused. It also complicates matters when one considers issues such as library and header file locations for compiling programs. Plus, consistency eases other aspects of administration, eg. if one always uses /usr/local for 3rd-party software, then installing this software onto a system which doesn't yet have it is a simple matter of copying the entire contents of /usr/local to the target machine (a sketch of one way to do this is given below).

It's a good idea to talk to users (perhaps by email), ask for feedback on topics such as how easy it is to use 3rd-party software, are there further programs they'd like to have installed to make their work easier, etc. For example, a recent new audio standard is MPEG3 (MP3 for short); unknown to me until recently, there exists a freeware MP3 audio file player for SGIs. Unusually, the program is available off the Net in executable form as just a single program file. Once I realised that users were trying to play MP3 files, I discovered the existence of the MP3 player and installed it in /usr/local/bin as 'mpg123'.

My personal ethos is that users come first where issues of carrying out their tasks are concerned. Other areas such as security, etc. are the admin's responsibility though - such important matters should either be left to the admin or discussed to produce some statement of company policy, probably via consultation with users, managers, etc. For everyday topics concerning users getting the most out of the system, it's wise for an admin to do what she/he can to make users' lives easier.
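As a sketch of the copying idea mentioned above (assuming rsh access to the target machine and enough free disk space on it - the host name 'sevrin' is just an example), the traditional tar-over-rsh idiom could be used:

# Duplicate the local /usr/local tree onto another machine; run as
# root from the source system. 'sevrin' is an example host name.
cd /usr && tar cf - local | rsh sevrin "cd /usr && tar xf -"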

General Tools (editors).

Developers always use editing programs for their work, eg. xedit, jot, nedit, vi, emacs, etc. If one is aware that a particular editor is in use, then one should make sure that all appropriate components of the relevant software are properly installed (including any necessary patches and bug fixes), and interested users notified of any changes, newly installed items, etc. For example, the jot editor is popular with many SGI programmers because it has some extra features for those programming in C, eg. an 'Electric C Mode'. However, a bug exists in jot which can cause file corruption if jot is used to access files from an NFS-mounted directory. Thus, if jot is being used, then one should install the appropriate patch

file to correct the bug, namely Patch 2051 (patch CDs are supplied as part of any software support contract, but most patches can also be downloaded from SGI's ftp site). Consider searching the vendor's web site for information about the program in question, as well as the relevant USENET newsgroups (eg. comp.sys.sgi.apps, comp.sys.sgi.bugs). It is always best to prevent problems by researching issues beforehand.

Whether or not an admin chooses to 'support' a particular editor is another matter; SGI has officially switched to recommending the nedit editor for users now, but many still prefer to use jot simply because of familiarity, eg. all these course notes have been typed using jot. However, an application may 'depend' on minor programs like jot for particular functions. Thus, one may have to install programs such as jot anyway in order to support some other application (dependency).

An example in the case of the Ve24 network is the emacs editing system: I have chosen not to support emacs because there isn't enough spare disk space available to install emacs on the Indys which only have 549MB disks. Plus, the emacs editor is not a vendor-supplied product, so my position is that it poses too many software management issues to be worth using, ie. unknown bug status, file installation location issues, etc.

Locations: editors are always available by default; executables tend to be in /usr/sbin, so users need not worry about changing their path definition in order to use them. All other supplied-as-standard system commands and programs come under the heading of general tools.

Compilers.

There are many different compilers which might have to be installed on a system, eg.:

Programming Language    Compiler Executable

C                       cc, gcc
C++                     CC
Ada                     ?
Fortran77               f77
Fortran90               f90
Pascal                  ?

Some UNIX vendors supply C and C++ compilers as standard, though licenses may be required. If there isn't a supplied compiler, but users need one, then an admin can install the GNU compilers which are free. An admin must be aware that the release versions of software such as compilers are very important to the developers who use them (this actually applies to all types of software). Installing an update to a compiler might mean the libraries have fewer bugs, better

features, new features, etc., but it could also mean that a user's programs no longer compile with the updated software. Thus, an admin should maintain a suitable relationship with any users who use compilers and other similar resources, ie. keep each other informed of relevant issues, changes being made or requested, etc.

Another possibility is to manage the system in such a way as to offer multiple versions of different software packages, whether that is a compiler suite such as a C development kit, or a GUI-based application such as CosmoWorlds. Multiple versions of low-level tools (eg. cc and associated libraries, etc.) can be supported by using directories with different names, or NFS-mounting directories/disks containing software of different versions, and so on. There are many possibilities - which one to use depends on the size of the network, ease of management, etc.

Multiple versions of higher-level tools, usually GUI-based development environments though possibly ordinary programs like Netscape, can be managed by using 'wrapper' scripts: the admin sets an environment variable to determine which version of some software package is to be the default; when a system is booted, the script is executed and uses the environment variable to mount appropriate directories, execute any necessary initialisation scripts, background daemons, etc. Thus, when a user logs in, they can use exactly the same commands but find themselves using a different version of the software. Even better, an admin can customise the setup so that users themselves can decide what version they want to use; logging out and then logging back in again would then reset all necessary settings, path definitions, command aliases, etc.

MPC operates its network in this way. They use high-end professional film/video effects/animation tools such as Power Animator, Maya, Flame, etc. for their work, but the network actually has multiple versions of each software package available so that animators and artists can use the version they want, eg. for compatibility reasons, or personal preferences for older vs. newer features. MPC uses wrapper scripts of a type which require a system reboot to change software version availability, though the systems have been set up so that a user can initiate the reboot (I suspect the reboot method offers better reliability).

Locations: executables are normally in /usr/sbin, libraries in /usr/lib, header files in /usr/include and online documents, etc. in /usr/share. Note also that the release notes for such products contain valuable information for administrators (setup advice) and users alike.
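A minimal sketch of the wrapper idea described above is shown below; the variable name, directory layout and product name ('fooview') are purely illustrative assumptions, not a real product:

#!/bin/sh
# Example wrapper for a hypothetical 'fooview' package installed as
# /usr/local/fooview-1.2 and /usr/local/fooview-2.0. The variable
# FOOVIEW_VERSION selects which version users get; 2.0 is the default.
FOOVIEW_VERSION=${FOOVIEW_VERSION:-2.0}
FOOVIEW_HOME=/usr/local/fooview-$FOOVIEW_VERSION

# Make the chosen version's executables visible, then run the real
# program, passing on any command line arguments.
PATH=$FOOVIEW_HOME/bin:$PATH
export PATH
exec $FOOVIEW_HOME/bin/fooview "$@"

Installed as /usr/local/bin/fooview, such a script would let users switch versions simply by setting an environment variable before running the command, rather than having to know where each version lives.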

Debuggers. Debugging programs are usually part of a compilation system, so everything stated above for compilers applies to debuggers as well. However, it's perfectly possible for a user to use a debugger that's part of a high-level GUI-based application development toolkit to debug programs that are created using low-level tools such as jot and xedit. A typical

example on the Ve24 machines is students using the cvd program (from the WorkShop Pro CASE Tools package) to debug their C programs, even though they don't use anything else from the comprehensive suite of CASE tools (source code management, version control, documentation management, rapid prototyping, etc.) Thus, an admin must again be aware that users may be using features of high-level tools for specific tasks even though most work is done with low-level tools. Hence, issues concerning software updates arise, eg. changing software versions without user consultation could cause problems for existing code.

High-level GUI-based Development Toolkits.

Usually vendor-supplied or commercial in nature, these toolkits include products such as CosmoCode (Java development with GUI tools), RapidApp, etc. As stated above, there are issues with respect to not carrying out updates without proper consideration of how the changes may affect users who use the products, but the ramifications are usually much less serious than with low-level programs or shareware/freeware. This is because the software supplier will deliberately develop new versions in such a way as to maximise compatibility with older versions. High-level toolkits sometimes rely on low-level toolkits (eg. CosmoCode depends on the Sun JDK software), so an admin should also be aware that installing updates to low-level toolkits may have implications for their higher-level counterparts.

High-level APIs (Application Programming Interfaces).

This refers to advanced library toolkits such as Open Inventor, ViewKit, etc. The actual application development tools used with these types of products are the same, whether low-level or high-level (eg. cc and commands vs. WorkShop Pro CASE Tools). Thus, high-level APIs are not executable programs in their own right; they are a suite of easier-to-use libraries, header files, etc. which users can use to create applications designed at a higher level of abstraction. Some example high-level APIs and their low-level counterparts include:

Lower-level     Higher-level

OpenGL          Open Inventor
X11/Motif       ViewKit/Tk
ImageVision     Image Format Library, Electronic Light Table

This is not a complete list, and there may be more than one level of abstraction, eg. VRML is essentially derived from a subset of the Open Inventor file format.

Locations: high-level APIs tend to have their files stored in correspondingly named directories in /usr/lib, /usr/include, etc. For example, Open Inventor files can be found in /usr/lib/Inventor and /usr/include/Inventor. An exception is support files such as example models, images, textures, etc. which will always be in /usr/share, but not necessarily in specifically named locations, eg. the example 3D Inventor models are in /usr/share/data/models.

Shareware and Freeware Software.

This category of software, eg. the GNU compiler system, is usually installed either in /usr/local somewhere, or in /usr/freeware. Many shareware/freeware programs don't have to be installed in one of these two places (Blender is one such example) but it is best to do so in order to maintain a consistent software management policy.

Since /usr/local and /usr/freeware are not normally referenced by the standard path definition, an admin must ensure that relevant users are informed of any changes they may have to make in order to access newly installed software. A typical notification might be a recommendation of how a user can modify her/his own .cshrc file so that shells and other programs know where any new executable files, libraries, online documents, etc. are stored.

Note that, assuming the presence of Internet access, users can easily download freeware/shareware on their own and install it in their own directory so that it runs from their home account area, or they could even install software in globally writeable places such as /var/tmp. If this happens, it's common for an admin to become annoyed, but the user has every right to install software in their own account area (unless it's against company policy, etc.) A better response is to appreciate the user's need for the software and offer to install it properly so that everyone can use it, unless some other factor is more important.

Unlike vendor-supplied or commercial applications, newer versions of shareware and freeware programs can often be radically different from older versions. GIMP is a good example of this - one version introduced so many changes that it was barely comparable to an older version. Users who utilise these types of packages might be annoyed if an update is made without consulting them because:

o it's highly likely their entire working environment may be different in the new version,
o features of the old version may no longer be available,
o aspects of the new version may be incompatible with the old version,
o etc.

Thus, shareware/freeware programs are a good example of where it might be better for admins to offer more than one version of a software package, eg. all the files for Blender V1.57 are stored in /usr/local/blender1.57_SGI_6.2_iris on akira and sevrin. When the

next version comes out (eg. V1.6), the files will be in /usr/local/blender1.6_SGI_6.2_iris - ie. users can still use the old version if they wish. Because shareware/free programs tend to be supplied as distinct modules, it's often easier to support multiple versions of such software compared to vendor-supplied or commercial packages.
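As an example of the kind of notification mentioned earlier, a user could be advised to add lines such as the following to her/his .cshrc file (csh syntax, since .cshrc is a C shell startup file; the directories shown are only illustrative and depend on where the software has actually been installed):

# Add the common 3rd-party software directories to the search path.
set path = ($path /usr/local/bin /usr/freeware/bin)

# Let man find any accompanying manual pages (locations are examples).
setenv MANPATH /usr/share/catman:/usr/local/man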

Comments on Software Updates, Version Issues, etc.

Modern UNIX systems usually employ software installation techniques which operate in such a way as to show any incompatibilities before installation (SGIs certainly operate this way); the inst program (and thus swmgr too, since swmgr is just a GUI interface to inst) will not allow one to install software if there are conflicts present concerning software dependency and compatibility. This feature of inst (and swmgr) to monitor software installation issues applies only to software subsystems that can be installed and removed using inst/swmgr, ie. those said to be in 'inst' format. Thankfully, large numbers of freeware programs (eg. GIMP) are supplied in this format and so they can be managed correctly. Shareware/Freeware programs do not normally offer any means by which one can detect possible problems before installation or removal, unless the authors have been kind enough to supply some type of analysis script or program.

Of course, there is nothing to stop an admin using low-level commands such as cp, tar, mv, etc. to manually install problematic files by copying them from a CD, or another system, but to do so is highly unwise as it would invalidate the inst database structure which normally acts as a highly accurate and reliable record of currently installed software. If an admin must make custom changes, an up-to-date record of these changes should be maintained.

To observe inst/swmgr in action, either enter 'inst' or 'swmgr' at the command prompt (or select 'Software Manager' from the Toolchest, which runs swmgr). swmgr is the easier to understand because of its intuitive interface. Assuming the use of swmgr, once the application window has appeared, click on 'Manage Installed Software'. swmgr loads the inst database information, reading the installation history, checking subsystem sizes, calculating dependencies, etc. The inst system is a very effective and reliable way of managing software. Most if not all modern UNIX systems will employ a software installation and management system such as inst, or a GUI-based equivalent.
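On an SGI system, the inst database can also be queried from the command line with the versions command, which lists currently installed inst-format products; a sketch (the product name given to grep is just an example):

# List all installed inst-format software, then check for one product.
versions
versions | grep gimp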

Summary.

As an administrator, one should not need to know how to use the software products which users have access to (though it helps in terms of being able to answer simple questions), but one should:

o be aware of where the relevant files are located,
o understand issues concerning revision control,
o notify users of any steps they must take in order to access new software or features,
o aid users in being able to use the products efficiently (eg. using /tmp or /var/tmp for working temporarily with large files or complex tasks),
o have a consistent strategy for managing software products.

These issues become increasingly important as systems become more complex, eg. multiple vendor platforms, hundreds of systems connected across multiple departments, etc. One solution for companies with multiple systems and more than one admin is to create a system administration committee whose responsibilities could include coordinating site policies, dealing with security problems, sharing information, etc.

Detailed Notes for Day 3 (Part 1)

UNIX Fundamentals: Installing an Operating System and/or Software.

Installation Rationale.

Installing an OS is a common task for an admin to perform, usually because of the acquisition of a new system or the installation of a new disk. Although any UNIX variant should be perfectly satisfactory once it has been installed, sometimes the admin or a user has a particular problem which requires, for example, a different system configuration (and thus perhaps a reinstall to take account of any major hardware changes), or a different OS version for compatibility testing, access to more up-to-date features, etc. Alternatively, a serious problem or accidental mistake might require a reinstallation, eg. corrupted file system, damaged disk, or an unfortunate use of the rm command (recall the example given in the notes for Day 2, concerning the dangers of the 'find' command); although system restoration via backups is an option, often a simple reinstall is more convenient and faster. Whatever the reason, an admin must be familiar with the procedure for installing an OS on the platform for which she/he is responsible.

Installation Interface and Tools.

Most UNIX systems have two interfaces for software installation: a high-level mode where an admin can use some kind of GUI-based tool, and a low-level mode which employs the command line shell. The GUI tool normally uses the command line version for the actual installation operations. In the case of SGI's IRIX, the low-level program is called 'inst', while the GUI interface to inst is called 'swmgr' - the latter can be activated from the 'Toolchest' on the desktop or entered as a command. Users can also run swmgr, but only in 'read-only' mode, ie. non-root users cannot use inst or swmgr to install or remove software.

For general software installation tasks (new/extra applications, updates, patches, etc.) the GUI tool can normally be used, but for installing an OS, virtually every UNIX platform will require the admin to not only use the low-level tool for the installation, but also carry out the installation in a 'lower' (restricted) access mode, ie. a mode where only the very basic system services are operating: no user-related processes are running, the end-user GUI interface is not active, no network services are running, etc. For SGI's IRIX, this mode is called 'miniroot'. Major updates to the OS are also usually carried out in miniroot mode - this is because a fully operational system will have services running which could be altered by a major OS change, ie. it would be risky to perform any such change in anything but the equivalent of miniroot. It is common for this restricted miniroot mode to be selected during bootup, perhaps by pressing the ESC key when prompted. In the case of SGI systems, the motherboard PROM chip includes

a hard-coded GUI interface mechanism called ARCS which displays a mouse-driven menu on bootup. This provides the admin with a user-friendly way of performing low-level system administration tasks, eg. installing the OS from miniroot, running hardware diagnostics, accessing a simple PROM shell called a Command Monitor for performing low-level actions such as changing PROM settings (eg. which SCSI ID to treat as the system disk), etc. Systems without graphics boards, such as servers, provide the same menu but in text-only form, usually through a VT or other compatible text display terminal driven from the serial port. Note that SGI's VisualWorkstation machine (an NT system) also uses the ARCS GUI interface - a first for any NT system (ie. no DOS at all for low-level OS operations). Not many UNIX vendors offer a GUI menu system like ARCS for low-level tasks - SGI is one of the few who do, probably because of a historical legacy of making machines for the visual arts and sciences. Though the ARCS system is perhaps unique, after one has selected 'Software Installation' the procedure progresses to a stage where the interface does become the more familiar text-based use of inst (ie. the text information just happens to be presented within a GUI-style window).

Very early UNIX platforms were not so friendly when it came to offering an easy method for installing the OS, especially in the days of older storage media such as 5.25" disks, magnetic tapes, etc. However, some vendors did a good job, eg. the text-only interface for installing HPUX on Hewlett Packard machines (eg. HP9000/730) is very user-friendly, allowing the admin to use the cursor arrow keys to select options, activate tasks, etc. During installation, constantly updated information shows how the installation is progressing: current file being installed, number of files installed so far, number of files remaining, amount of disk space used up so far, disk space remaining, percentage equivalents for all these, and even an estimate of how much longer the installation will take before completion (surprisingly, inst doesn't provide this last piece of information as it is running, though one can make good estimates or find out how long it's going to take from a 3rd-party information source). The inst program gives progress output equivalent to most of the above by showing the current software subsystem being installed, which sub-unit of which subsystem, and what percentage of the overall operation has been done so far.

Perhaps because of the text-only interface which is at the heart of installing any UNIX variant, installing an OS can be a little daunting at first, but the actual procedure itself is very easy. Once an admin has installed an OS once, doing it again quickly becomes second nature. The main reason the task can seem initially confusing is that the printed installation guides are often too detailed, ie. the supplied documents have to assume that the person carrying out the installation may know nothing at all about what they're doing. Thankfully, UNIX vendors have recognised this fact and so nowadays any such printed material also contains a summary installation guide for experts and those who already know the general methods involved - this is especially useful when performing an OS update as opposed to an original OS installation.

OS Source Media. Years ago, an OS would be stored on magnetic tape or 5.25" disks. Today, one can probably state with confidence that CDROMs are used by every vendor. For example, SGI's IRIX 6.2 comes on 2 CDROMs; IRIX 6.5 uses 4 CDROMs, but this is because 6.5 can be used with any machine from SGI's entire current product line, as well as many older systems - thus, the basic CD set must contain the data for all relevant systems even though an actual installation will only use a small subset of the data from the CDs (typically less than one CD's worth). In the future, it is likely that vendors will switch to DVDs due to higher capacities and faster transfer rates.

Though a normal OS installation uses some form of original OS media, UNIX actually allows one to install an OS (or any software) via some quite unusual methods. For example, one could copy the data from the source media (I shall assume CDROM) to a fast UltraSCSI disk drive. Since disks offer faster transfer rates and access times, using a disk as a source medium enables a faster installation, as well as removing the need for swapping CDROMs around during the installation process. This is essentially a time-saving feature but is also very convenient, eg. no need to carry around many CDROMs (remember that after an OS installation, an admin may have to install extra software, applications, etc. from other CDs).

A completely different option is to install the OS using a storage device which is attached to a remote machine across a network. This may sound strange, ie. the idea that a machine without an OS can access a device on a remote system and use that as an OS installation source. It's something which is difficult but not impossible with PCs (I'm not sure whether a Linux PC would support this method). A low-level communications protocol called bootp (Internet Bootstrap Protocol), supported by all traditional UNIX variants, is used to facilitate communication across the network. As long as the remote system has been configured to allow another system to access its local device as a source for remote OS installation, then the remote system will effectively act as an attached storage medium. However, most admins will rarely if ever have to install an OS this way for small networks, though it may be more convenient for larger networks. Note that IRIX systems are supplied by default with the bootp service disabled in the /etc/inetd.conf file (the contents of this file control various network services). Full details on how to use the bootp service for remote OS installation are normally provided by the vendor in the form of an online book or reference page. In the case of IRIX, see the section entitled, "Booting across the Network" in Chapter 10 of the online book, "IRIX Admin: System Configuration and Operation".

Note: this discussion does not explain every single step of installing an OS on an SGI system, though the method will be demonstrated during the practical session if time permits. Instead, the focus here is on management issues which surround an OS installation, especially those techniques which can ease the installation task. Because of the SGI-related technical site I run, I have already created extremely detailed installation guides for IRIX 6.2 [1] and IRIX 6.5 [2] which also include tables of example installation times (these two documents are included for future reference). The installation times obtained were used to conduct a CPU and CDROM
performance analysis [3]. Certain lessons were learned from this analysis which are also relevant to installing an OS - these are explained later.
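As an illustration of the copy-to-disk idea, something along these lines could be used on an IRIX system (a hedged sketch only: /CDROM is the usual CD mount point, while /dist2 is a hypothetical directory on a fast local disk):

   mkdir -p /dist2/cd1
   cp -r /CDROM/dist /dist2/cd1      # copy the installable images from the first CD
   # repeat for the remaining CDs into /dist2/cd2, /dist2/cd3, etc., then point
   # inst or swmgr at /dist2/cd1/dist (and so on) instead of the CDROM.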

Installing an OS on multiple systems. Using a set of CDs to install an OS can take quite some time (15 to 30 minutes is a useful approximation). If an admin has many machines to install, there are several techniques for cutting the amount of time required to install the OS on all the machines. The most obvious method is for all machines to install via a remote network device, but this could actually be very slow, limited partly by network speed but also by the way in which multiple systems would all be trying to access the same device (eg. CDROM) at the same time. It would only really be effective for a situation where the network was very fast and the device - or devices, there could be more than one - was also fast. An example would be the company MPC; as explained in previous lectures, their site configuration is extremely advanced. The network they employ is so fast that it can saturate the typical 100Mbit Ethernet port of a modern workstation like Octane. MPC's storage systems include many high-end RAID devices capable of delivering data at hundreds of MB/sec rates (this kind of bandwidth is needed for editing broadcast-quality video and assuring that animators can load complete scene databases without significant delay). Thus, the admin at MPC can use some spare RAID storage to install an OS on a system across the network. When this is done, the limiting factor which determines how long the installation takes is the computer's main CPU(s) and/or its Ethernet port (100MBit), the end result of which is that an installation can take mere minutes. In reality, the MPC admin uses an even faster technique for installing an OS, which is discussed in a moment. At the time of my visit, MPC was using a high-speed crossbar switching 288Mbit/sec network (ie. multiple communications links through the routers - each machine could be supplied with up to 36MB/sec). Today they use multiple gigabit links (HiPPI) and other supporting devices. But not everyone has the luxury of having such equipment.

Disk Cloning [1]. If an admin only has a single machine to deal with, the method used may not matter too much, but often the admin has to deal with many machines. A simple technique which saves a huge amount of time is called 'disk cloning'. This involves installing an OS onto a single system ('A') and then copying (ie. cloning) the contents of that system's disk onto other disks. The first installation might be carried out by any of the usual means (CDROM, DAT, network, etc.), after which any extra software is also installed; in the case of SGIs, this would mean the admin starting up the system into a normal state of operation, logging in as root and using swmgr to install extra items. At this point, the admin may wish to make certain custom changes as well, eg.

installing shareware/freeware software, etc. This procedure could take more than an hour or two if there is a great deal of software to install. Once the initial installation has finished, then begins the cloning process. On SGIs, this is typically done as follows (other UNIX systems will be very similar if not identical):

1. Place the system disk from another system B into system A, installed at, for example, SCSI ID 2 (B's system disk would be on SCSI ID 1 in the case of SGIs; SCSI ID 0 is used for the SCSI controller). Boot up the system.

2. Login as root. Use fx to initialise the B disk to be a new 'root' (ie. system) disk; create a file system on it; mount the disk on a directory on A's disk such as /disk2.

3. Copy the contents of disk A to disk B using a command such as tar (a simplified sketch is given after this list). Details of how to do this with example tar commands are given in the reference guides [1] [2].

4. Every system disk contains special volume header information which is required in order to allow it to behave as a bootable device. tar cannot copy this information since it does not reside on the main data partition of the disk in the form of an ordinary file, so the next step is to copy the volume header data from A to B using a special command for that purpose. In the case of SGIs, the relevant program is called dvhtool (device volume header tool).

5. Shut down system A; remove the B disk; place the B disk back into system B, remembering to change its SCSI ID back to 1. If further cloning is required, insert another disk into system A on SCSI ID 2, and (if needed) a further disk into system B, also set to SCSI ID 2. Reboot both systems.

6. System A will reboot as normal. At bootup time, although system B already has a kernel file available (/unix), all the files will be recognised as new (ie. changed), so system B will also create a new kernel file (/unix.install) and then boot up normally, ready for login. Reboot system B once more so that the new kernel file is made the current kernel file.

At this stage, what one has effectively created is a situation comprising two systems as described in Step 1, instead of only one such system which existed before the cloning process. Thus, one could now repeat the process again, creating four systems ready to use or clone again as desired. Then eight, sixteen, thirty-two and so on. This is exactly the same way biological cells divide, ie. binary fission. Most people are familiar with the idea that repeatedly doubling the number of a thing can create a great many things in a short space of time, but the use of such a technique for installing an operating system on many machines means an admin can, for example, completely configure over one hundred machines in less than five hours! The only limiting factor, as the number of machines to deal with increases, is the amount of help available from others to aid in the swapping of disks, typing of commands, etc. In the case of the 18 Indys in Ve24, the last complete reinstall I did on my own took less than three hours.

Note: the above procedure assumes that each cloning step copies one disk onto just a single other disk - this is because I'm using the Indy as an example, ie. Indy only has internal space for one extra disk. But if a system has the available room, then many more disks could be installed on other SCSI IDs (3, 4, 5, etc.), resulting in each cloning step creating three, four, etc. disks from just one. This is only possible because one can run multiple tar copy commands at the same time.
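The copy in step 3 is essentially a tar pipe from one mounted filesystem to the other. The following is a simplified sketch only, assuming the B disk has already been prepared with fx and mkfs and mounted on /disk2; the guides referenced above [1] [2] give the full commands, which also take care of excluding /disk2 itself and any other mounted filesystems from the copy:

   cd /
   tar cf - . | (cd /disk2; tar xf -)    # duplicate A's root filesystem onto the B disk
   # then copy the volume header (which holds items such as the standalone shell, sash)
   # from A to B with dvhtool, as described in step 4.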

Of course, one could use external storage devices to connect extra disks. There's no reason why a system with two SCSI controllers (Indigo2, O2, Octane, etc.) couldn't use external units to clone the system disk to 13 other disks at the same time; for a small network, such an ability could allow the reinstallation of the entire system in a single step!

Using a Backup Image. If a system has been backed up onto a medium such as DAT tape, one could in fact use that tape for installing a fresh OS onto a different disk, as opposed to the more usual use of the tape for data restoration purposes. The procedure would be similar to some of the steps in disk cloning, ie. install a disk on SCSI ID 2, initialise, and use tar to extract the DAT straight to the disk. However, the volume header information would have to come from the original system since it would not be present on the tape, and only one disk could be written to at a time from the tape. Backup media are usually slower than disks too.
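A hedged sketch of the restore step, assuming the backup was made with tar, the DAT is reachable via the default tape device (/dev/tape on IRIX) and the new disk has been initialised and mounted on /disk2:

   cd /disk2
   tar xf /dev/tape      # extract the backup image onto the new disk
   # the volume header must still be written separately with dvhtool.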

Installing a New Version of an OS (Major Updates). An admin will often have to install updates to various OS components as part of the normal routine of installing software patches, bug fixes, new features, security fixes, etc. as they arrive in CD form from the vendor concerned. These can almost always be installed using the GUI method (eg. swmgr) unless specifically stated otherwise for some reason. However, if an admin wishes to change a machine which already has an OS installed to a completely new version (whether a newer version or an older one), then other issues must be considered. Although it is perfectly possible to upgrade a system to a newer OS, an existing system will often have so much software installed, with a whole range of configuration files, that a straight upgrade to a new OS revision may not work very well. The upgrade would probably succeed, but what usually happens is that the admin has to resolve installation conflicts before the procedure can begin, which is an annoying waste of time. Further, some changes may even alter some fundamental aspect of the system, in which case an upgrade on top of the existing OS would involve extra changes which an admin would have to read up on first (eg. IRIX 6.2 uses a completely different file system to IRIX 5.3: XFS vs. EFS).

Even if an update over an existing OS is successful, one can never really be sure that older files which aren't needed anymore were correctly removed. To an admin, the system would 'feel' as if the older OS was somehow still there, rather like an old layer of paint hidden beneath a new gloss. This aspect of OS management is perhaps only psychological, but it can be important. For example, if problems occurred later, an admin might waste time checking for issues concerning the older OS which aren't relevant anymore, even though the admin theoretically knows such checks aren't needed.

Thus, a much better approach is to perform a 'clean' installation when installing a new OS. A typical procedure would be as follows:

1. Read all the relevant notes supplied with the new OS release so that any issues relevant to how the system may be different with the new OS version are known beforehand, eg. if any system services operate in a different way, or other factors (eg. new type of file system, etc.)

2. Make a full system backup of the machine concerned.

3. Identify all the key files which make the system what it is, eg. /etc/sys_id, /etc/hosts, and other configuration files/directories such as /var/named, /var/flexlm/license.dat, etc. These could be placed onto a DAT, floptical, ZIP, or even another disk (a sketch of this step is given below). Items such as shareware/freeware software are probably best installed anew (read any documents relevant to software such as this too).

4. Use the appropriate low-level method to reinitialise the system disk. For SGI IRIX systems, this means using the ARCS bootup menu to select the Command Monitor, boot off of the OS CDROM and use the fx program to reinitialise the disk as a root disk, use mkfs to create a new file system (the old OS image is now gone), then reboot to access the 'Install System Software' option from the ARCS menu.

5. Install the OS in the normal manner.

6. Use the files backed up in step 3 to change the system so that it adopts its usual identity and configuration, bearing in mind any important features/caveats of the new OS release.

This is a safe and reliable way of ensuring a clean installation. Of course, the installation data could come from a different medium or over a network from a remote system as described earlier.
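As an illustration of step 3, the key files could simply be archived to tape before the disk is wiped. This is only a sketch - the file list is an example based on the items mentioned above, and a real system will have others (printer definitions, cron jobs, etc.):

   cd /
   tar cvf /dev/tape etc/sys_id etc/hosts var/named var/flexlm/license.dat
   # after the clean OS installation:  cd / ; tar xvf /dev/tape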

Time-saving Tips. When installing an OS or software from a CDROM, it's tempting to want to use the fastest possible CDROM available. However, much of the process of installing software, whether the task is an OS installation or not, involves operations which do not actually use the CDROM. For example, system checks need to be made before the installation can begin (eg. available disk space), hundreds of file structures need to be created on the disk, installation images need to be uncompressed in memory once they have been retrieved from the CDROM, installed files need to be checked as the installation progresses (checksums), and any post-installation tasks performed such as compiling any system software indices. As a result, perhaps 50% of the total installation time may involve operations which do not access the CDROM. Thus, using a faster CDROM may not speed up the overall installation to any great degree. This effect is worsened if the CPU in the system is particularly old or slow, ie. a slow CPU may not be able to take full advantage of an old CDROM, never mind a new one. In order for a faster CDROM to make any significant difference, the system's CPU must be able to take advantage of it, and a reasonably large proportion of an installation procedure must actually consist of accessing the CDROM.

For example, consider the case of installing IRIX 6.5 on two different Indys - one with a slow CPU, the other with a better CPU - comparing any benefit gained from using a 32X CDROM instead of a 2X CDROM [3]. Here is a table of installation times, in hours, minutes and seconds, along with percentage speedups:

                          2X CDROM    32X CDROM    %Speedup

  100MHz R4600PC Indy:     1:18:36      1:12:11       8.2%
  200MHz R4400SC Indy:     0:52:35      0:45:24      13.7%
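As a check on how those percentages were derived (the speedup is simply the time saved, expressed as a fraction of the slower 2X time):

   1:18:36 = 4716 seconds, 1:12:11 = 4331 seconds
   (4716 - 4331) / 4716 = 385 / 4716 = 0.082, ie. an 8.2% speedup

   0:52:35 = 3155 seconds, 0:45:24 = 2724 seconds
   (3155 - 2724) / 3155 = 431 / 3155 = 0.137, ie. a 13.7% speedup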

(data for a 250MHz R4400SC Indigo2 shows the speedup would rise to 15.2% - a valid comparison since Indy and Indigo2 are almost identical in system design) In other words, the better the main CPU, the better the speedup obtained by using a faster CDROM. This leads on to the next very useful tip for installing software (OS or otherwise)...

Temporary Hardware Swaps. The example above divided the columns in order to obtain the speedup for using a faster CDROM, but it should be obvious looking at the table that a far greater speedup can be obtained by using a better CPU:

  Using a 200MHz R4400SC CPU instead of a 100MHz R4600PC (percentage speedup):

    2X CDROM with Indy:     33.1%
    32X CDROM with Indy:    37.1%

In other words, no matter what CDROM is used, an admin can save approximately a third of the normal installation time just by temporarily swapping the best possible CPU into the target system! And of course, the saving is maximised by using the fastest CDROM available too, or other installation source such as a RAID containing the CDROM images. For example, if an admin has to carry out a task which would normally be expected to take, say, three hours on the target system, then a simple component swap could save over an hour of installation time. From an admin's point of view, that means getting the job done quicker (more time for other tasks), and from a management point of view that means lower costs and better efficiency, ie. less wages money spent on the admin doing that particular task. Some admins might have to install OS images as part of their job, eg. performance analysis or configuring systems to order. Thus, saving as much time as possible could result in significant daily productivity improvements.

The Effects of Memory Capacity. During the installation of software or an OS, the system may consume large amounts of memory in order to, for example, uncompress installation images from the CDROM, process existing system files during a patch update, recompile system file indices, etc. If the target system does not have enough physical memory, then swap space (otherwise known as virtual memory) will have to be used. Since software installation is a disk and memory intensive task, this can massively slow down the installation or removal procedure (the latter can happen too because complex file processing may be required in order to restore system files to an earlier state prior to the installation of the software items being removed). Thus, just as it can be helpful to temporarily swap a better CPU into the target system and use a faster CDROM if available, it is also a good idea to ensure the system has sufficient physical memory for the task.

For example, I once had cause to install a large patch upgrade to the various compiler subsystems on an Indy running IRIX 6.2 with 64MB RAM [1]. The update procedure seemed to be taking far too long (15 minutes and still not finished). Noticing the unusually large amount of disk activity compared to what I would normally expect, ie. noise coming from the disk, I became suspicious and wondered whether the installation process was running out of memory. A quick use of gmemusage showed the available memory to be very low (3MB), implying that memory swapping was probably occurring. I halted the update procedure (easy to do with IRIX) and cancelled the installation. After upgrading the system temporarily to 96MB RAM (using 32MB from another Indy) I ran the patch again. This time, the update was finished in less than one minute! Using gmemusage showed the patch procedure required at least 40MB RAM free in order to proceed without resorting to the use of swap space.
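Before starting a large installation or patch update, it can therefore be worth checking how much memory and swap the system has free. A rough sketch using standard IRIX tools (check the relevant man pages for the options supported on a particular release):

   swap -l       # list configured swap areas and how much of each is in use
   gmemusage     # GUI breakdown of physical memory usage per program
   osview        # text-based resource monitor, which includes free memory figures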

Summary.

1. Before making any major change to a system, make a complete backup just in case something goes wrong. Read any relevant documents supplied with the software to be installed, eg. release notes, caveats to installation, etc.

2. When installing an OS or other software, use the most efficient storage media available if possible, eg. the OS CDs copied onto a disk. NB: using a disk OS image for installation might mean repartitioning the disk so that the system regards the disk as a bootable device, just like a CDROM. By default, SCSI disks do not have the same partition layout as a typical CDROM. On SGIs using IRIX, the fx program is used to repartition disks.

3. If more than one system is involved, use methods such as disk cloning to improve the efficiency of the procedure.

4. If possible, temporarily swap better system components into the target system in order to reduce installation time and ensure adequate resources for the procedure (better CPU, lots of RAM, fastest possible CDROM).

Caution: item 4 above might not be possible if the particular set of files which get installed are determined by the presence of internal components. In the case of Indy, installing an R5000 series CPU would result in the installation of different low-level bootup CPU-initialisation libraries compared to R4600 or R4400 (these latter two CPUs can use the same libraries, but any R5000 CPU uses newer libraries). Files relevant to these kinds of issues are located in directories such as /var/sysgen.

Patch Files. Installing software updates to parts of the OS or application software is a common task for admins. In general, patch files should not be installed unless they are needed, but sometimes an admin may not have any choice, eg. for security reasons, or Y2K compliance. Typically, patch updates are supplied on CDs in two separate categories (these names apply to SGIs; other UNIX vendors probably use a similar methodology):

1. Required/Recommended patches.
2. Fix-on-Fail patches.

Item 1 refers to patches which the vendor suggests the admin should definitely install. Typically, a CD containing such patches is accessed with inst/swmgr and an automatic installation carried out, ie. the admin lets the system work out which of the available required/recommended patches should be installed. This concept is known as installing a 'patch set'. When discussing system problems or issues with others (eg. technical support, or colleagues on the Net), the admin can then easily describe the OS state as being a particular revision modified by a particular dated patch set, eg. IRIX 6.5 with the April 1999 Patch Set.

Item 2 refers to patches which only concern specific problems or issues, typically a single patch file for each problem. An admin should not install such patches unless they are required, ie. they are selectively installed as and when necessary. For example, an unmodified installation of IRIX 6.2 contains a bug in the 'jot' editor program which affects the way in which jot accesses files across an NFS-mounted directory (the bug can cause jot to erase the file). To fix the bug, one installs patch number 2051, which is shown in the inst/swmgr patch description list as 'Jot fix for mmapping', but there's no need to install the patch if a machine running 6.2 is not using NFS.

Patch Inheritance. As time goes by, it is common for various bug fixes and updates from a number of patches to be brought together into a 'rollup' patch. Also, a patch file may contain the same fixes as an earlier patch plus some other additional fixes. Two issues arise from this: 1. If one is told to install a patch file of a particular number (eg. advice gained from someone on a newsgroup), it is usually the case that any later patch which has been declared to be a replacement for the earlier patch can be used instead. This isn't always
the case, perhaps due to specific hardware issues of a particular system, but in general a fix for a problem will be described as 'install patch X or later'. The release notes for any patch file will describe what hardware platforms and OS revisions that patch is intended for, what patches it replaces, what bugs are fixed by the patch (official bug code numbers included), what other known bugs still exist, and what workarounds can be used to temporarily solve the remaining problems.

2. When a patch is installed, a copy of the affected files prior to installation, called a 'patch history', is created and safely stored away so that if ever the patch has to be removed at a later date, the system can restore the relevant files to the state they were in before the patch was first installed. Thus, installing patch files consumes disk space - how much depends on the patch concerned. The 'versions' command with the 'removehist' option can be used to remove the patch history for a particular patch, recovering disk space, eg.:

versions removehist patchSG0001537

would remove the patch history file for patch number 1537. To remove all patch histories, the command to use is: versions removehist "*"
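On a related note, the versions command is also the easiest way to see which patches are currently installed; a hedged example (patch product names on SGIs begin with 'patchSG'):

   versions | grep patchSG      # list installed patches
   versions patchSG0001537      # show the details recorded for one particular patch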

Conflicts. When installing patches, especially of the Fix-on-Fail variety, an admin can come across a situation where a patch to be installed (A) is incompatible with one already present on the system (B). This usually happens when an earlier problem was dealt with using a more up-to-date patch than was actually necessary. The solution is to either remove B, then install an earlier but perfectly acceptable patch C and finally install A, or find a more up-to-date patch D which supersedes A and is thus compatible with B. Note: if the history file for a patch has been removed in order to save disk space, then it will not be possible to remove that patch from the system. Thus, if an admin encounters the situation described above, the only possible solution will be to find the more up-to-date patch D.

Exploiting Patch File Release Notes. The release notes for patches can be used to identify which patches are compatible, as well as to ascertain other useful information, especially to check whether a particular patch is the right one an admin is looking for (patch titles can sometimes be somewhat obscure). Since the release notes exist on the system in text form (stored in /usr/relnotes), one can use the grep command to search the release notes for information by hand, using appropriate commands. The commands 'relnotes' and 'grelnotes' can be used to view release notes. relnotes outputs only text. Without arguments, it shows a summary of all installed products for which release notes are available. One can then supply a product name - relnotes will respond with a list of chapter titles for that product. Finally, specifying a product name and a chapter number will output the actual text notes for the chosen chapter, or one can use '*' to display all
chapters for a product. grelnotes gives the same information in a browsable format displayed in a window, ie. grelnotes is a GUI interface to relnotes. See the man pages for these commands for full details. relnotes actually uses the man command to display information, ie. the release notes files are stored in the same compressed text format ('pack') used by online manual pages (man uses the 'unpack' command to decompress the text data). Thus, in order to grep-search through a release notes file, the file must first be uncompressed using the unpack command. This is a classic example of where the UNIX shell becomes very powerful, ie. one could write a shell script using a combination of find, ls, grep, unpack and perhaps other commands to allow one to search for specific items in release notes. Although the InfoSearch tool supplied with IRIX 6.5 allows one to search release notes, IRIX 6.2 does not have InfoSearch, so an admin might decide that writing such a shell script would prove very useful. Incidentally, this is exactly the kind of useful script which ends up being made available on the Net for free so that anyone can use it. For all I know, such a script already exists. Over time, entire collections of useful scripts are gathered together and eventually released as freeware (eg. GNU shell script tools). An admin should examine any such tools to see if they could be useful - a problem which an admin has to deal with may already have been solved by someone else two decades earlier.
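As an illustration of the kind of script described above, the following sketch searches every installed release notes file for a keyword. It assumes the notes under /usr/relnotes are stored as pack-compressed text files ending in .z and uses pcat (part of the pack suite) to print them without modifying anything; treat it as a starting point rather than a finished tool:

   #!/bin/sh
   # relsearch - crude search of all release notes for a given pattern
   pattern="$1"
   find /usr/relnotes -type f -name '*.z' -print | while read f
   do
       if pcat "$f" 2>/dev/null | grep -i "$pattern" > /dev/null
       then
           echo "$f"      # report which release notes file mentions the pattern
       fi
   done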

Patch Subsystem Components. Like any other software product, a patch file is a software subsystem usually containing several sub-units, or components. When manually selecting a patch for installation, inst/swmgr may tag all sub-units for installation even if certain sub-units are not applicable (this can happen for an automatic selection too, perhaps because inst selects all of a patch's components by default). If this happens, any conflicts present will be displayed, preventing the admin from accidentally installing unwanted or irrelevant items. Remember that an installation cannot begin until all conflicts are resolved, though an admin can override this behaviour if desired. Thus, when manually installing a patch file (or files), I always check the individual sub-units to see what they are. In this way, I can prevent conflicts from arising in the first place by not selecting subsystems which I know are not relevant, eg. 64bit libraries which aren't needed for a system with a 32bit memory address kernel like Indy (INFO: all SGIs released after the Indigo R3000 in 1991 do 64bit processing, but the main kernel file does not need to be compiled using 64bit addressing extensions unless the system is one which might have a very large amount of memory, eg. an Origin2000 with 16GB RAM). Even when no conflicts are present, I always check the selected components to ensure no 'older version' items have been selected.

References:

1. Disk and File System Administration:
   http://www.futuretech.vuurwerk.nl/disksfiles.html

2. How to Install IRIX 6.5:
   http://www.futuretech.vuurwerk.nl/6.5inst.html

3. SGI General Performance Comparisons:
   http://www.futuretech.vuurwerk.nl/perfcomp.html

Detailed Notes for Day 3 (Part 2)

UNIX Fundamentals: Organising a network with a server. This discussion explains basic concepts rather than detailed ideas such as specific 'topologies' to use with large networks, or how to organise complex distributed file systems, or subdomains and address spaces - these are more advanced issues which most admins won't initially have to deal with, and if they do then the tasks are more likely to be done as part of a team. The SGI network in Ve24 is typical of a modern UNIX network in how it is organised. The key aspects of this organisation can be summarised as follows:

- A number of client machines and a server are connected together using a hub (24-port in this case) and a network comprised of 10Mbit Ethernet cable (100Mbit is more common in modern systems, with Gigabit soon to enter the marketplace more widely).
- Each client machine has its own unique identity, a local disk with an installed OS and a range of locally installed application software for use by users.
- The network has been configured to have its own subdomain name of a form that complies with the larger organisation of which it is just one part (UCLAN).
- The server has an external connection to the Internet.
- User accounts are stored on the server, on a separate external disk. Users who login to the client machines automatically find their own files available via the use of the NFS service. Users can work with files in their home directory (which accesses the server's external disk across the network) or use the temporary directories on a client machine's local disk for better performance.
- Other directories are NFS mounted from the server in order to save disk space and to centralise certain services (eg. /usr/share, /var/mail, /var/www).
- Certain aspects of the above are customised in places.

Most networks are customised in certain ways depending on the requirements of users and the decisions taken by the admin and management. In this case, specifics include:

o Some machines have better hardware internals, allowing for software installation setups that offer improved user application performance and services, eg. a bigger disk permits /usr/share to be local instead of NFS-mounted, and extra vendor software, shareware and freeware can be installed.

o The admin's account resides on an admin machine which is effectively also a client, but with minor modifications, eg. tighter security with respect to the rest of the network, and the admin's personal account resides on a disk attached to the admin machine. NFS is used to export the admin's home account area to the server and all other clients; custom changes to the admin's account definition allow the admin account to be treated just like any other user account (eg. accessible from within /home/staff).

o The server uses a Proxy server in order to allow the client machines to access the external connection to the Internet.

o Ordinary users cannot login to the server, ensuring that the server's resources are reserved for system services instead of running user programs. Normally, this would be a more important factor if the server was a more powerful system than the clients (typical of modern organisations). In the case of the Ve24 network though, the server happens to have the same 133MHz R4600PC CPU as the client machines. Staff can login to the server however - an ability based on assumed privilege.

o One client machine is using a more up-to-date OS version (IRIX 6.5) in order to permit the use of a ZIP drive, a device not fully supported by the OS version used on the other clients (IRIX 6.2). ZIP drives can be used with 6.2 at the command-line level, but the GUI environment supplied with 6.2 does not fully support ZIP devices. In order to support 6.5 properly, the client with the ZIP drive has more memory and a larger disk (most of the clients have 549MB system disks, insufficient to install 6.5 which requires approximately 720MB of disk space for a default installation).

o etc.

This isn't a complete list, but the above are the important examples. Exactly how an admin configures a network depends on what services are to be provided, how issues such as security and access control are dealt with, Internet issues, available disk space and other resources, peripherals provided such as ZIP, JAZ, etc., and of course any policy directives decided by management.

My own personal ethos is, in general, to put users first. An example of this ethos in action is that /usr/share is made local on any machine which can support it - accesses to such a local directory occur much faster than across a network to an NFS-mounted /usr/share on a server. Thus, searching for man pages, accessing online books, using the MIDI software, etc. is much more efficient/faster, especially when the network or server is busy.

NFS Issues. Many admins will make application software NFS-mounted, but this results in slower performance (unless the network is fast and the server capable of supplying as much data as can be handled by the client, eg. 100Mbit Ethernet, etc.) However, NFS-mounted application directories do make it easier to manage software versions, updates, etc. Traditional client/server models assume applications are stored on a server, but this is an old ethos that was designed without any expectation that the computing world would eventually use very large media files, huge applications, etc. Throwing application data across a network is a ridiculous waste of bandwidth and, in my opinion, should be avoided where possible (this is much more important for slower networks though, eg. 10Mbit). In the case of the Ve24 network, other considerations also come into play because of hardware-related factors, eg. every NFS mount point employed by a client system uses up some memory which is needed to handle the operational overhead of dealing with accesses to that mount point. Adding more mount points means using more memory on the client; for an Indy with 32MB RAM, using as many as a dozen mount points can result in the system running out of memory (I
tried this in order to offer more application software on the systems with small disks, but 32MB RAM isn't enough to support lots of NFS-mounted directories, and virtual memory is not an acceptable solution). This is a good example of how system issues should be considered when deciding on the hardware specification of a system. As with any computer, it is unwise to equip a UNIX system with insufficient resources, especially with respect to memory and disk space.

Network Speed. Similarly, the required speed of the network will depend on how the network will be used. What applications will users be running? Will there be a need to support high-bandwidth data such as video conferencing? Will applications be NFS-mounted or locally stored? What kind of system services will be running? (eg. web servers, databases, image/document servers, etc.) What about future expansion? All these factors and more will determine whether typical networking technologies such as 10Mbit, 100Mbit or Gigabit Ethernet are appropriate, or whether a different networking system such as ATM should be used instead. For example, MPC uses a fast-switching high-bandwidth network due to the extensive use of data-intensive applications which include video editing, special effects, rendering and animation. After installation, commands such as netstat, osview, ping and ttcp can be used to monitor network performance. Note that external companies, and vendor suppliers, can offer advice on suggested system topologies. For certain systems (eg. high-end servers), specific on-site consultation and analysis may be part of the service.
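As a rough illustration of the sort of checks these tools allow (option details vary between UNIX variants; 'serverhost' is just an example name):

   ping serverhost             # quick check of reachability, latency and packet loss
   ttcp -r -s                  # run on the receiving machine first
   ttcp -t -s serverhost       # then run on the sending machine; reports raw TCP throughput
   netstat -i                  # per-interface packet and error counts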

Storage. Deciding on appropriate storage systems and capacities can be a daunting task for a non-trivial network. Small networks such as the SGI network I run can easily be dealt with simply by ensuring that the server and clients all have large disks, that there is sufficient disk space for user accounts, and that a good backup system is used, eg. DDS3 DAT. However, more complex networks (eg. banks, commercial businesses, etc.) usually need huge amounts of storage space, use very different types of data with different requirements (text, audio, video, documents, web pages, images, etc.), and must consider a whole range of issues which will determine what kind of storage solution is appropriate, eg.:

- preventing data loss,
- sufficient data capacity with room for future expansion,
- interrupt-free fast access to data,
- failure-proof operation (eg. backup hub units/servers/UPS),
- etc.

A good source of advice may be the vendor supplying the system hardware, though note that 3rd-party storage solutions can often be cheaper, unless there are other reasons for using a vendor-sourced storage solution (eg. architectural integration).

See the article listed in reference [1] for a detailed discussion on these issues.

Setting up a network can thus be summarised as follows:

- Decide on the desired final configuration (consultation process, etc.)
- Install the server with a default installation of the OS.
- Install the clients with a default or expanded/customised configuration as desired.
- Construct the hardware connections.
- Modify the relevant setup files of a single client and the server so that one can rlogin to the server from the client and use GUI-based tools to perform further system configuration and administration tasks.
- Create, modify or install the files necessary for the server and clients to act as a coherent network, eg. /etc/hosts, .rhosts, etc. (a short example hosts file is sketched after the note below).
- Setup other services such as DNS, NIS, etc.
- Setup any client-specific changes such as NFS mount points, etc.
- Check all aspects of security and access control, eg. make sure guest accounts are blocked if required, all client systems have a password for the root account, etc. Use any available FAQ (Frequently Asked Questions) files or vendor-supplied information as a source of advice on how to deal with these issues. Very usefully, IRIX 6.5 includes a high-level tool for controlling overall system and network security - the tool can be (and normally is) accessed via a GUI interface.
- Begin creating group entries in /etc/group ready for user accounts, and finally the user accounts themselves.
- Setup any further services required, eg. Proxy server for Internet access.
- etc.

The above have not been numbered in a rigid order since the tasks carried out after the very first step can usually be performed in a different order without affecting the final configuration. The above is only a guide.
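For illustration, the /etc/hosts file mentioned in the list above might contain entries along the following lines. The layout is standard, but the addresses and the subdomain shown here are purely hypothetical; only the host names yoda (the server) and wolfen (a client) are taken from the network described in these notes:

   # Internet address   full host name               aliases
   127.0.0.1            localhost
   193.61.255.1         yoda.comp.uclan.ac.uk        yoda
   193.61.255.10        wolfen.comp.uclan.ac.uk      wolfen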

Quotas. Employing disk quotas is a practice adopted by most administrators as a means of controlling disk space usage by users. It is easy to assume that a really large disk capacity would mean an admin need not bother with quotas, but unfortunately an old saying definitely holds true: "Data will expand to fill the space available." Users are lazy where disk space is concerned, perhaps because it is not their job to manage the system as a whole. If quotas are not present on a system, most users simply don't bother deleting unwanted files. Alternatively, the quota management software can be used as an efficient disk accounting system by setting up quotas for a file system without using limit enforcement. IRIX employs a quota management system that is common amongst many UNIX variants. Examining the relevant commands (consult the 'SEE ALSO' section from the 'quotas' man page),

IRIX's quota system appears to be almost identical to that employed by, for example, HP-UX (Hewlett Packard's UNIX OS). There probably are differences between the two implementations, eg. issues concerning supported operations on particular types of file system, but in this case the quota system is typical of the kind of OS service which is very similar or identical across all UNIX variants. An important fact is that the quota software is part of the overall UNIX OS, rather than some hacked 3rd-party software addon. Quota software allows users to determine their current disk usage, and enables an admin to monitor available resources, how long a user is over their quota, etc. Quotas can be used not only to limit the amount of available disk space a user has, but also the number of files (inodes) which a user is permitted to create. Quotas consist of soft limits and hard limits. If a user's disk usage exceeds the soft limit, a warning is given on login, but the user can still create files. If disk usage continues to increase, the hard limit is the point beyond which the user will not be able to use any more disk space, at least until the usage is reduced so that it is sufficiently below the hard limit once more. Like most system services, how to setup quotas is explained fully in the relevant online book, "IRIX Admin: Disks and Filesystems". What follows is a brief summary of how quotas are setup under IRIX. Of more interest to an admin are the issues which surround quota management these are discussed shortly. To activate quotas on a file system, an extra option is added to the relevant entry in the /etc/fstab file so that the desired file system is set to have quotas imposed on all users whose accounts reside on that file system. For example, without quotas imposed, the relevant entry in yoda's /etc/fstab file looks like this: /dev/dsk/dks4d5s7 /home xfs rw 0 0

With quotas imposed, this entry is altered to be: /dev/dsk/dks4d5s7 /home xfs rw,quota 0 0

Next, the quotaon command is used to activate quotas on the root file system. A reboot causes the quota software to automatically detect that quotas should be imposed on /home and so the quota system is turned on for that file system. The repquota command is used to display quota statistics for each user. The edquota command is used to change quota values for a single user, or multiple users at once. With the -i option, edquota can also read in quota information from a file, allowing an admin to set quota limits for many users with a single command. With the -e option, repquota can output the current quota statistics to a file in a format that is suitable for use with edquota's -i option. Note: the editor used by edquota is vi by default, but an admin can change this by setting an environment variable called 'EDITOR', eg.:

setenv EDITOR jot -f

The -f option forces jot to run in the foreground. This is necessary because the editor used by edquota must run in the foreground, otherwise edquota will simply see an empty file instead of quota data. Ordinary users cannot change quota limits.
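Putting the commands above together, a typical sequence for inspecting and bulk-editing quotas might look like the following (a hedged sketch - check the repquota and edquota man pages for the exact option syntax on a given system):

   repquota /home                         # show usage and limits for every user on /home
   repquota -e /home > /tmp/quotas.txt    # dump current limits in a form edquota can read
   vi /tmp/quotas.txt                     # adjust soft/hard limits for many users at once
   edquota -i /tmp/quotas.txt             # apply the edited limits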

Quota Management Issues. Most users do not like disk quotas. They are perceived as the information equivalent of a straitjacket. However, quotas are usually necessary in order to keep disk usage to a sensible level and to maintain a fair usage amongst all users. As a result, the most important decision an admin must make regarding quotas is what limit to actually set for users, either as a whole or individually. The key to amicable relations between an admin and users is flexibility, eg. start with a small to moderate limit for all (eg. 20MB). If individuals then need more space, and they have good reason to ask, then an admin should increase the user's quota (assuming space is available). Exactly what quota to set in the first instance can be decided by any sensible/reasonable schema. This is the methodology I originally adopted:

- The user disk is 4GB. I don't expect to ever have more than 100 users, so I set the initial quota to 40MB each.

In practice, as expected, some users need more, but most do not. Thus, erring on the side of caution while also being flexible is probably the best approach. Today, because the SGI network has a system with a ZIP drive attached, and the SGIs offer reliable Internet access to the WWW, many students use the Ve24 machines solely for downloading data they need, copying or moving the data onto ZIP for final transfer to their PC accounts, or to a machine at home. Since the ZIP drive is a 100MB device, I altered the quotas to 50MB each, but am happy to change that to 100MB if anyone needs it (this allows for a complete ZIP image to be downloaded if required), ie. I am tailoring quota limits based on a specific hardware-related user service issue.

If a user exceeds their quota, warnings are given. If they ask for more disk space, an admin would normally enquire as to whether the user genuinely needs more space, eg.:

- Does the user have unnecessary files lying around in their home directory somewhere? For example, movie files from the Internet, unwanted mail files, games files, object files or core dump files left over from application development, media files created by 'playing' with system tools (eg. the digital camera).
- What about their Netscape cache? Has it been set to too high a value?
- Do they have hidden files they're not aware of, eg. .capture.tmp.* directories, capture.mv files, etc.?
- Can the user employ compression methods to save space? (gzip, pack, compress)

If a user has removed all unnecessary files, but is still short of space, then unless there is some special reason for not increasing their quota, an admin should provide more space. Exceptions could include, for example, a system which has a genuine overall shortage of storage space. In such a situation, it is common for an admin to ask users to compress their files if possible, using the 'gzip', 'compress' or 'pack' commands. Users can use tar to create archives of many files prior to compression. There is a danger with asking users to compress files though: eventually, extra storage has to be purchased; once it has been, many users start uncompressing many of the files they earlier compressed. To counter this effect, any increase in storage space being considered should be large, say an order of magnitude, or at the very least a factor of 3 or higher (I'm a firm believer in future-proofing). Note that the find command can be used to locate files which are above a certain size, eg. those that are particularly large or in unexpected places. Users can use the du command to examine how much space their own directories and files are consuming.

Note: if a user exceeds their hard quota limit whilst in the middle of a write operation such as using an editor, the user will find it impossible to save their work. Unfortunately, quitting the editor at that point will lose the contents of the file because the editor will have opened a file for writing already, ie. the opened file will have zero contents. The man page for quotas describes the problem along with possible solutions that a user can employ:

"In most cases, the only way for a user to recover from over-quota conditions is to abort whatever activity is in progress on the filesystem that has reached its limit, remove sufficient files to bring the limit back below quota, and retry the failed program. However, if a user is in the editor and a write fails because of an over quota situation, that is not a suitable course of action. It is most likely that initially attempting to write the file has truncated its previous contents, so if the editor is aborted without correctly writing the file, not only are the recent changes lost, but possibly much, or even all, of the contents that previously existed. There are several possible safe exits for a user caught in this situation. He can use the editor ! shell escape command (for vi only) to examine his file space and remove surplus files. Alternatively, using csh, he can suspend the editor, remove some files, then resume it. A third possibility is to write the file to some other filesystem (perhaps to a file on /tmp) where the user's quota has not been exceeded. Then after rectifying the quota situation, the file can be moved back to the filesystem it belongs on."

It is important that users be made aware of these issues if quotas are installed. This is also another reason why I constantly remind users that they can use /tmp and /var/tmp for temporary tasks. One machine in Ve24 (Wolfen) has an extra 549MB disk available which any user can write to, just in case a particularly complex task requiring a lot of disk space must be carried out, eg. movie file processing.
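Regarding the find and du commands mentioned above, the following shows typical usage (the size threshold and paths are examples only):

   find /home -type f -size +20480 -print    # files larger than about 10MB (size given in 512-byte blocks)
   du -k ~/ | sort -n | tail -20              # a user's twenty largest directories, in KB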

Naturally, an admin can write scripts of various kinds to monitor disk usage in detailed ways, eg. regularly identify the heaviest consumers of disk resources; one could place the results into a regularly updated file for everyone to see, ie. a publicly readable "name and shame" policy (not a method I'd use unless absolutely necessary, eg. when individual users are abusing the available space for downloading game files).

UNIX Fundamentals: Installing/removing internal/external hardware. As explained in this course's introduction to UNIX, the traditional hardware platforms which run UNIX OSs have a legacy of top-down integrated design because of the needs of the market areas the systems are sold into. Because of this legacy, much of the toil normally associated with hardware modifications is removed. To a great extent, an admin can change the hardware internals of a machine without ever having to be concerned with system setup files. Most importantly, low-level issues akin to IRQ settings in PCs are totally irrelevant with traditional UNIX hardware platforms. By traditional I mean the long line of RISC-based systems from the various UNIX vendors such as Sun, IBM, SGI, HP, DEC and even Intel. This ease of use does not of course apply to ordinary PCs running those versions of UNIX which can be used with PCs, eg. Linux, OpenBSD, FreeBSD, etc.; for this category of system, the OS issues will be simpler (presumably), but the presence of a bottom-up-designed PC hardware platform presents the usual problems of compatible components, device settings, and other irritating low-level issues.

This discussion uses the SGI Indy as an example system. If circumstances allow, a more up-to-date example using the O2 system will also be briefly demonstrated in the practical session. Hardware from other UNIX vendors will likely be similar in terms of ease-of-access and modification, though it has to be said that SGI has been an innovator in this area of design.

Many system components can be added to, or removed from a machine, or swapped between machines, without an admin having to change system setup files in order to make the system run smoothly after any alterations. Relevant components include:

- Memory units,
- Disk drives (both internal and external),
- Video or graphics boards that do not alter how the system would handle relevant processing operations,
- CPU subsystems which use the same instruction set and hardware-level initialisation libraries as are already installed,
- Removable storage devices, eg. ZIP, JAZ, Floptical, SyQuest, CDROM, DVD (where an OS is said to support it), DAT, DLT, QIC, etc.,
- Any option board which does not impact on any aspect of existing system operation not related to the option board itself, eg. video capture, network expansion (Ethernet, HiPPI, TokenRing, etc.), SCSI expansion, PCI expansion, etc.

Further, the physical layout means the admin does not have to fiddle with numerous cables and wires. The only cables present in Indy are the two short power supply cables, and the internal SCSI device ribbon cable with its associated power cord. No cables are present for graphics boards, video options, or other possible expansion cards. Some years after the release of the Indy, SGI's O2 design allows one to perform all these sorts of component changes without having to fiddle with any cables or screws at all (the only exception being any PCI expansion, which most O2 users will probably never use anyway).

This integrated approach is certainly true of Indy. The degree to which such an ethos applies to other specific UNIX hardware platforms will vary from system to system. I should imagine systems such as Sun's Ultra 5, Ultra 10 and other Ultra-series workstations are constructed in a similar way. One might expect that any system could have a single important component replaced without affecting system operation to any great degree, even though this is usually not the case with PCs, but it may come as a far greater surprise that an entire set of major internal items can be changed or swapped from one system to another without having to alter configuration files at all. Even when setup files do have to be changed, the actual task normally only involves either a simple reinstall of certain key OS software sub-units (the relevant items will be listed in accompanying documentation and release notes), or the installation of some additional software to support any new hardware-level system features. In some cases, a hardware alteration might require a software modification to be made from miniroot if the software concerned was of a type involved in normal system operation, eg. display-related graphics libraries which controlled how the display was handled given the presence of a particular graphics board revision.

The main effect of this flexible approach is that an admin has much greater freedom to:

- modify systems as required, perhaps on a daily basis (eg. the way my external disk is attached and removed from the admin machine every single working day),
- experiment with hardware configurations, eg. performance analysis (a field I have extensively studied with SGIs [2]),
- configure temporary setups for various reasons (eg. demonstration systems for visiting clients),
- effect maintenance and repairs, eg. cleaning, replacing a power supply, etc.

All this without the need for time-consuming software changes, or the irritating necessity to consult PC-targeted advice guides about devices (eg. ZIP) before changes are made. Knowing the scope of this flexibility with respect to a system will allow an admin to plan tasks in a more efficient manner, resulting in better management of available time.

An example of the above with respect to the SGI Indy would be as follows (this is an imaginary demonstration of how the above concepts could be applied in real-life):



An extensive component swap between two Indys, plus new hardware installed.

Background information: CPUs. All SGIs use a design method which involves supplying a CPU and any necessary secondary cache plus interface ASICs on a 'daughterboard', or 'daughtercard'. Thus, replacing a CPU merely involves changing the daughtercard, ie. no fiddling with complex CPU insertion sockets, etc. Daughtercards in desktop systems can be replaced in seconds, certainly no more than a minute or two. The various CPUs available for Indy can be divided into two categories: those which support everything up to and including the MIPS III instruction set, and those which support all these plus the MIPS IV instruction set. The R4000, R4600 and R4400 CPUs all use MIPS III and are initialised on bootup with the same low-level data files, ie. the files stored in /var/sysgen. This covers the following CPUs:

  100MHz R4000PC (no L2)
  100MHz R4000SC (1MB L2)
  100MHz R4600PC (no L2)
  133MHz R4600PC (no L2)
  133MHz R4600SC (512K L2)
  100MHz R4400SC (1MB L2)
  150MHz R4400SC (1MB L2)
  175MHz R4400SC (1MB L2)
  200MHz R4400SC (1MB L2)

Thus, two Indys with any of the above CPUs can have their CPUs swapped without having to alter system software. Similarly, the MIPS IV CPUs:

150MHz R5000PC (no L2)
150MHz R5000SC (512K L2)
180MHz R5000SC (512K L2)

can be treated as interchangeable between systems in the same way. The difference between an Indy which uses a newer vs. older CPU is that the newer CPUs require a more up-to-date version of the system PROM chip to be installed on the motherboard (a customer who orders an upgrade is supplied with the newer PROM if required).

Video/Graphics Boards. Indy can have three different boards which control display output:

8bit XL
24bit XL
24bit XZ

8bit and 24bit XL are designed for 2D applications. They are identical except for the addition of more VRAM to the 24bit version. XZ is designed for 3D graphics and so requires a slightly different set of software graphics libraries to be installed in order to permit proper use. Thus, with respect to the XL version, an 8bit XL card can be swapped with a 24bit XL card with no need to alter system software.

Indy can have two other video options:

- IndyVideo (provides video output ports as well as extra input ports),
- CosmoCompress (hardware-accelerated MJPEG video capture board).

IndyVideo does not require the installation of any extra software in order to be used. CosmoCompress does require some additional software to be installed (CosmoCompress compression API and libraries). Thus, IndyVideo could be installed without any post-installation software changes. swmgr can be used to install the CosmoCompress software after the option card has been installed.

Removable Media Devices. As stated earlier, no software modifications are required, unless specifically stated by the vendor. Once a device has its SCSI ID set appropriately and is installed, it is recognised automatically and a relevant icon is placed on the desktop for users to exploit. Some devices may require a group of DIP switches to be configured on the outside of the device, but that is all (the settings to use for a particular system will be found in the supplied device manual). The first time I used a DDS3 DAT drive (Sony SDT9000) with an Indy, the only setup required was to set four DIP switches on the underside of the DAT unit to positions appropriate for use with an SGI (as detailed on the first page of the DAT manual). After connecting the DAT unit to the Indy, booting up and logging in, the DAT was immediately usable (icon available, etc.): no setup files, no software to install, etc. The first time I used a 32X CDROM (Toshiba CD-XM-6201B), not even DIP switches had to be set.

System Disks, Extra Disks. Again, installed disks are detected automatically and the relevant device files in /dev initialised to be treated as the communication points with the devices concerned. After bootup, the fx, mkfs and mount commands can be used to configure and mount new disks, while disks which already have a valid file system installed can be mounted immediately. GUI tools are available for performing these actions too.
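As a rough illustration of this sequence (the device name dks0d2s7 assumes controller 0, SCSI ID 2 and the usual whole-option-disk partition 7; check hinv and the fx/mkfs man pages for the correct names on a given system), preparing and mounting a new option disk might look like this:

fx                               # interactive: label/repartition the new disk as an option drive
mkfs /dev/dsk/dks0d2s7           # make a new filesystem on the option partition
mkdir /disk2                     # create a mount point
mount /dev/dsk/dks0d2s7 /disk2   # mount it (add an /etc/fstab entry to make this permanent)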

Thus, consider two Indys:

System A: 200MHz R4400SC, 24bit XL, 128MB RAM, 2GB disk, IRIX 6.2
System B: 100MHz R4600PC, 8bit XL, 64MB RAM, 1GB disk, IRIX 6.2

Suppose an important company visitor is expected the next morning at 11am and the admin is asked to quickly prepare a decent demonstration machine, using a budget provided by the visiting company to cover any changes required (as a gift, any changes can be permanent). The admin orders the following extra items for next-day delivery:

- A new 4GB SCSI disk (Seagate Barracuda 7200rpm)
- IndyVideo board
- Floptical drive
- ZIP drive
- 32X Toshiba CDROM (external)
- DDS3 Sony DAT drive (external)

The admin decides to make the following changes (steps 1 and 2 are carried out immediately; in order to properly support the ZIP drive, the admin needs to use IRIX 6.5 on B - the support contract means the CDs are already available):

1. Swap the main CPU, graphics board and memory components between systems A and B.

2. Remove the 1GB disk from System B and install it as an option disk in System A. The admin uses fx and mkfs to redefine the 1GB disk as an option drive, deciding to use the disk for a local /usr/share partition (freeing up perhaps 400MB of space from System A's 2GB disk).

3. The order arrives the next morning at 9am (UNIX vendors usually use couriers such as FedEx and DHL, so deliveries are normally very reliable). The 4GB disk is installed into System B (empty at this point) and the CDROM connected to the external SCSI port (SCSI ID 3). The admin then installs IRIX 6.5 onto the 4GB disk, a process which takes approximately 45 minutes. The system is powered down ready for the final hardware changes.

4. The IndyVideo board is installed in System B (it sits on top of the 24bit XL board; 2 or 3 screws involved, no cables), along with the internal Floptical drive above the 4GB disk (SCSI ID set to 2). The DAT drive (SCSI ID set to 4) is daisy-chained to the external CDROM. The ZIP drive is daisy-chained to the DAT (SCSI ID 5 by default selector, terminator enabled). This can all be done in less than five minutes.

5. The system is rebooted and the admin logs in as root. All devices are recognised automatically and icons for each device (ZIP, CDROM, DAT, Floptical) are immediately present on the desktop and available for use. Final additional software installations can begin, ready for the visitor's arrival. An hour should be plenty of time to install any specific application(s) or libraries that might be required for the visit.

I am confident that steps 1 and 2 could be completed in less than 15 minutes; steps 3, 4 and 5 could be completed in little more than an hour. Throughout the entire process, no OS or software changes have to be made to either System A, or to the 6.5 OS installed on System B's new 4GB disk after initial installation (ie. the ZIP, DAT and Floptical were not attached to System B when the OS was installed, but they are correctly recognised by the default 6.5 OS when the devices are added afterwards). If time permits and interest is sufficient, almost all of this example can be demonstrated live (the exception is the IndyVideo board; such a board is not available for use with the Ve24 system at the moment).

Why does the above matter from an admin's point of view? The answer is confidence and lack of stress. I could tackle a situation such as the one described here in full confidence that I would not have to deal with any matters concerning device drivers, interrupt addresses, system file modifications, etc. Plus, I can be sure the components will work perfectly with one another, constructed as they are as parts of an integrated system design. In short, this integrated approach to system design makes the admin's life substantially easier.

The Visit is Over. Afterwards, the visitor donates funds for a CosmoCompress board and an XZ board set. Ordered that day, the boards arrive the next morning. The admin installs the CosmoCompress board into System B (2 or 3 more screws and that's it). Upon bootup, the admin installs the CosmoCompress software from the supplied CD with swmgr. With no further system changes, all the existing supplied software tools (eg. MediaRecorder) can immediately utilise the new hardware compression board. The 8bit XL board is removed from System A and replaced with the XZ board set. Using inst accessed via miniroot, the admin reinstalls the OS graphics libraries so that the appropriate libraries are available to exploit the new board. After rebooting the system, all existing software written in OpenGL automatically runs ten times faster than before, without modification.

Summary. Read available online books and manual pages on general hardware concepts thoroughly. Get to know the system - every machine will either have its own printed hardware guide, or an equivalent online book. Practice hardware changes before they are required for real. Consult any Internet-based information sources, especially newsgroup posts, 3rd-party web sites and hardware-related FAQ files.

When performing installations, follow all recommended procedures, eg. use an anti-static strap to eliminate the risk of static discharge damaging system components (especially important when handling memory items, but just as relevant to any other device).

Construct a hardware maintenance strategy for cleaning and system checking, eg. examine all mice on a regular basis to ensure they are dirt-free, use an air duster once a month to clear away accumulated dust and grime, clean the keyboards every two months, etc. Be flexible: system management policies are rarely static, eg. a sudden change in the frequency of use of a system might mean cleaning tasks need to be performed more often, eg. cleaning monitor screens.

If you're not sure what the consequences of an action might be, call the vendor's hardware support service and ask for advice. Questions can be extremely detailed if need be - this kind of support is what such support services are paid to offer, so make good use of them.

Before making any change to a system, whether hardware or software, inform users if possible. This is probably more relevant to software changes (eg. if a machine needs to be rebooted, use 'wall' to notify any users logged onto the machine at the time, ie. give them time to log off; if they don't, go and see why they haven't), but giving advance notice is still advisable for hardware changes too, eg. if a system is being taken away for cleaning and reinstallation, a user may want to retrieve files from /var/tmp prior to the system's removal, so place a notice up a day or so beforehand if possible.
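For example (the wording and time given here are purely illustrative), a quick way to warn everyone currently logged into a machine before a reboot is:

echo "This machine will be rebooted at 17:30 - please log off before then." | wall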

References:

1. "Storage for the network", Network Week, Vol. 4 No. 31, 28th April 1999, pp. 25-29, by Marshall Breeding.

2. SGI General Performance Comparisons:
   http://www.futuretech.vuurwerk.nl/perfcomp.html

Detailed Notes for Day 3 (Part 3)

UNIX Fundamentals: Typical system administration tasks.

Even though the core features of a UNIX OS are handled automatically, there are still some jobs for an admin to do. Some examples are given here, but not all will be relevant for a particular network or system configuration.

Data Backup. A fundamental aspect of managing any computer system, UNIX or otherwise, is the backup of user and system data for possible retrieval purposes in the case of system failure, data corruption, etc. Users depend on the admin to recover files that have been accidentally erased, or lost due to hardware problems.

Backup Media. Backup devices may be locally connected to a system, or remotely accessible across a network. Typical backup media types include:

- 1/4" cartridge tape, 8mm cartridge tape (used infrequently today)
- DAT (very common)
- DLT (where lots of data must be archived)
- Floptical, ZIP, JAZ, SyQuest (common for user-level backups)

Backup tapes, disks and other media should be well looked after in a secure location [3].

Backup Tools. Software tools for archiving data include low-level format-independent tools such as dd, file and directory oriented tools such as tar and cpio, filesystem-oriented tools such as bru, standard UNIX utilities such as dump and restore (these cannot be used with XFS filesystems - use xfsdump and xfsrestore instead), and high-level tools (normally commercial packages) such as IRIS NetWorker. Some tools include a GUI frontend interface.

The most commonly used program is tar, which is also widely used for the distribution of shareware and freeware software. Tar allows one to gather together a number of files and directories into a single 'tar archive' file which by convention should always have a '.tar' suffix. By specifying a device such as a DAT instead of an archive file, tar can thus be used to archive data directly to a backup medium. Tar files can also be compressed, usually into the .gz format (gzip and gunzip), though there are other compression utilities (compress, pack, etc.). Backup and restoration speed can be improved by compressing files before any archiving process commences. Some backup devices have built-in hardware compression abilities. Note that files such as MPEG movies and JPEG images are already in a compressed format, so compressing these prior to backup is pointless.

Straightforward networks and systems will almost always use a DAT drive as the backup device and tar as the software tool. Typically, the 'cron' job scheduling system is used to execute a backup at regular intervals, usually overnight. Cron is discussed in more detail below.
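Before moving on, here is a minimal sketch of the tar-plus-gzip usage described above, assuming /home/pub is the data to be archived and /tmp has room for the archive file (the names are only examples):

tar cvf /tmp/pub.tar /home/pub    # gather /home/pub into a single archive file
gzip /tmp/pub.tar                 # compress it, producing /tmp/pub.tar.gz
gunzip /tmp/pub.tar.gz            # decompress again when needed
tar tvf /tmp/pub.tar              # list the archive contents to verify the data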

Backup Strategy. Every UNIX guide will recommend the adoption of a 'backup strategy', ie. a combination of hardware and software related management methods determined to be the most suitable for the site in question. A backup strategy should be rigidly adhered to once in place. Strict adherence allows an admin to reliably assess whether lost or damaged data is recoverable when a problem arises. Exactly how an admin performs backups depends upon the specifics of the site in question. Regardless of the chosen strategy, at least two full sets of reasonably current backups should always be maintained. Users should also be encouraged to make their own backups, especially with respect to files which are changed and updated often.

What/When to Backup. How often a backup is made depends on the system's frequency of use. For a system like the Ve24 SGI network, a complete backup of user data every night, plus a backup of the server's system disk once a week, is fairly typical. However, if a staff member decided to begin important research with commercial implications on the system, I might decide that an additional backup at noon each day should also be performed, or even hourly backups of just that person's account.

Usually, a backup archives all user or system data, but this may not be appropriate for some sites. For example, an artist or animator may only care about their actual project files in their ~/Maya project directory (Maya is a professional Animation/Rendering package) rather than the files which define their user environment, etc. Thus, an admin might decide to only backup every user's Maya projects directory. This would, for example, have the useful side effect of excluding data such as the many files present in a user's .netscape/cache directory. In general though, all of a user's account is archived.

If a change is to be made to a system, especially a server change, then separate backups should be performed before and after the change, just in case anything goes wrong. Since root file systems do not change very much, they can be backed up less frequently, eg. once per week. An exception might be if the admin wishes to keep a reliable record of system access logs which are part of the root file system, eg. those located in the files (for example):

/var/adm/SYSLOG
/var/netscape/suitespot/proxy-sysname-proxy/logs

The latter of the two would be relevant if a system had a Proxy server installed, ie. 'sysname' would be the host name of the system. Backing up /usr and /var instead of the entire / root directory is another option - the contents of /usr and /var change more often than many other areas of the overall file system, eg. users' mail is stored in /var/mail and most executable programs are under /usr.

In some cases, it isn't necessary to backup an entire root filesystem anyway. For example, the Indys in Ve24 all have more or less identical installations: all Indys with a 549MB disk have the same disk contents as each other, likewise for those with 2GB disks. The only exception is Wolfen which uses IRIX 6.5 in order to provide proper support for an attached ZIP drive. Thus, a backup of one of the client Indys need only concern specific key files such as /etc/hosts, /etc/sys_id, /var/flexlm/license.dat, etc. However, this policy may not work too well for servers (or even clients) because:

- an apparently small change, eg. adding a new user or installing a software patch, can affect many files,
- the use of GUI-based backup tools does not aid an admin in remembering which files have been archived.

For this reason, most admins will use tar, or a higher-level tool like xfsdump. Note that because restoring data from a DAT device is slower than copying data directly from disk to disk (especially modern UltraSCSI disks), an easier way to restore a client's system disk - where all clients have identical disk contents - is to clone the disk from another client and then alter the relevant files; this is what I do if a problem occurs.

Other backup devices can be much faster though [1], eg. a DLT9000 tape streamer, or military/industrial grade devices such as the DCRsi 240 Digital Cartridge Recording System (30MB/sec) as was used to back up data during the development of the 777 aircraft, or the Ampex DIS 820i Automated Cartridge Library (scalable from 25GB to 6.4TB max capacity, 80MB/sec sustained record rate, 800MB/sec search/read rate, 30 seconds maximum search time for any file), or just a simple RAID backup which some sites may choose to use. It's unusual to use another disk as a backup medium, but not unheard of; theoretically, it's the fastest possible backup medium, so if there's a spare disk available, why not?

Some sites may even have a 'mirror' system whereby a backup server B copies exactly the changes made to an identical file system on the main server A; in the event of serious failure, server B can take over immediately. SGI's commercial product for this is called IRIS FailSafe, with a switchover time between A and B of less than a millisecond. Fail-safe server configurations like this are the ultimate form of backup, ie. all files are being backed up in real-time, and the support hardware has a backup too. Any safety-critical installation will probably use such methods. Special power supplies might be important too, eg. a UPS (Uninterruptible Power Supply) which gives some additional power for a few minutes to an hour or more after a power failure and notifies the system to facilitate a safe shutdown, or a dedicated backup power generator could be used, eg. hospitals, police/fire/ambulance, air-traffic control, etc.

Note: systems managed by more than one admin should be backed up more often; admin policies should be consistent.

Incremental Backup. This method involves only backing up files which have changed since the previous backup, based on a particular schedule. An incremental schema offers the same degree of 'protection' as an entire system backup and is faster since fewer files are archived each time, which means faster restoration time too (fewer files to search through on a tape). An example schedule is given in the online book "IRIX Admin: Backup, Security, and Accounting":

"An incremental scheme for a particular filesystem looks something like this:

1. On the first day, back up the entire filesystem. This is a monthly backup.
2. On the second through seventh days, back up only the files that changed from the previous day. These are daily backups.
3. On the eighth day, back up all the files that changed the previous week. This is a weekly backup.
4. Repeat steps 2 and 3 for four weeks (about one month).
5. After four weeks (about a month), start over, repeating steps 1 through 4.

You can recycle daily tapes every month, or whenever you feel safe about doing so. You can keep the weekly tapes for a few months. You should keep the monthly tapes for about one year before recycling them."
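On an XFS-based IRIX system, a rough sketch of such a scheme using xfsdump dump levels might be as follows (the filesystem and device names are only examples - check the xfsdump man page for the full details):

xfsdump -l 0 -f /dev/tape /home    # full (level 0) dump of /home, eg. the monthly backup
xfsdump -l 1 -f /dev/tape /home    # level 1: only files changed since the last level 0 dump
xfsdump -l 2 -f /dev/tape /home    # level 2: only files changed since the last level 1 dump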

Backup Using a Network Device. It is possible to archive data to a remote backup medium by specifying the remote host name along with the device name. For example, an ordinary backup to a locally attached DAT might look like this:

tar cvf /dev/tape /home/pub

Or if no other relevant device was present:

tar cv /home/pub

For a remote device, simply add the remote host name before the device name:

tar cvf yoda:/dev/tape /home/pub

Note that if the tar command is trying to access a backup device which is not made by the source vendor, then '/dev/tape' may not work. In such cases, an admin would have to use a suitable lower-level device file, ie. one of the files in /dev/rmt - exactly which one can be determined by deciding on the required functionality of the device, as explained in the relevant device manual, along with the SCSI controller ID and SCSI device ID. Sometimes a particular user account name may have to be supplied when accessing a remote device, eg.:

tar cvf guest@yoda:/dev/tape /home/pub

This example wouldn't actually work on the Ve24 network since all guest accounts are locked out for security reasons, except on Wolfen. However, an equivalent use of the above syntax can be demonstrated using Wolfen's ZIP drive and the rcp (remote copy) command:

rcp -r /home/pub guest.guest1@wolfen:/zip

Though note that the above use of rcp would not retain file time/date creation/modification information when copying the files to the ZIP disk (tar retains all information).

Automatic Backup With Cron. The job scheduling system called cron can be used to automatically perform backups, eg. overnight. However, such a method should not be relied upon - nothing is better than someone manually executing/observing a backup, ensuring that the procedure worked properly, and correctly labelling the tape afterwards. If cron is used, a typical entry in the root cron jobs schedule file (/var/spool/cron/crontabs/root) might look like this:

0 3 * * * /sbin/tar cf /dev/tape /home

This would execute a backup to a locally attached backup device at 3am every morning. Of course, the admin would have to ensure a suitable medium was loaded before leaving at the end of each day. This is a case where the '&&' operator can be useful: in order to ensure no subsequent operation could alter the backed-up data, the 'eject' command could be employed thus:

0 3 * * * /sbin/tar cf /dev/tape /home && eject /dev/tape

Only after the tar command has finished will the backup medium be ejected. Notice there is no 'v' option in these tar commands (verbose mode) - why bother, when nobody will be around to see the output? However, an admin could modify the command to record the output for later reading:

0 3 * * * /sbin/tar cvf /dev/tape /home > /var/tmp/tarlog && eject /dev/tape

Caring for Backup Media. This is important, especially when an admin is responsible for backing up commercially valuable, sensitive or confidential data. Any admin will be familiar with the usual common-sense aspects of caring for any storage medium, eg. keeping media away from strong magnetic fields, extremes of temperature and humidity, etc., but there are many other factors too. The "IRIX Admin: Backup, Security, and Accounting" guide contains a good summary of all relevant issues:

"Storage of Backups

Store your backup tapes carefully. Even if you create backups on more durable media, such as optical disks, take care not to abuse them. Set the write protect switch on tapes you plan to store as soon as a tape is written, but remember to unset it when you are ready to overwrite a previously-used tape.

Do not subject backups to extremes of temperature and humidity, and keep tapes away from strong electromagnetic fields. If there are a large number of workstations at your site, you may wish to devote a special room to storing backups.

Store magnetic tapes, including 1/4 in. and 8 mm cartridges, upright. Do not store tapes on their sides, as this can deform the tape material and cause the tapes to read incorrectly.

Make sure the media is clearly labeled and, if applicable, write-protected. Choose a label color scheme to identify such aspects of the backup as what system it is from, what level of backup (complete versus partial), what filesystem, and so forth.

To minimize the impact of a disaster at your site, such as a fire, you may want to store main copies of backups in a different building from the actual workstations. You have to balance this practice, though, with the need to have backups handy for recovering files.

If backups contain sensitive data, take the appropriate security precautions, such as placing them in a locked, secure room. Anyone can read a backup tape on a system that has the appropriate utilities.

How Long to Keep Backups You can keep backups as long as you think you need to. In practice, few sites keep system backup tapes longer than about a year before recycling the tape for new backups. Usually, data for specific purposes and projects is backed up at specific project milestones (for example, when a project is started or finished). As site administrator, you should consult with your users to determine how long to keep filesystem backups. With magnetic tapes, however, there are certain physical limitations. Tape gradually loses its flux (magnetism) over time. After about two years, tape can start to lose data. For long-term storage, re-copy magnetic tapes every year to year-and-a-half to prevent data loss through deterioration. When possible, use checksum programs, such as the sum(1) utility, to make sure data hasn't deteriorated or altered in the copying process. If you want to reliably store data for several years, consider using optical disk.

Guidelines for Tape Reuse You can reuse tapes, but with wear, the quality of a tape degrades. The more important the data, the more precautions you should take, including using new tapes. If a tape goes bad, mark it as "bad" and discard it. Write "bad" on the tape case before you throw it out so that someone doesn't accidentally try to use it. Never try to reuse an obviously bad tape. The cost of a new tape is minimal compared to the value of the data you are storing on it."

Backup Performance. Sometimes data archive/extraction speed may be important, eg. a system critical to a commercial operation fails and needs restoring, or a backup/archive must be made before a deadline. In these situations, it is highly advisable to use a fast backup medium, eg. DDS3 DAT instead of DDS1 DAT. For example, an earlier lecture described a situation where a fault in the Ve24 hub caused unnecessary fault-hunting. As part of that process, I restored the server's system disk from a backup tape. At the time, the backup device was a DDS1 DAT. Thus, to restore some 1.6GB of data from a standard 2GB capacity DAT tape, I had to wait approximately six hours for the restoration to complete (since the system was needed the next morning, I stayed behind well into the night to complete the operation).

The next day, it was clear that using a DDS1 was highly inefficient and time-wasting, so a DDS3 DAT was purchased immediately. Thus, if the server ever has to be restored from DAT again, and despite the fact it now has a larger disk (4GB with 2.5GB of data typically present), even a full restoration would only take three hours instead of six (with 2.5GB used, the restoration would finish in less than two hours). Tip: as explained in the lecture on hardware modifications and installations, consider swapping a faster CPU into a system in order to speedup a backup or restoration operation - it can make a significant difference [2].

Hints and Tips.

- Keep tape drives clean. Newer tapes deposit more dirt than old ones.
- Use du and df to check that a medium will have enough space to store the data.
- Consider using data compression options if space on the medium is at a premium (some devices may have extra device files which include a 'c' in the device name to indicate support for hardware compression/decompression, eg. a DLT drive whose raw device file is /dev/rmt/tps0d5vc). There is no point using compression options if the data being archived is already compressed with pack, compress, gzip, etc., or is naturally compressed anyway, eg. an MPEG movie, JPEG image, etc.
- Use good quality media. Do not use ordinary audio DAT tapes with DAT drives for computer data backup; audio DAT tapes are of a lower quality than DAT tapes intended for computer data storage.
- Consider using any available commands to check beforehand that a file system to be backed up is not damaged or corrupted (eg. fsck). This will be more relevant to older file system types and UNIX versions, eg. fsck is not relevant to XFS filesystems (IRIX 6.x and later), but may be used with EFS file systems (IRIX 5.3 and earlier). This is less important when dealing with a small number of items.
- Label all backups, giving full details, eg. date, time, host name, backup command used (so you or another admin will know how to extract the files later), general contents description, and your name if the site has more than one admin with responsibility for backup procedures.
- Verify a backup after it is made; some commands require specific options, while others provide a means of listing the contents of a medium, eg. the -t option used with tar.
- Write-protect a medium after a backup has finished.
- Keep a tally on the medium of how many times it has been used.
- Consider including an index file at the very start of the backup, eg.:

  ls -AlhFR /home > /home/0000index && tar cv /home

Note: such index files can be large.

- Exploit colour code schemes to denote special attributes, eg. daily vs. weekly vs. monthly tapes.
- Be aware of any special issues which may be relevant to the type of data being backed up. For example, movie files can be very large; on SGIs, tar requires the K option in order to archive files larger than 2GB. Use of this option may mean the archived medium is not compatible with another vendor's version of tar.
- Consult the online guides. Such guides often have a great deal of advice, examples, etc.

tar is a powerful command with a wide range of available options and is used on UNIX systems worldwide. It is typical of the kind of UNIX command for which an admin is well advised to read through the entire man page. Other commands in this category include find, rm, etc.

Note: if compatibility between different versions of UNIX is an issue, one can use the lower-level dd command, which allows one to specify more details about how the data is to be dealt with as it is sent to or received from a backup device, eg. changing the block size of the data. A related command is 'mt', which can be used to issue specific commands to a magnetic tape device, eg. print device details and default block size. If problems occur during backup/restore operations, remember to check /var/adm/SYSLOG for any relevant error messages (useful if one cannot be present to monitor the operation in person).
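A rough sketch of the kind of thing dd and mt can do (the device names are illustrative, and some versions of mt use -t rather than -f to name the device):

mt -f /dev/tape status                        # report device type, state and block size
mt -f /dev/nrtape rewind                      # rewind using the no-rewind-on-close device
dd if=/dev/tape of=/var/tmp/tape.img bs=64k   # raw copy from tape, specifying a 64K block size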

Restoring Data from Backup Media. Restoring non-root-filesystem data is trivial: just use the relevant extraction tool, eg.:

tar xvf /dev/tape

However, restoring the root '/' partition usually requires access to an appropriate set of OS CD(s) and a full system backup tape of the / partition. Further, many OSs may insist that backup and restore operations at the system level must be performed with a particular tool, eg. Backup and Restore. If particular tools were required but not used to create the backup, or if the system cannot boot to a state where normal extraction tools can be used (eg. damage to the /usr section of the filesystem), then a complete reinstallation of the OS must be done, followed by the extraction of the backup media on top of the newly created filesystem using the original tool.

Alternatively, a fresh OS install can be done, then a second empty disk inserted on SCSI ID 2 and set up as a root disk, the backup media extracted onto the second disk, and the volume header copied over using dvhtool or another command relevant to the OS being used (this procedure is similar to disk cloning). Finally, a quick swap of the disks so that the second disk is on SCSI ID 1 and the system is back to normal. I personally prefer this method since it's "cleaner", ie. one can never be sure that extracting files on top of an existing file system will result in a final filesystem that is genuinely identical to the original. By using a second disk in this way, the psychological uncertainty is removed.

Just like backing up data to a remote device, data can be restored from a remote device as well. An OS 'system recovery' menu will normally include an option to select such a restoration method - a full host:/path specification is required. Note that if a filesystem was archived with a leading / symbol, eg.:

tar cvf /dev/tape /home/pub/movies/misc

then an extraction may fail if an attempt is made to extract the files without changing the equivalent extraction path, eg. if a student called cmpdw entered the following command with such a tape while in their home directory:

tar xvf /dev/tape

then the command would fail since students cannot write to the top level of the /home directory. Thus, the R option can be used (or the equivalent option for other commands) to remove leading / symbols so that files are extracted into the current directory, ie. if cmpdw entered:

tar xvfR /dev/tape

then tar would place the /home data from the tape into cmpdw's home directory, ie. cmpdw would see a new directory with the name:

/home/students/cmpdw/home

Other Typical Daily Tasks. From my own experience, these are the types of task which most admins will likely carry out every day:

- Check disk usage across the system.
- Check system logs for important messages, eg. system errors and warnings, possible suspected access attempts from remote systems (hackers), suspicious user activity, etc. This applies to web server logs too (use script processing to ease analysis).
- Check root's email for relevant messages (eg. printers often send error messages to root in the form of an email).
- Monitor system status, eg. all systems active and accessible (ping).
- Monitor system performance, eg. server load, CPU-hogging processes running in the background that have been left behind by a careless user, packet collision checks, network bandwidth checks, etc.
- Ensure all necessary system services are operating correctly.
- Tour the facilities for general reasons, eg. food consumed in rooms where such activity is prohibited, users who have left themselves logged in by mistake, a printer with a paper jam that nobody bothered to report, etc. Users are notoriously bad at reporting physical hardware problems - the usual response to a problem is to find an alternative system/device and let someone else deal with it.
- Deal with user problems, eg. "Somebody's changed my password!" (ie. the user has forgotten their password). Admins should be accessible by users, eg. a public email address, web feedback form, post box by the office, etc. Of course, a user can always send an email to the root account, or to the admin's personal account, or simply visit the admin in person. Some systems, like Indy, may have additional abilities, eg. video conferencing: a user can use the InPerson software to request a live video/audio link to the admin's system, allowing 2-way communication (see the inperson man page). Other facilities such as the talk command can also be employed to contact the admin, eg. at a remote site. It's up to the admin to decide how accessible she/he should be - discourage trivial interruptions.
- Work on improving any relevant aspect of the system, eg. security, services available to users (software, hardware), system performance tuning, etc.
- Clean systems if they're dirty; a user will complain about a dirty monitor screen or sticking mouse behaviour, but they'll never clean them for you. Best to prevent complaints via regular maintenance. Consider other problem areas that may be hidden, eg. blowing loose toner out of a printer with an air duster can.
- Learn more about UNIX in general.
- Take necessary breaks! A tired admin will make mistakes.

This isn't a complete list, and some admins will doubtless have additional responsibilities, but the above describes the usual daily events which define the way I manage the Ve24 network.
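To give a flavour of how some of these checks can be automated, here is a minimal sketch of a morning check script (the host names, log size and ping options are only examples and would need adapting to the site and ping version concerned):

#!/bin/sh
# Simple morning checks: host availability, disk space, recent log errors.

for host in yoda wolfen                # example server and client names
do
    if ping -c 1 $host > /dev/null 2>&1
    then
        echo "$host is up"
    else
        echo "WARNING: $host is not responding"
    fi
done

df -k                                  # free disk space on all mounted filesystems
tail -100 /var/adm/SYSLOG | grep -i error   # recent error messages in the system log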

Useful file: /etc/motd

The contents of this file will be echoed to stdout whenever a user activates a login shell. Thus, the message will be shown when:

- a user first logs in (contents in all visible shell windows),
- a user accesses another system using commands such as rlogin and telnet,
- a user creates a new console shell window; from the man page for console, "The console provides the operator interface to the system. The operating system and system utility programs display error messages on the system console."

The contents of /etc/motd are not displayed when the user creates a new shell using 'xterm', but are displayed when winterm is used. The means by which xterm/winterm are executed are irrelevant (icon, command, Toolchest, etc.). The motd file can be used as a simple way to notify users of any developments. Be careful of allowing its contents to become out of date though. Also note that the file is local to each system, so maintaining a consistent motd between systems might be necessary, eg. a script to copy the server's motd to all clients.

Another way to inform users of noteworthy news is the xconfirm command, which could be included within startup scripts, user setup files, etc. From the xconfirm man page:

"xconfirm displays a line of text for each -t argument specified (or a file when the -file argument is used), and a button for each -b argument specified. When one of the buttons is pressed, the label of that button is written to xconfirm's standard output. The enter key activates the specified default button. This provides a means of communication/feedback from within shell scripts and a means to display useful information to a user from an application. Command line options are available to specify geometry, font style, frame style, modality and one of five different icons to be presented for tailored visual feedback to the user."

For example, xconfirm could be used to interactively warn the user if their disk quota has been exceeded.
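A rough sketch of such a quota warning (the text and button label are chosen purely for illustration, following the -t and -b usage described in the man page excerpt above):

xconfirm -t "Your disk quota on /home has been exceeded." \
         -t "Please delete or archive unwanted files."    \
         -b "OK"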

UNIX Fundamentals: System bootup and shutdown, events, daemons.

SGI's IRIX is based on System V with BSD enhancements. As such, the way an IRIX system boots up is typical of many UNIX systems. Some interesting features of UNIX can be discovered by investigating how the system starts up and shuts down.

After power on and initial hardware-level checks, the first major process to execute is the UNIX kernel file /unix, though this doesn't show up in any process list as displayed by commands such as ps. The kernel then starts the init program to begin the bootup sequence, ie. init is the first visible process to run on any UNIX system. One will always observe init with a process ID of 1:

% ps -ef | grep init | grep -v grep
    root     1     0  0 21:01:57 ?        0:00 /etc/init

init is used to activate, or 'spawn', other processes. The /etc/inittab file is used to determine what processes to spawn. The lecture on shell scripts introduced the init command, in a situation where a system was made to reboot using:

init 6

The number is called a 'run level', ie. a software configuration of the system under which only a selected group of processes exist. Which processes correspond to which run level is defined in the /etc/inittab file. A system can be in any one of eight possible run levels: 0 to 6, s and S (the latter two are identical). The states which most admins will be familiar with are 0 (total shutdown and power off), 1 (enter system administration mode), 6 (reboot to default state) and S (or s) for 'single-user' mode, a state commonly used for system administration. The /etc/inittab file contains an 'initdefault' state, ie. the run level to enter by default, which is normally 2, 3 or 4. 2 is the most common, ie. the full multi-user state with all processes, daemons and services activated. The /etc/inittab file is constructed so that any special initialisation operations, such as mounting filesystems, are executed before users are allowed to access the system. The init man page has a very detailed description of these first few steps of system bootup. Here is a summary:

An initial console shell is created with which to begin spawning processes. The fact that a shell is used this early in the boot cycle is a good indication of how closely related shells are to UNIX in general. The scripts which init uses to manage processes are stored in the /etc/init.d directory. During bootup, the files in /etc/rc2.d are used to bring up system processes in the correct order (the /etc/rc0.d directory is used for shutdown - more on that later). These files are actually links to the equivalent script files in /etc/init.d.

The files in /etc/rc2.d (the 2 presumably corresponding to run level 2 by way of a naming convention) all begin with S followed by two digits (S for 'Spawn' perhaps), causing them to be executed in a specific order as determined by the first 3 characters of each file (alphanumeric). Thus, the first file run in the console shell is /etc/rc2.d/S00announce (a link to /etc/init.d/announce - use 'more' or load this file into an editor to see what it does). init will run the script with appropriate arguments depending on whether the procedure being followed is a startup or shutdown, eg. 'start', 'stop', etc.

The /etc/config directory is used by each script in /etc/init.d to decide what it should do. /etc/config contains files which correspond to files found in /etc/rc2.d with the same name. These /etc/config files contain simply 'on' or 'off'. The chkconfig command is used to test the appropriate file by each script, returning true or false depending on its contents and thus determining whether the script does anything. An admin uses chkconfig to set the various files' contents to on or off as desired, eg. to switch a system into stand-alone mode, turn off all network-related services on the next reboot:

chkconfig network off
chkconfig nfs off
chkconfig yp off
chkconfig named off
init 6

Enter chkconfig on its own to see the current configuration states.

Lower-level functions are performed first, beginning with a SCSI driver check to ensure that the system disk is going to be accessed correctly. Next, key file systems are mounted. Then the following steps occur, IF the relevant /etc/config file contains 'on' for any step which depends on that fact:

- A check to see if any system crash files are present (core dumps) and, if so, a message sent to stdout.
- Display company trademark information if present; set the system name.
- Begin system activity reporting daemons.
- Create a new OS kernel if any system changes have been made which require it (this is done by testing whether or not any of the files in /var/sysgen are newer than the /unix kernel file).
- Configure and activate network ports.
- etc.

Further services/systems/tasks to be activated if need be include ip-aliasing, system auditing, web servers, license server daemons, core dump manager, swap file configuration, mail daemon, removal of /tmp files, printer daemon, higher-level web servers such as Netscape Administration Server, cron, PPP, device file checks, and various end-user and application daemons such as the midi sound daemon which controls midi library access requests. This isn't a complete list, and servers will likely have more items to deal with than clients, eg. starting up DNS, NIS, security & auditing daemons, quotas, internet routing daemons, and more than likely a time daemon to serve as a common source of current time for all clients.

It should be clear that the least important services are executed last - these usually concern user-related or application-related daemons, eg. AppleTalk, Performance Co-Pilot, X Windows Display Manager, NetWare, etc. Even though a server or client may initiate many background daemon processes on bootup, during normal system operation almost all of them are doing nothing at all. A process which isn't doing anything is said to be 'idle'. Enter:

ps -ef

The 'C' column shows the activity level of each process. No matter when one checks, almost all the C entries will be zero. UNIX background daemons only use CPU time when they have to, ie. they remain idle until called for. This allows a process which truly needs CPU cycles to make maximum use of available CPU time.

The scripts in /etc/init.d may start up other services if necessary as well. Extra configuration/script files are often found in /etc/config in the form of a file called servicename.options, where 'servicename' is the name of the normal script run by init.

Note: the 'verbose' file in /etc/config is used by scripts to dynamically redefine whether the echo command is used to output progress messages. Each script checks whether verbose mode is on using the chkconfig command; if on, then a variable called $ECHO is set to 'echo'; if off, $ECHO is set to something which is interpreted by a shell to mean "ignore everything that follows this symbol", so setting verbose mode to off means every echo command in every script (which uses the $ECHO test and set procedure) will produce no output at all - a simple, elegant and clean way of controlling system behaviour.

When shutting a system down, the behaviour described above is basically just reversed. Scripts contained in the /etc/rc0.d directory perform the necessary actions, with the name prefixes determining execution order. Once again, the first three characters of each file name decide the alphanumeric order in which to execute the scripts; 'K' probably stands for 'Kill'. The files in /etc/rc0.d shut down user/application-related daemons first, eg. the MIDI daemon. Comparing the contents of /etc/rc2.d and /etc/rc0.d, it can be seen that their contents are mirror images of each other.

The alphanumeric prefixes used for the /etc/rc*.d directories are defined in such a way as to allow extra scripts to be included in those directories, or rather links to relevant scripts in /etc/init.d. Thus, a custom 'static route' (to force a client to always route externally via a fixed route) can be defined by creating new links from /etc/rc2.d/S31network and /etc/rc0.d/K39network to a custom file called network.local in /etc/init.d. There are many numerical gaps amongst the files, allowing for great expansion in the number of scripts which can be added in the future.
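A sketch of how such links might be created, assuming a suitable start/stop script called network.local has already been written in /etc/init.d (the link names S31network and K39network are those used in the example above):

ln -s /etc/init.d/network.local /etc/rc2.d/S31network    # run at startup, after the main network script
ln -s /etc/init.d/network.local /etc/rc0.d/K39network    # run at shutdown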

References:

1. Extreme Technologies:
   http://www.futuretech.vuurwerk.nl/extreme.html

2. DDS1 vs. DDS3 DAT Performance Tests:
   http://www.futuretech.vuurwerk.nl/perfcomp.html#DAT1
   http://www.futuretech.vuurwerk.nl/perfcomp.html#DAT2
   http://www.futuretech.vuurwerk.nl/perfcomp.html#DAT3
   http://www.futuretech.vuurwerk.nl/perfcomp.html#DAT4

3. "Success With DDS Media", Hewlett Packard, Edition 1, February 1991.

Detailed Notes for Day 3 (Part 4)

UNIX Fundamentals: Security and Access Control.

General Security. Any computer system must be secure, whether it's connected to the Internet or not. Some issues may be irrelevant for Intranets (isolated networks which may or may not use Internet-style technologies), but security is still important for any internal network, if only to protect against employee grievances or accidental damage. Crucially, a system should not be expanded to include external network connections until internal security has been dealt with, and individual systems should not be added to a network until they have been properly configured (unless the changes are of a type which cannot be made until the system is physically connected). However, security is not an issue which can ever be finalised; one must constantly maintain an up-to-date understanding of relevant issues and monitor the system using the various available tools such as 'last' (display recent logins; there are many other available tools and commands).

In older UNIX variants, security mostly involved configuring the contents of various system/service setup files. Today, many UNIX OSs offer the admin a GUI-frontend security manager to deal with security issues in a more structured way. In the case of SGI's IRIX, version 6.5 has such a GUI tool, but 6.2 does not. The GUI tool is really just a convenient way of gathering together all the relevant issues concerning security in a form that is easier to deal with (ie. less need to look through man pages, online books, etc.). The security issues themselves are still the same.

UNIX systems have a number of built-in security features which offer a reasonably acceptable level of security without the need to install any additional software. UNIX gives users a great deal of flexibility in how they manage and share their files and data; such convenience may be incompatible with an ideal site security policy, so decisions often have to be taken about how secure a system is going to be - the more secure a system is, the less flexible for users it becomes.

Older versions of any UNIX variant will always be less secure than newer ones. If possible, an admin should always try and use the latest version in order to obtain the best possible default security. For example, versions of IRIX as old as 5.3 (circa 1994) had some areas of subtle system functionality rather open by default (eg. some feature or service turned on), whereas versions later than 6.0 turned off the features to improve the security of a default installation. UNIX vendors began making these changes in order to comply with the more rigorous standards demanded by the Internet age.

Standard UNIX security features include:

1. File ownership,
2. File permissions,
3. System activity monitoring tools, eg. who, ps, log files,
4. Encryption-based, password-protected user accounts,
5. An encryption program (crypt) which any user can exploit.

Figure 60. Standard UNIX security features.

All except the last item above have already been discussed in previous lectures. The 'crypt' command can be used by the admin and users to encrypt data, using an encryption key supplied as an argument. Crypt employs an encryption schema based on similar ideas used in the German 'Enigma' machine in WWII, although crypt's implementation of the mathematical equivalent is much more complex, like having a much bigger and more sophisticated Enigma machine. Crypt is a satisfactorily secure program; the man page says, "Methods of attack on such machines are known, but not widely; moreover the amount of work required is likely to be large." However, since crypt requires the key to be supplied as an argument, commands such as ps could be used by others to observe the command in operation, and hence the key. This is crypt's only weakness. See the crypt man page for full details on how crypt is used.
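As a simple illustration of the filter-style usage described in the man page (the key and file names here are invented):

crypt mysecretkey < notes.txt > notes.enc    # encrypt notes.txt using the key 'mysecretkey'
crypt mysecretkey < notes.enc                # the same key decrypts the data again
# NB: while crypt is running, the key is visible to anyone using ps - the weakness noted above.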

Responsibility. Though an admin has to implement security policies and monitor the system, ordinary users are no less responsible for ensuring system security in those areas where they have influence and can make a difference. Besides managing their passwords carefully, users should control the availability of their data using appropriate read, write and execute file permissions, and be aware of the security issues surrounding areas such as accessing the Internet. Security is not just software and system files though. Physical aspects of the system are also important and should be noted by users as well as the admin. Thus:

- Any item not secured with a lock, cable, etc. can be removed by anyone who has physical access.
- Backups should be securely stored. Consider the use of video surveillance equipment and some form of metal-key/keycard/numeric-code entry system for important areas.
- Account passwords enable actions performed on the system to be traced. All accounts should have passwords. Badly chosen passwords, and old passwords, can compromise security. An admin should consider using password-cracking software to ensure that poorly chosen passwords are not in use.
- Group permissions for files should be set appropriately (user, group, others).
- Guest accounts can be used anonymously; if a guest account is necessary, the tasks which can be carried out when logged in as guest should be restricted. Having open guest accounts on multiple systems which do not have common ordinary accounts is unwise - it allows users to anonymously exchange data between such systems when their normal accounts would not allow them to do so. Accounts such as guest can be useful, but they should be used with care, especially if they are left with no password.
- Unused accounts should be locked out, or backed up and removed. If a staff member leaves the organisation, passwords should be changed to ensure such former users do not retain access.
- Sensitive data should not be kept on systems with more open access such as anonymous ftp and modem dialup accounts.
- Use of the su command amongst users should be discouraged. Its use may be legitimate, but it encourages lax security (ordinary users have to exchange passwords in order to use su). Monitor the /var/adm/sulog file for any suspicious use of su.
- Ensure that key files owned by a user are writeable only by that user, thus preventing 'trojan horse' attacks. This also applies to root-owned files/dirs, eg. /, /bin, /usr/bin, /etc, /var, and so on. Use find and other tools to locate directories that are globally writeable (see the example find commands after this list) - if such a directory is a user's home directory, consider contacting the user for further details as to why their home directory has been left so open. For added security, use an account-creation schema which sets users' home directories to not be readable by groups or others by default.
- Instruct users not to leave logged-in terminals unattended. The xlock command is available to secure an unattended workstation, but its use for long periods may be regarded as inconsiderate by other users who are not able to use the terminal, leading to the temptation of rebooting the machine, perhaps causing the logged-in user to lose data.
- Only vendor-supplied software should be fully trusted. Commercial 3rd-party software should be ok as long as one has confidence in the supplier, but shareware or freeware software must be treated with care, especially if such software is in the form of precompiled ready-to-run binaries (precompiled non-vendor software might contain malicious code). Software distributed in source code form is safer, but caution is still required, especially if executables have to be owned by root and installed using the set-UID feature in order to run. Set-UID and set-GID programs have legitimate uses, but because they are potentially harmful, their presence on a system should be minimised. The find command can be used to locate such files, while older file system types (eg. EFS) can be searched with commands such as ncheck.
- Network hardware can be physically tapped to eavesdrop on network traffic. If security must be particularly tight, keep important network hardware secure (eg. in a locked cupboard) and regularly check other network items (cables, etc.) for any sign of attack. Consider using specially secure areas for certain hardware items, and make it easy to examine cabling if possible (keep an up-to-date printed map to aid checks). Fibre-optic cables are harder to interfere with, eg. FDDI. Consider using video surveillance technologies in such situations.
- Espionage and sabotage are issues which some admins may have to be aware of, especially where commercially sensitive or government/police-related work data is being manipulated. Simple example: could someone see a monitor screen through a window using a telescope? What about RF radiation? Remote scanners can pick up stray monitor emissions, so consider appropriate RF shielding (Faraday Cage). What about insecure phone lines? Could someone, even an ordinary user, attach a modem to a system and dial out, or allow someone else to dial in?
- Keep up-to-date with security issues; monitor security-related sites such as www.rootshell.com, UKERNA, JANET, CERT, etc. [7]. Follow any extra advice given in vendor-specific security FAQ files (usually posted to relevant 'announce' or 'misc' newsgroups, eg. comp.sys.sgi.misc). Most UNIX vendors also have an anonymous ftp site from which customers can obtain security patches and other related information. Consider joining any specialised mailing lists that may be available.
- If necessary tasks are beyond one's experience and capabilities, consider employing a vendor-recommended external security consultancy team.
- Exploit any special features of the UNIX system being used, eg. at night, an Indy's digital camera could be used to send single frames twice a second across the network to a remote system for subsequent compression, time-stamping and recording. NB: this is a real example which SGI once helped a customer to do in order to catch some memory thieves.

Figure 61. Aspects of a system relevant to security.
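By way of illustration, the kind of find commands referred to in the list above might look like this (run as root; extra options can be added to restrict the search to local filesystems if required):

find / -type f \( -perm -4000 -o -perm -2000 \) -print    # list set-UID and set-GID files
find / -type d -perm -0002 -print                         # list world-writeable directories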

Since basic security on UNIX systems relies primarily on login accounts, passwords, file ownership and file permissions, proper administration and adequate education of users is normally sufficient to provide acceptable security for most sites. Lapses in security are usually caused by human error, or improper use of system security features. Extra security actions such as commercial security-related software are not worth considering if even basic features are not used or are compromised via incompetence.

An admin can alter the way in which failed login attempts are dealt with by configuring the /etc/default/login file. There are many possibilities and options - see the 'login' reference page for details (man login). For example, an effective way to enhance security is to make repeated guessing of account passwords an increasingly slow process by penalising further login attempts with ever increasing delays between login failures. Note that GUI-based login systems may not support features such as this, though one can always deactivate them via an appropriate chkconfig command.

Most UNIX vendors offer the use of hardware-level PROM passwords to provide an extra level of security, ie. a password is required from any user who attempts to gain access to any low-level hardware PROM-based 'Command Monitor', giving greater control over who can carry out admin-level actions. While PROM passwords cannot prevent physical theft (eg. someone stealing a disk and accessing its data by installing it as an option drive on another system), they do limit the ability of malicious users to boot a system using their own program or device (a common flaw with Mac systems), or otherwise harm the system at its lowest level. If the PROM password has been forgotten, the root user can reset it. If both are lost, then one will usually have to resort to setting a special jumper on the system motherboard, or temporarily removing the PROM chip altogether (the loss of power to the chip resets the password).

Shadow Passwords. If the /etc/passwd file can be read by users, then there is scope for users to take a copy away to be brute-force tested with password-cracking software. The solution is to use a shadow password file called /etc/shadow - this is a copy of the ordinary password file (/etc/passwd) which cannot be accessed by non-root users. When in use, the password fields in /etc/passwd are replaced with an 'x'. All the usual password-related programs work in the same way as before, though shadow passwords are dealt with in a different way for systems using NIS (this is because NIS keeps all password data for ordinary users in a different file called /etc/passwd.nis). Users won't notice any

difference when shadow passwords are in use, except that they won't be able to see the encrypted form of their password anymore. The use of shadow passwords is activated simply by running the 'pwconv' program (see the man page for details). Shadow passwords are in effect as soon as this command has been executed.
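A minimal sketch of the change (the account shown, its fields and the encrypted string are all placeholders, not entries from a real system):

# /etc/passwd entry before running pwconv - the hash is visible to any user:
harry:EncRypTedHash:1023:20:Harry Smith:/usr/people/harry:/bin/csh

# After pwconv, /etc/passwd shows only an 'x' in the password field...
harry:x:1023:20:Harry Smith:/usr/people/harry:/bin/csh

# ...and the hash moves to /etc/shadow, readable by root only
# (the numeric field records when the password was last changed):
harry:EncRypTedHash:10632::::::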

Password Ageing. An admin can force passwords to age automatically, ensuring that users must set a new password at desired intervals, or no earlier than a certain interval, or even immediately. The passwd command is used to control the various available options. Note that NIS does not support password ageing.
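For example (a sketch using the standard System V passwd options; the account name is hypothetical, and option support should be confirmed with 'man passwd' on the system concerned):

passwd -x 90 -n 7 harry    # harry must change his password at least every 90 days,
                           # but no more often than once a week
passwd -f harry            # force harry to choose a new password at the next login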

Choosing Passwords. Words from the dictionary should not be used, nor should obvious items such as film characters and titles, names of relatives, car number plates, etc. Passwords should include obscure characters, digits and punctuation marks. Consider using and mixing words from other languages, eg. Finnish, Russian, etc. An admin should not use the same root password for more than one system, unless there is good reason. When a new account is created, a password should be set there and then. If the user is not immediately present, a default password such as 'password' might be used in the expectation that the user will log in immediately and change it to something more suitable. An admin should lock out the account if the password isn't changed after some duration: replace the password entry for the user concerned in the /etc/passwd file with anything that contains at least one character that is not used by the encryption schema, eg. '*'. Modern UNIX systems often include a minimum password length and may insist on certain rules about what a password can be, eg. at least one digit.
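A hypothetical locked entry in /etc/passwd might therefore look like this (the '*' can never match any encrypted password, so logins to the account are refused):

harry:*:1023:20:Harry Smith:/usr/people/harry:/bin/csh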

Network Security. As with other areas of security, GUI tools may be available for controlling network-related security issues, especially those concerning the Internet. Since GUI tools may vary between different UNIX OSs, this discussion deals mainly with the command line tools and related files. Reminder: there is little point in tightening network security if local security has not yet been dealt with, or is lax. Apart from the /etc/passwd file, the other important files which control network behaviour are:

/etc/hosts.equiv

A list of trusted hosts.

.rhosts

A list of hosts that are allowed access to a specific user account.

Figure 62. Files relevant to network behaviour.

These three files determine whether a host will accept an access request from programs such as rlogin, rcp, rsh, or rdist. Both hosts.equiv and .rhosts have reference pages (use 'man hosts.equiv' and 'man rhosts'). Suppose a user on host A attempts to access a remote host B. As long as the hosts.equiv file on B contains the host name of A, and B's /etc/passwd lists A's user ID as a valid account, then no further checks occur and the access is granted (all successful logins are recorded in /var/adm/SYSLOG). The hosts.equiv file used by the Ve24 Indys contains the following:

localhost
yoda.comp.uclan.ac.uk
akira.comp.uclan.ac.uk
ash.comp.uclan.ac.uk
cameron.comp.uclan.ac.uk
chan.comp.uclan.ac.uk
conan.comp.uclan.ac.uk
gibson.comp.uclan.ac.uk
indiana.comp.uclan.ac.uk
leon.comp.uclan.ac.uk
merlin.comp.uclan.ac.uk
nikita.comp.uclan.ac.uk
ridley.comp.uclan.ac.uk
sevrin.comp.uclan.ac.uk
solo.comp.uclan.ac.uk
spock.comp.uclan.ac.uk
stanley.comp.uclan.ac.uk
warlock.comp.uclan.ac.uk
wolfen.comp.uclan.ac.uk
woo.comp.uclan.ac.uk
milamber.comp.uclan.ac.uk

Figure 63. hosts.equiv files used by Ve24 Indys.

Thus, once logged into one of the Indys, a user can rlogin directly to any of the other Indys without having to enter their password again, and can execute rsh commands, etc. A staff member logged into Yoda can log in to any of the Ve24 Indys too (students cannot do this). The hosts.equiv files on Yoda and Milamber are completely different, containing only references to each other as needed. Yoda's hosts.equiv file contains:

localhost
milamber.comp.uclan.ac.uk

Figure 64. hosts.equiv file for yoda.

Thus, Yoda trusts Milamber. However, Milamber's hosts.equiv only contains: localhost

Figure 65. hosts.equiv file for milamber.

ie. Milamber doesn't trust Yoda, the rationale being that even if Yoda's root security is compromised, logging in to Milamber as root is blocked. Hence, even if a hack attack damaged the server and Ve24 clients, I would still have at least one fully functional secure machine with which to tackle the problem upon its discovery. Users can extend the functionality of hosts.equiv by using a .rhosts file in their home directory, enabling or disabling access based on host names, group names and specific user account names. The root login only uses the /.rhosts file if one is present - /etc/hosts.equiv is ignored. NOTE: an entry for root in /.rhosts on a local system allows root users on a remote system to gain local root access. Thus, including the root name in /.rhosts is unwise. Instead, file transfers can be more securely dealt with using ftp via a guest account, or through an NFS-mounted directory. An admin should be very selective as to the entries included in root's .rhosts file. A user's .rhosts file must be owned by either the user or root. If it is owned by anyone else, or if the file permissions are such that it is writeable by someone else, then the system ignores the contents of the user's .rhosts file by default (a sketch of a typical .rhosts file follows the list below). An admin may decide it's better to bar the use of .rhosts files completely, perhaps because an external network of unknown security status is connected. The .rhosts files can be barred by adding a -l option to the rshd line in /etc/inetd.conf (use 'man rshd' for further details). Thus, the relationship between the 20 different machines which form the SGI network I run is as follows:

- All the Indys in Ve24 trust each other, as well as Yoda and Milamber.
- Yoda only trusts Milamber.
- Milamber doesn't trust any system.
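Returning to .rhosts: a user-level file is simply a list of 'host [user]' entries, one per line. A hedged example of what a ~/.rhosts might contain (the user name is hypothetical; the hosts are those used above):

yoda.comp.uclan.ac.uk
milamber.comp.uclan.ac.uk harry

The first line allows the same account name on yoda to log in without a password; the second additionally allows the user harry on milamber to access this account.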

With respect to choosing root passwords, I decided to use the following configuration:

- All Ve24 systems have the same root password and the same PROM password.
- Yoda and Milamber have their own separate passwords, distinct from all others.

This design has two deliberate consequences:

- Ordinary users have flexible access between the Indys in Ve24.
- If the root account of any of the Ve24 Indys is compromised, the unauthorised user will not be able to gain access to Yoda or Milamber as root.

However, the use of NFS compromises such a schema since, for example, a root user on a Ve24 Indy could easily alter any files in /home, /var/mail, /usr/share and /mapleson.

With respect to the use of identical root and PROM passwords on the Ve24 machines: because Internet access (via a proxy server) has recently been setup for users, I will probably change the schema in order to hinder brute force attacks.

The /etc/passwd File and NIS. The NIS service enables users to login to a client by including the following entry as the last line in the client's /etc/passwd file: +::0:0:::

Figure 66. Additional line in /etc/passwd enabling NIS.

For simplicity, a + on its own can be used. I prefer to use the longer version so that if I want to make changes, the fields to change are immediately visible. If a user logs on with an account ID which is not listed in the /etc/passwd file as a local account, then such an entry at the end of the file instructs the system to try and get the account information from the NIS server, ie. Yoda. Since Yoda and Milamber do not include this extra line in /etc/passwd, students cannot login to them with their own ID anyway, no matter the contents of .rhosts and hosts.equiv.

inetd and inetd.conf. inetd is the 'Internet Super-server'. inetd listens for requests for network services, executing the appropriate program for each request. inetd is started on bootup by the /etc/init.d/network script (called by the /etc/rc2.d/S30network link via the init process). It reads its configuration information from /etc/inetd.conf. By using a super-daemon in this way, a single daemon is able to invoke other daemons when necessary, reducing system load and using resources such as memory more efficiently. The /etc/inetd.conf file controls how various network services are configured, eg. logging options, debugging modes, service restrictions, the use of the bootp protocol for remote OS installation, etc. An admin can control services and logging behaviour by customising this file. A reference page is available with complete information ('man inetd').
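The general layout of each line in /etc/inetd.conf is: service name, socket type, protocol, wait/nowait, user, server path, arguments. A hedged sketch of disabling a service (daemon paths and arguments vary between UNIX flavours and releases, so treat these lines as illustrative rather than copied from a real IRIX file):

# An active service: ftp requests cause inetd to start ftpd as root
ftp     stream  tcp     nowait  root    /usr/etc/ftpd   ftpd -l

# A service disabled by commenting out its line
# (remember 'killall -HUP inetd' afterwards so the change takes effect)
#tftp   dgram   udp     wait    guest   /usr/etc/tftpd  tftpd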

Services communicate using 'port' numbers, rather like separate channels on a CB radio. Blocking the use of certain port numbers is a simple way of preventing a particular service from being used. Network/Internet services and their associated port numbers are contained in the /etc/services database. An admin can use the 'fuser' command to identify which processes are currently using a particular port, eg. to see the current use of TCP port 25: fuser 25/tcp

On Yoda, an output similar to the following would be given:

yoda # fuser 25/tcp
25/tcp:        855o
yoda # ps -ef | grep 855 | grep -v grep
    root   855     1  0   Apr 27 ?      5:01 /usr/lib/sendmail -bd -q15m

Figure 67. Typical output from fuser.

Insert (a quick example of typical information hunting): an admin wants to do the same on the ftp port, but can't remember the port number. Solution: use grep to find the port number from /etc/services:

yoda 25# grep ftp /etc/services
ftp-data        20/tcp
ftp             21/tcp
tftp            69/udp
sftp            115/tcp
yoda 26# fuser 21/tcp
21/tcp:        255o
yoda 28# ps -ef | grep 255 | grep -v grep
    root   255     1  0   Apr 27 ?      0:04 /usr/etc/inetd
  senslm   857   255  0   Apr 27 ?     11:44 fam
    root 11582   255  1 09:49:57 pts/1  0:01 rlogind

An important aspect of the inetd.conf file is the user name field which determines which user ID each process runs under. Changing this field to a less privileged ID (eg. nobody) enables system service processes to be given lower access permissions than root, which may be useful for further enhancing security. Notice that services such as http (the WWW) are normally already set to run as nobody. Proxy servers should also run as nobody, otherwise http requests may be able to retrieve files such as /etc/passwd (however, some systems may have the nobody user defined so that it cannot run programs, so another user may have to be used - an admin can make one up). Another common modification made to inetd.conf in order to improve security is to restrict the use of the finger command, eg. with -S to prevent login status, home directory and shell information from being given out. Or more commonly the -f option is used which forces any finger request to just return the contents of a file, eg. yoda's entry for the finger service looks like this:

finger  stream  tcp     nowait  guest   /usr/etc/fingerd        fingerd -f /etc/fingerd.message

Figure 68. Blocking the use of finger in the /etc/inetd.conf file.

Thus, any remote user who executes a finger request to yoda is given a brief message [3]. If changes are made to the inetd.conf file, then inetd must be notified of the changes, either by rebooting the system or via the following command (which doesn't require a reboot afterwards): killall -HUP inetd

Figure 69. Instructing inetd to restart itself (using killall).

In general, a local trusted network is less likely to require a highly restricted set of services, ie. modifying inetd.conf becomes more important when connecting to external networks, especially the Internet. Thus, an admin should be aware that creating a very secure inetd.conf file on an isolated network or Intranet may be unduly harsh on ordinary users.

X11 Windows Network Access. The X Windows system is a window system available for a wide variety of different computer platforms which use bitmap displays [8]. Its development is managed by the X Consortium, Inc. On SGI IRIX systems, the X Windows server daemon is called 'Xsgi' and conforms to Release 6 of the X11 standard (X11R6). The X server, Xsgi, manages the flow of user/application input and output requests to/from client programs using a number of interprocess communication links. The xdm daemon acts as the display manager. Usually, user programs are running on the same host as the X server, but X Windows also supports the display of client programs which are actually running on remote hosts, even systems using completely different OSs and hardware platforms, ie. X is network-transparent. The X man page says: "X supports overlapping hierarchical subwindows and text and graphics operations, on both monochrome and color displays."

One unique side effect of this is that access to application mouse menus is independent of application focus, requiring only a single mouse click for such actions. For example, suppose two application windows are visible on screen:

- a jot editor session containing an unsaved file (eg. /etc/passwd.nis),
- a shell window which is partially obscuring the jot window.

With the shell window selected, the admin is about to run /var/yp/ypmake to reparse the password database file, but realises the file isn't saved. Moving the mouse over the partially hidden jot window, the admin holds down the right mouse button: this brings up jot's right-button menu (which may or may not be partly on top of the shell window even though the jot window is at the back) from which the

admin clicks on 'Save'; the menu disappears, the file is saved, but the shell window is still on top of the jot window, ie. their relative front/back positions haven't changed during the operation.

The ability of X to process screen events independently of which application window is currently in focus is a surprisingly useful time-saving feature. Every time a user does an action like this, at least one extraneous mouse click is prevented; this can be shown by comparing to MS Windows interfaces:

- Under Win95 and Win98, trying to access an application's right-button menu when the application's window is currently not in focus requires at least two extraneous mouse clicks: the first click brings the application in focus (ie. to the front), the second brings up the menu, and a third (perhaps more if the original application window is now completely hidden) brings the original application window back to the front and in focus. Thus, X is at least 66% more efficient for carrying out this action compared to Win95/Win98.

- Under WindowsNT, attempting the same action requires at least one extraneous mouse click: the first click brings the application in focus and reveals the menu, and a second (perhaps more, etc.) brings the original application window back to the front and in focus. Thus, X is at least 50% more efficient for carrying out this action compared to NT.

The same effect can be seen when accessing middle-mouse menus or actions under X, eg. text can be highlighted and pasted to an application with the middle-mouse button even when that application is not in focus and not at the front. This is a classic example of how much more advanced X is over Microsoft's GUI interface technologies, even though X is now quite old. X also works in a way which links to graphics libraries such as OpenGL.

Note that most UNIX-based hardware platforms use video frame buffer configurations which allow a large number of windows to be present without causing colour map swapping or other side effects, ie. the ability to have multiple overlapping windows is a feature supported in hardware, eg. Indigo2 [6]. X is a widely used system, with emulators available for systems which don't normally use X, eg. Windows Exceed for PCs. Under the X Window System, users can run programs transparently on remote hosts that are part of the local network, and can even run applications on remote hosts across the Internet with the windows displayed locally if all the various necessary access permissions have been correctly set at both ends. An 'X Display Variable' is used to denote which host the application should attempt to display its windows on. Thus, assuming a connection with a remote host to which one had authorised telnet access (eg. haarlem.vuurwerk.nl), from a local host whose domain name is properly visible on the Internet (eg. thunder.uclan.ac.uk), then the local display of applications running on the remote host is enabled with a command such as:

haarlem% setenv DISPLAY thunder.uclan.ac.uk:0.0

I've successfully used this method while at Heriot Watt to run an xedit editor on a remote system in England but with the xedit window itself displayed on the monitor attached to the system I was physically using in Scotland. The kind of inter-system access made possible by X has nothing to do with login accounts, passwords, etc. and is instead controlled via the X protocols. The 'X' man page has full details, but note: the man page for X is quite large. A user can utilise the xhost command to control access to their X display, eg. 'xhost -' bars access from all users, while 'xhost +harry' gives X access to the user harry. Note that system-level commands and files which relate to xhost and X in general are stored in /var/X11/xdm.
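Putting the pieces together, a hedged sketch of displaying a remote client locally (the host names are those used above; xedit is just an example client, and the csh-style setenv assumes a csh login shell on the remote host):

# On the local workstation: allow the remote host to use this display
thunder% xhost +haarlem.vuurwerk.nl

# On the remote host: point clients at the local display, then run one
haarlem% setenv DISPLAY thunder.uclan.ac.uk:0.0
haarlem% xedit &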

Firewalls [4]. A firewall is a means by which a local network of trusted hosts can be connected to an external untrusted network, such as the Internet, in a more secure manner than would otherwise be the case. 'Firewall' is a conceptual idea which refers to a combination of hardware and software steps taken to setup a desired level of security; although an admin can setup a firewall via basic steps with as-supplied tools, all modern systems have commercial packages available to aid in the task of setting up a firewall environment, eg. Gauntlet for IRIX systems. As with other security measures, there is a tradeoff between ease of monitoring/administration, the degree of security required, and the wishes/needs of users. A drawback of firewalls is when a user has a legitimate need to access packets which are filtered out - an alternative is to have each host on the local network configured according to a strict security regime. The simplest form of a firewall is a host with more than one network interface, called a dual-homed host [9]. Such hosts effectively exist on two networks at once. By configuring such a host in an appropriate manner, it acts as a controllable obstruction between the local and external network, eg. the Internet.

A firewall does not affect the communications between hosts on an internal network; only the way in which the internal network interacts with the external connection is affected. Also, the presence of a firewall should not be used as an excuse for having less restrictive security measures on the internal network. One might at first think that Yoda could be described as a firewall, but it is not, for a variety of reasons. Ideally, a firewall host should be treated thus:

- no ordinary user accounts (root admin only, with a different password),
- as few services as possible (the more services are permitted, the greater is the chance of a security hole; newer, less-tested software is more likely to be at risk) and definitely no NIS or NFS,
- constantly monitored for access attempts and unusual changes in files, directories and software (commands: w, ps, 'versions changed', etc.),
- log files regularly checked (and not stored on the firewall host!),
- no unnecessary applications,
- no anonymous ftp!

Yoda breaks several of these guidelines, so it cannot be regarded as a firewall, even though a range of significant security measures are in place. Ideally, an extra host should be used, eg. an Indy (additional Ethernet card required to provide the second Ethernet port), or a further server such as Challenge S. A simple system like Indy is sufficient though, or other UNIX system such as an HP, Sun, Dec, etc. - a Linux PC should not be used though since Linux has too many security holes in its present form. [1]

Services can be restricted by making changes to files such as /etc/inetd.conf, /etc/services, and others. Monitoring can be aided via the use of free security-related packages such as COPS - this package can also check for bad file permission settings, poorly chosen passwords, system setup file integrity, root security settings, and many other things. COPS can be downloaded from:

ftp://ftp.cert.org/pub/tools/cops

Monitoring a firewall host is also a prime candidate for using scripts to automate the monitoring process. Other free tools include Tripwire, a file and directory integrity checker: ftp://ftp.cert.org/pub/tools/tripwire

With Tripwire, files are monitored and compared to information stored in a database. If files change when they're supposed to remain static according to the database, the differences are logged and flagged for attention. If used regularly, eg. via cron, action can be taken immediately if something happens such as a hacking attempt.
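A hedged sketch of such a cron entry (the install path and the report file location are assumptions; the classic academic Tripwire performs an integrity check against its database when run this way):

# root's crontab: run a Tripwire integrity check at 3am every night
0 3 * * * /usr/local/bin/tripwire > /var/local/tripwire.report 2>&1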

Firewall environments often include a router - a high speed packet filtering machine installed either privately or by the ISP providing the external connection. Usually, a router is installed between a dual-homed host and the outside world [9]. This is how yoda is connected, via a router whose address is 193.61.250.33, then through a second router at 193.61.250.65 before finally reaching the JANET gateway at Manchester.

Routers are not very flexible (eg. no support for application-level access restriction systems such as proxy servers), but their packet-filtering abilities do provide a degree of security, eg. the router at 193.61.250.33 only accepts packets on the 193.61.250.* address space. However, because routers can block packet types, ports, etc. it is possible to be overly restrictive with their use, eg. yoda cannot receive USENET packets because they're blocked by the router. In such a scenario, users must resort to using WWW-based news services (eg. DejaNews) which

are obviously less secure than running and managing a locally controlled USENET server, as well as being more wasteful of network resources. Accessing sites on the web poses similar security problems to downloading and using Internet-sourced software, ie. the source is untrusted, unless vendor-verified with checksums, etc. When a user accesses a site and attempts to retrieve data, what happens next cannot be predicted, eg. a malicious executable program could be downloaded (this is unlikely to damage root-owned files, but users could lose data if they're not careful). Users should be educated on these issues, eg. turning off Java script features and disallowing cookies if necessary. If web access is of particular concern with regard to security, one solution is to restrict web access to just a limited number of internal hosts.

Anonymous ftp. An anonymous FTP account allows a site to make information available to anyone, while still maintaining control over access issues. Users can login to an anonymous FTP account as 'anonymous' or 'ftp'. The 'chroot' command is used to put the user in the home directory for anonymous ftp access (~ftp), preventing access to other parts of the filesystem. A firewall host should definitely not have an anonymous FTP account. A site should not provide such a service unless absolutely necessary, but if it does then an understanding of how the anonymous FTP access system works is essential to ensuring site security, eg. preventing outside agents from using the site as a transfer point for pirated software. How an anon FTP account is used should be regularly monitored. Details of how to setup an anon FTP account can usually be found in a vendor's online information; for IRIX, the relevant source is the section entitled, "Setting Up an Anonymous FTP Account" in chapter three of the, "IRIX Admin: Networking and Mail" guide.

UNIX Fundamentals: Internet access: files and services. Email. For most users, the Internet means the World Wide Web ('http' service), but this is just one service out of many, and was in fact a very late addition to the Internet as a whole. Before the advent of the web, Internet users were familiar with and used a wide range of services, including:

- telnet (interactive login sessions on remote hosts)
- ftp (file/data transfer using continuous connections)
- tftp (file/data transfer using temporary connections)
- NNTP (Internet newsgroups, ie. USENET)
- SMTP (email)
- gopher (remote host data searching and retrieval system)
- archie (another data-retrieval system)
- finger (probe remote site for user/account information)
- DNS (Domain Name Service)

Exactly which services users can use is a decision best made by consultation, though some users may have a genuine need for particular services, eg. many public database systems on sites such as NASA are accessed by telnet only. Disallowing a service automatically improves security, but the main drawback will always be a less flexible system from a user's point of view, ie. a balance must be struck between the need for security and the needs of users. However, such discussions may be irrelevant if existing site policies already state what is permitted, eg. UCLAN's campus network has no USENET service, so users exploit suitable external services such as DejaNews [2]. For the majority of admins, the most important Internet service which should be appropriately configured with respect to security is the web, especially considering today's prevalence of Java, Java Script, and browser cookie files. It is all too easy for a modern web user to give out a surprising amount of information about the system they're using without ever knowing it. Features such as cookies and Java allow a browser to send a substantial amount of information to a remote host about the user's environment (machine type, OS, browser type and version, etc.); there are sites on the web which an admin can use to test how secure a user's browser environment is - the site will display as much information as it can extract using all methods, so if such sites can only report very little or nothing in return, then that is a sign of good security with respect to user-side web issues.

There are many good web server software systems available, eg. Apache. Some even come free, or are designed for local Intranet use on each host. However, for enhanced security, a site should use a professional suite of web server software such as Netscape Enterprise Server; these packages come with more advanced control mechanisms and security management features, the configuration of which is controlled by GUI-based front-end servers, eg. Netscape Administration Server. Similarly, lightweight proxy servers are available, but a site should use a professional solution, eg. Netscape Proxy Server. The GUI administration of web server software makes it much easier for an admin to configure security issues such as access and service restrictions, permitted data types, blocked sites, logging settings, etc.

Example: after the proxy server on the SGI network was installed, I noticed that users of the campus-wide PC network were using Yoda as a proxy server, which would give them a faster service than the University's proxy server. A proxy server which is accessible in this way is said to be 'open'. Since all accesses from the campus PCs appear in the web logs as if they originate from the Novix security system (ie. there is no indication of individual workstation or user), any

illegal activity would be untraceable. Thus, I decided to prevent campus PCs from using Yoda as a proxy. The mechanism employed to achieve this was the ipfilterd program, which I had heard of before but not used. ipfilterd is a network packet-filtering daemon which screens all incoming IP packets based on source/destination IP address, physical network interface, IP protocol number, source/destination TCP/UDP port number, required service type (eg. ftp, telnet, etc.) or a combination of these. Up to 1000 filters can be used. To improve efficiency, a configurable memory caching mechanism is used to retain recently decided filter verdicts for a specified duration. ipfilterd operates by using a searchable database of packet-filtering clauses stored in the /etc/ipfilterd.conf file. Each incoming packet is compared with the filters in the file one at a time until a match is found; if no match occurs, the packet is rejected by default. Since filtering is a line-by-line database search process, the order in which filters are listed is important, eg. a reject clause to exclude a particular source IP address from Ethernet port ec0 would have no effect if an accept clause was earlier in the file that accepted all IP data from ec0, ie. in this case, the reject should be listed before the accept. IP addresses may be specified in hex, dot format (eg. 193.61.255.4 - see the man page for 'inet'), host name or fully-qualified host name. With IRIX 6.2, ipfilterd is not installed by default. After consulting with SGI to identify the appropriate source CD, the software was installed, /etc/ipfilterd.conf defined, and the system activated with:

chkconfig -f ipfilterd on
reboot

Since there was no ipfilterd on/off flag file in /etc/config by default, the -f forces the creation of such a file with the given state. Filters in the /etc/ipfilterd.conf file consist of a keyword and an expression denoting the type of filter to be used; available keywords are:

accept : Accept all packets matching this filter
reject : Discard all packets matching this filter (silently)
grab   : Grab all packets matching this filter
define : Define a new macro

ipfilterd supports macros, with no limit to the number of macros used. Yoda's /etc/ipfilterd.conf file looks like this:

#
# ipfilterd.conf
# $Revision: 1.3 $
#
# Configuration file for ipfilterd(1M) IP layer packet filtering.
# Lines that begin with # are comments and are ignored.
# Lines begin with a keyword, followed either by a macro definition or
# by an optional interface filter, which may be followed by a protocol filter.
# Both macros and filters use SGI's netsnoop(1M) filter syntax.
#
# The currently supported keywords are:
# accept : accept all packets matching this filter
# reject : silently discard packets matching this filter
# define : define a new macro to add to the standard netsnoop macros
#
# See the ipfilterd(1M) man page for examples of filters and macros.
#
# The network administrator may find the following macros useful:
#
define ip.netAsrc    (src&0xff000000)=$1
define ip.netAdst    (dst&0xff000000)=$1
define ip.netBsrc    (src&0xffff0000)=$1
define ip.netBdst    (dst&0xffff0000)=$1
define ip.netCsrc    (src&0xffffff00)=$1
define ip.netCdst    (dst&0xffffff00)=$1
define ip.notnetAsrc not((src&0xff000000)=$1)
define ip.notnetAdst not((dst&0xff000000)=$1)
define ip.notnetBsrc not((src&0xffff0000)=$1)
define ip.notnetBdst not((dst&0xffff0000)=$1)
define ip.notnetCsrc not((src&0xffffff00)=$1)
define ip.notnetCdst not((dst&0xffffff00)=$1)
#
# Additional macros:
#
# Filters follow:
#
accept -i ec0
reject -i ec3 ip.src 193.61.255.21 ip.dst 193.61.250.34
reject -i ec3 ip.src 193.61.255.22 ip.dst 193.61.250.34
accept -i ec3

Any packet coming from an SGI network machine is immediately accepted (traffic on the ec0 network interface). The web logs contained two different source IP addresses for accesses coming from the campus PC network. These are rejected first if detected; a final accept clause is then included so that all other types of packet are accepted. The current contents of Yoda's ipfilterd.conf file does mean that campus PC users will not be able to access Yoda as a web server either, ie. requests to www.comp.uclan.ac.uk by legitimate users will be blocked too. Thus, the above contents of the file are experimental. Further refinement is required so that accesses to Yoda's web pages are accepted, while requests which try to use Yoda as a proxy to access non-UCLAN sites are rejected. This can be done by using the ipfilterd-expression equivalent of the following if/then C-style statement:

if ((source IP is campus PC) and (destination IP is not Yoda)) then reject packet;

Using ipfilterd has system resource implications. Filter verdicts stored in the ipfilterd cache by the kernel take up memory; if the cache size is increased, more memory is used. A longer cache and/or a larger number of filters means a greater processing overhead before each packet is dealt with. Thus, for busy networks, a faster processor may be required to handle the extra load, and perhaps more RAM if an admin increases the ipfilterd kernel cache size. In order to monitor such issues and make decisions about resource implications as a result of using ipfilterd, the daemon can be executed with the -d option which causes extra logging information about each filter to be added to /var/adm/SYSLOG, ie. an /etc/config/ipfilterd.options file should be created, containing '-d'. As well as using programs like 'top' and 'ps' to monitor CPU loading and memory usage, log files should be monitored to ensure they do not become too large, wasting disk space (the same applies to any kind of log file). System logs are 'rotated' automatically to prevent this from happening, but other logs created by 3rd-party software usually are not; such log files are not normally stored in /var/adm either. For example, the proxy server logs are in this directory: /var/netscape/suitespot/proxy-sysname-proxy/logs

If an admin wishes to retain the contents of older system logs such as /var/adm/oSYSLOG, then the log file could be copied to a safe location at regular intervals, eg. once per night (the old log file could then be emptied to save space). A wise policy would be to create scripts which process the logs, summarising the data in a more intuitive form. General shell script methods and programs such as grep can be used for this.
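A minimal sketch of such a nightly job (the archive directory and script name are assumptions):

#!/bin/sh
# Archive the rotated system log with a datestamp, then empty the original to save space.
STAMP=`date +%Y%m%d`
cp /var/adm/oSYSLOG /var/local/logs/oSYSLOG.$STAMP && cat /dev/null > /var/adm/oSYSLOG

grep and other shell script methods can then be applied to the archived copies to produce the kind of summaries described above.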

The above is just one example of the typical type of problem and its consequences that admins come up against when managing a system: 

The first problem was how to give SGI network users Internet access, the solution to which was a proxy server. Unfortunately, this allowed campus-PC users to exploit Yoda as an open proxy, so ipfilterd was then employed to prevent such unauthorised use.

Thus, as stated in the introduction, managing system security is an ongoing, dynamic process.

Another example problem: in 1998, I noticed that some students were not using the SGIs (or not asking if they could) because they thought the machines were turned off, ie. the monitor power-saving feature would blank out the screen after some duration. I decided to alter the way the Ve24 Indys behaved so that monitor power-saving would be deactivated during the day, but would still happen overnight. The solution I found was to modify the /var/X11/xdm/Xlogin file. This file contains a section controlling monitor power-saving using the xset command, which normally looks like this:

#if [ -x /usr/bin/X11/xset ] ; then
#  /usr/bin/X11/xset s 600 3600
#fi

If these lines are uncommented (the hash symbols removed), a system whose monitor supports power-saving will tell the monitor to power down after ten minutes of unuse, after the last user logs out. With the lines still commented out, modern SGI monitors use power-saving by default anyway. I created two new files in /var/X11/xdm:

-rwxr-xr-x    1 root     sys     1358 Oct 28  1998 Xlogin.powersaveoff*
-rwxr-xr-x    1 root     sys     1361 Oct 28  1998 Xlogin.powersaveon*

They are identical except for the section concerning power-saving. Xlogin.powersaveoff contains:

if [ -x /usr/bin/X11/xset ] ; then
  /usr/bin/X11/xset s 0 0
fi

while Xlogin.powersaveon contains:

#if [ -x /usr/bin/X11/xset ] ; then
#  /usr/bin/X11/xset s 0 0
#fi

The two '0' parameters supplied to xset in the Xlogin.powersaveoff file have a special effect (see the xset man page for full details): the monitor is instructed to disable all power-saving features. The cron system is used to switch between the two files when no one is present: every night at 9pm and every morning at 8am, followed by a reboot after the copy operation is complete. The entries from the file /var/spool/cron/crontabs/cron on any of the Ve24 Indys are thus:

# Alternate monitor power-saving. Turn it on at 9pm. Turn it off at 8am.
0 21 * * * /bin/cp /var/X11/xdm/Xlogin.powersaveon /var/X11/xdm/Xlogin && init 6&
#
0 8 * * * /bin/cp /var/X11/xdm/Xlogin.powersaveoff /var/X11/xdm/Xlogin && init 6&

Hence, during the day, the SGI monitors are always on with the login logo/prompt visible students can see the Indys are active and available for use; during the night, the monitors turn themselves off due to the new xset settings. The times at which the Xlogin changes are made were chosen so as to occur when other cron jobs would not be running. Students use the Indys each day without ever noticing the change, unless they happen to be around at the right time to see the peculiar sight of 18 Indys all rebooting at once.

Static Routes. A simple way to enable packets from clients to be forwarded through an external connection is via the use of a 'static route'. A file called /etc/init.d/network.local is created with a simple script that adds a routing definition to the current routing database, thus enabling packets to be forwarded to their destination. To ensure the script is executed on bootup or shutdown, extra links are added to the /etc/rc0.d and /etc/rc2.d directories (the following commands need only be executed once as root):

ln -s /etc/init.d/network.local /etc/rc0.d/K39network
ln -s /etc/init.d/network.local /etc/rc2.d/S31network

Yoda once had a modem link to 'Demon Internet' for Internet access. A static route was used to allow SGI network clients to access the Internet via the link. The contents of /etc/init.d/network.local (supplied by SGI) was:

#!/sbin/sh
#Tag 0x00000f00

IS_ON=/sbin/chkconfig

case "$1" in
'start')
        if $IS_ON network; then
                /usr/etc/route add default 193.61.252.1 1
        fi
        ;;
'stop')
        /usr/etc/route delete default 193.61.252.1
        ;;
*)
        echo "usage: $0 {start|stop}"
        ;;
esac

Note the use of chkconfig to ensure that a static route is only installed on bootup if the network is defined as active. The other main files for controlling Internet access are /etc/services and /etc/inetd.conf. These were discussed earlier.
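Once the script has run on bootup, the presence of the static route can be confirmed with netstat (a hedged check; the gateway shown is the one used in the script above):

# -r shows the routing table, -n keeps addresses numeric
netstat -rn | grep default

which should list a 'default' entry pointing at the gateway 193.61.252.1 while the link is configured.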

Internet Access Policy. Those sites which choose to allow Internet access will probably want to minimise the degree to which someone outside the site can access internal services. For example, users may be able to telnet to remote hosts from a company workstation, but should the user be able to successfully telnet to that workstation from home in order to continue working? Such an ability would obviously be very useful to users, and indeed administrators, but there are security implications which may be prohibitive.

For example, students who have accounts on the SGI network cannot login to Yoda because the /etc/passwd file contains /dev/null as their default shell, ie. they can't login because their account 'presence' on Yoda itself does not have a valid shell - another cunning use of /dev/null. The /etc/passwd.nis file has the main user account database, so users can logon to the machines in Ve24 as desired. Thus, with the use of /dev/null in the password file's shell field, students cannot login to Yoda via telnet from outside UCLAN. Staff accounts on the SGI network do not have /dev/null in the shell field, so staff can indeed login to Yoda via telnet from a remote host. Ideally, I'd like students to be able to telnet to a Ve24 machine from a remote host, but this is not yet possible for reasons explained in Appendix A (detailed notes for Day 2 Part 1). There are a number of Internet sites which are useful sources of information on Internet issues, some relating to specific areas such as newsgroups. In fact, USENET is an excellent source of information and advice on dealing with system management, partly because of pre-prepared FAQ files, but also because of the many experts who read and post to the newsgroups. Even if site policy means users can't access USENET, an admin should exploit the service to obtain relevant admin information. A list of some useful reference sites is given in Appendix C.

Example Questions:

1. The positions of the 'accept ec0' and 'reject' lines in /etc/ipfilterd.conf could be swapped around without affecting the filtering logic. So why is the ec0 line listed first? The 'netstat -i' command (executed on Yoda) may be useful here.

2. What would an appropriate ipfilterd.conf filter (or filters) look like which blocked unauthorised use of Yoda as a proxy to connect to an external site but still allowed access to Yoda's own web pages via www.comp.uclan.ac.uk? Hint: the netsnoop command may be useful.

Course summary.

This course has focused on what an admin needs to know in order to run a UNIX system. SGI systems running IRIX 6.2 have been used as an example UNIX platform, with occasional mention of IRIX 6.5 as an example of how OSs evolve. Admins are, of course, ordinary users too, though they often do not use the same set of applications that other users do. Though an admin needs to know things an ordinary user does not, occasionally users should be made aware of certain issues, eg. web browser cookie files, choosing appropriate passwords etc. Like any modern OS, UNIX has a vast range of features and services. This course has not by any means covered them all (that would be impossible to do in just three days, or even thirty). Instead, the basic things a typical admin needs to know have been introduced, especially the

techniques used to find information when needed, and how to exploit the useful features of UNIX for daily administration. Whatever flavour of UNIX an admin has to manage, a great many issues are always the same, eg. security, Internet concepts, etc. Thus, an admin should consider purchasing relevant reference books to aid in the learning process. When writing shell scripts, knowledge of the C programming language is useful; since UNIX is the OS being used, a C programming book (mentioned earlier) which any admin will find particularly useful is: "C Programming in a UNIX Environment" Judy Kay & Bob Kummerfeld, Addison Wesley Publishing, 1989. ISBN: 0 201 12912 4

For further information on UNIX or related issues, read/post to relevant newsgroups using DejaNews; example newsgroups are given in Appendix D.

Background Notes: 1. UNIX OSs like IRIX can be purchased in a form that passes the US Department of Defence's Trusted-B1 security regulations (eg. 'Trusted IRIX'), whereas Linux doesn't come anywhere near such rigorous security standards as yet. The only UNIX OS (and in fact the only OS of any kind) which passes all of the US DoD's toughest security regulations is Unicos, made by Cray Research (a subsidiary of SGI). Unicos and IRIX will be merged sometime in the future, creating the first widely available commercial UNIX OS that is extremely secure - essential for fields such as banking, local and national government, military, police (and other emergency/crime services), health, research, telecoms, etc.

References: 2. DejaNews USENET Newsgroups, Reading/Posting service: http://www.dejanews.com/

4. "Firewalls: Where there's smoke...", Network Week, Vol4, No. 12, 2nd December 1998, pp. 33 to 37. 5. Gauntlet 3.2 for IRIX Internet Firewall Software: http://www.sgi.com/solutions/internet/products/gauntlet/

6. Framebuffer and Clipping Planes, Indigo2 Technical Report, SGI, 1994:

http://www.futuretech.vuurwerk.nl/i2sec4.html#4.3 http://www.futuretech.vuurwerk.nl/i2sec5.html#5.6.3

7. Useful security-related web sites:

UKERNA:    http://www.ukerna.ac.uk/
JANET:     http://www.ja.net/
CERT:      http://www.cert.org/
RootShell: http://www.rootshell.com/
2600:      http://www.2600.com/mindex.html

8. "About the X Window System", part of X11.org: http://www.X11.org/wm/index.shtml

9. Images are from the online book, "IRIX Admin: Backup, Security, and Accounting.", Chapter 5.

Appendix B:

3. Contents of /etc/fingerd.message:

Sorry, the finger service is not available from this host. However, thankyou
for your interest in the Department of Computing at the University of Central
Lancashire. For more information, please see:

    http://www.uclan.ac.uk/
    http://www.uclan.ac.uk/facs/destech/compute/comphom.htm

Or contact Ian Mapleson at [email protected]

Regards,

Ian.

Senior Technician, Department of Computing, University of Central Lancashire,
Preston, England, PR1 2HE. [email protected]
Tel: (+44 -0) 1772 893297
Fax: (+44 -0) 1772 892913

Doom Help Service (DHS):    http://doomgate.gamers.org/dhs/
SGI/Future Technology/N64:  http://sgi.webguide.nl/
BSc Dissertation (Doom):    http://doomgate.gamers.org/dhs/diss/

Appendix C: Example web sites useful to administrators:

AltaVista:                  http://altavista.digital.com/cgi-bin/query?pg=aq
Webcrawler:                 http://webcrawler.com/
Lycos:                      http://www.lycos.com/
Yahoo:                      http://www.yahoo.com/
DejaNews:                   http://www.dejanews.com/
SGI Support:                http://www.sgi.com/support/
SGI Tech/Advice Center:     http://www.futuretech.vuurwerk.nl/sgi.html
X Windows:                  http://www.x11.org/
Linux Home Page:            http://www.linux.org/
UNIXHelp for Users:         http://unixhelp.ed.ac.uk/
Hacker Security Update:     http://www.securityupdate.com/
UnixVsNT:                   http://www.unix-vs-nt.org/
RootShell:                  http://www.rootshell.com/
UNIX System Admin (SunOS):  http://sunos-wks.acs.ohio-state.edu/sysadm_course/html/sysadm-1.html

Appendix D: Example newsgroups useful to administrators:

comp.security.unix
comp.unix.admin
comp.sys.sgi.admin
comp.sys.sun.admin
comp.sys.next.sysadmin
comp.unix.aix
comp.unix.cray
comp.unix.misc
comp.unix.questions
comp.unix.shell
comp.unix.solaris
comp.unix.ultrix
comp.unix.wizards
comp.unix.xenix.misc
comp.sources.unix
comp.unix.bsd.misc
comp.unix.sco.misc
comp.unix.unixware.misc
comp.sys.hp.hpux
comp.unix.sys5.misc
comp.infosystems.www.misc

Detailed Notes for Day 3 (Part 5)

Project: Indy/Indy attack/defense (IRIX 5.3 vs. IRIX 6.5)

The aim of this practical session, which lasts two hours, is to give some experience of how an admin typically uses a UNIX system to investigate a problem, locate information, construct and finally implement a solution. The example problem used will likely require:

- the use of online information (man pages, online books, release notes, etc.),
- writing scripts and exploiting shell script methods as desired,
- the use of a wide variety of UNIX commands,
- identifying and exploiting important files/directories,

and so on. A time limit on the task is included to provide some pressure, which often happens in real-world situations.

The problem situation is a simulated hacker attack/defense. Two SGI Indys are directly connected together with an Ethernet cable; one Indy, referred to here as Indy X, is using an older version of IRIX called IRIX 5.3 (1995), while the other (Indy Y) is using a much newer version, namely IRIX 6.5 (1998). Students will be split into two groups (A and B) of 3 or 4 persons each. For the first hour, group A is placed with Indy X, while group B is with Indy Y. For the second hour, the situation is reversed. Essentially, each group must try to hack the other group's system, locate and steal some key information (described below), and finally cripple the enemy machine. However, since both groups are doing this, each group must also defend against attack. Whether a group focuses on attack or defense, or a mixture of both, is for the group's members to decide during the preparatory stage. The first hour is dealt with as follows:

- For the first 35 minutes, each group uses the online information and any available notes to form a plan of action. During this time, the Ethernet cable between the Indys X and Y is not connected, and separate 'Research' Indys are used for this investigative stage in order to prevent any kind of preparatory measures. Printers will be available if printouts are desired.

- After a short break of 5 minutes to prepare/test the connection between the two Indys and move the groups to Indys X and Y, the action begins. Each group must try to hack into the other group's Indy, exploiting any suspected weaknesses, whilst also defending against the other group's attack. In addition, the hidden data must be found, retrieved, and the enemy copy erased. The end goal is to shutdown the enemy system after retrieving the hidden data. How the shutdown is effected is entirely up to the group members.

At the end of the hour, the groups are reversed so that group B will now use an Indy running IRIX 5.3, while group A will use an Indy running IRIX 6.5. The purpose of this second attempt is to demonstrate how an OS evolves and changes over time with respect to security and OS features, especially in terms of default settings, online help, etc.

Indy Specifications. Both systems will have default installations of the respective OS version, with only minor changes to files so that they are aware of each other's existence (/etc/hosts, and so on). All systems will have identical hardware (133MHz R4600PC CPU, 64MB RAM, etc.) except for disk space: Indys with IRIX 6.5 will use 2GB disks, while Indys with IRIX 5.3 will use 549MB disks. Neither system will have any patches installed from any vendor CD updates. The hidden data which must be located and stolen from the enemy machine by each group is the Blender V1.57 animation and rendering archive file for IRIX 6.2: blender1.57_SGI_6.2_iris.tar.gz Size: 1228770 bytes.

For a particular Indy, the file will be placed in an appropriate directory in the file system, the precise location of which will only be made known to the group using that Indy - how an attacking group locates the file is up to the attackers to decide.

It is expected that groups will complete the task ahead of schedule; any spare time will be used for a discussion of relevant issues:

- Reliability of relying on default settings for security, etc.
- How to detect hacking in progress, especially if an unauthorised person is carrying out actions as root.
- Whose responsibility is it to ensure security? The admin or the user?
- If a hacker is 'caught', what kind of evidence would be required to secure a conviction? How reliable is the evidence?

END OF COURSE.

Figure Index for Detailed Notes.

Day 1:

Figure 1.  A typical root directory shown by 'ls'.
Figure 2.  The root directory shown by 'ls -F /'.
Figure 3.  Important directories visible in the root directory.
Figure 4.  Key files for the novice administrator.
Figure 5.  Output from 'man -f file'.
Figure 6.  Hidden files shown with 'ls -a /'.
Figure 7.  Manipulating an NFS-mounted file system with 'mount'.
Figure 8.  The various available shells.
Figure 9.  The commands used most often by any user.
Figure 10. Editor commands.
Figure 11. The next most commonly used commands.
Figure 12. File system manipulation commands.
Figure 13. System Information and Process Management Commands.
Figure 14. Software Management Commands.
Figure 15. Application Development Commands.
Figure 16. Online Information Commands (all available from the 'Toolchest')
Figure 17. Remote Access Commands.
Figure 18. Using chown to change both user ID and group ID.
Figure 19. Handing over file ownership using chown.

Day 2:

Figure 20. IP Address Classes: bit field and width allocations.
Figure 21. IP Address Classes: supported network types and sizes.
Figure 22. The contents of the /etc/hosts file used on the SGI network.
Figure 23. Yoda's /etc/named.boot file.
Figure 24. The example named.boot file in /var/named/Examples.
Figure 25. A typical find command.
Figure 26. Using cat to quickly create a simple shell script.
Figure 27. Using echo to create a simple one-line shell script.
Figure 28. An echo sequence without quote marks.
Figure 29. The command fails due to * being treated as a
Figure 30. Using a backslash to avoid confusing the shell.
Figure 31. Using find with the -exec option to execute rm.
Figure 32. Using find with the -exec option to execute ls.
Figure 33. Redirecting the output from find to a file.
Figure 34. A simple script with two lines.
Figure 35. The simple rebootlab script.
Figure 36. The simple remountmapleson script.
Figure 37. The daily tasks of an admin.
Figure 38. Using df without options.
Figure 39. The -k option with df to show data in K.
Figure 40. Using df to report usage for the file
Figure 41. Using du to report usage for several directories/files.
Figure 42. Restricting du to a single directory.
Figure 43. Forcing du to ignore symbolic links.
Figure 44. Typical output from the ps command.
Figure 45. Filtering ps output with grep.
Figure 46. top shows a continuously updated output.

Figure 47. The IRIX 6.5 version of top, giving extra information.
Figure 48. System information from osview.
Figure 49. CPU information from osview.
Figure 50. Memory information from osview.
Figure 51. Network information from osview.
Figure 51. Miscellaneous information from osview.
Figure 52. Results from ttcp between two hosts on a 10Mbit network.
Figure 53. The output from netstat.
Figure 54. Example use of the ping command.
Figure 55. The output from rup.
Figure 56. The output from uptime.
Figure 57. The output from w showing current user activity.
Figure 58. Obtaining full domain addresses from w with the -W option.
Figure 59. The output from rusers, showing who is logged on where.

Figure 60. Standard UNIX security features.
Figure 61. Aspects of a system relevant to security.
Figure 62. Files relevant to network behaviour.
Figure 63. hosts.equiv files used by Ve24 Indys.
Figure 64. hosts.equiv file for yoda.
Figure 65. hosts.equiv file for milamber.
Figure 66. Additional line in /etc/passwd enabling NIS.
Figure 67. Typical output from fuser.
Figure 68. Blocking the use of finger in the /etc/inetd.conf file.
Figure 69. Instructing inetd to restart itself (using killall).

Day 3:
