December 19, 2016 | Author: Rajesh Ganta | Category: N/A
UNIX SHELL SCRIPTING
Perhaps the most important achievement of UNIX is to demonstrate that a powerful operating system for interactive use need not be expensive either in equipment or in human effort: UNIX can run on hardware costing as little as $40,000, and less than two man-years were spent on the main system software. The UNIX Time-Sharing System (1974) Dennis M. Ritchie and Ken Thompson
Table of Contents
Module 1: Introduction to Operating System
Module 2: Exploring the UNIX Shell
Module 3: Processes
Module 4: A Shell Script
Module 5: An Overview
Module 6: The vi Editor
Module 7: The variable
Module 8: Parameters
Module 9: Regular Expressions
Module 10: A Sample Shell Script
Module 11: Useful Utilities of Shell
Module 12: Arithmetic on Shell
Module 13: Functions
Module 14: Sed and AWK
Module 15: Database Using Shell Script
Module 16: Overview of Perl
Exercise for Lab Experience
Appendix: List of UNIX Commands
Module 1
Introduction to Operating System:
In simple terms, an operating system is a manager. It manages all the available resources on a computer. These resources can be the hard disk, a printer, or the monitor screen. Even memory is a resource that needs to be managed. Within an operating system are the management functions that determine who gets to read data from the hard disk, what file is going to be printed next, what characters appear on the screen, and how much memory a certain program gets.
Note: An operating system (OS) is a collection of system programs that together control the operation of a computer system.
Operating systems may be classified both by how many tasks they can perform simultaneously and by how many users can use the system simultaneously. That is: single-user or multi-user, and single-task or multitasking. A multi-user system must clearly be multitasking.
Single User Operating System
MS-DOS/PC-DOS was designed specifically to suit a single user's requirements. The user can run only one program at a time. At any instant there is only one process running on the CPU.
Multi User Operating System
Here the system is such that many users can work at a time. There is one large CPU and a high-capacity storage medium enclosed in what is called the system unit, and different terminals are attached to it. Each user works on a separate terminal and utilizes the CPU's resources. Each user's programs and other files are stored in the system unit's storage media. Thus there is one CPU and many users are using it. Therefore there is a need for an OS that will effectively divide the resources of the CPU among all users. Such an OS is called a multi-user OS.
Features of Multi User OS
1. Multi Processing
As many users are working at a time, every user will run their own program. When one program is run by a user it is a process. When the same program is run by another user it is another process. If different users are running different programs, many processes are executing. A user should not have to wait until other users' programs finish execution.
The same program can be shared by many users at a time and run by them together. This ability of the OS to run several processes together is called multi-processing.
2. Time Sharing
The CPU can execute only one instruction at a time. Since there are several users running their programs, the OS divides the CPU time among them. It allots a definite time interval, called a time slice, within which a given user's program is executed. Once the time slice is over, the CPU switches to the next user and executes that user's program. After that user's time slice is over, the next user's program is executed, and so on. Thus every user's program is constantly being interrupted by another user's program, but no user notices this because the CPU is very fast. Thus the OS effectively divides the CPU time between several users.
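The time-slice idea described above can be sketched as a toy simulation. This is only an illustration, not how a real kernel scheduler is written: the user names, the amount of work each program needs, and the slice length are all made-up values.

```sh
#!/bin/sh
# Toy round-robin simulation of time sharing.
# Hypothetical data: each entry is user:units_of_work_remaining.
slice=2
queue="alice:3 bob:5 carol:2"
schedule=""          # records the order in which users got the CPU

while [ -n "$queue" ]; do
    next=""
    for job in $queue; do
        user=${job%%:*}
        left=${job##*:}
        # Run this user's program for at most one time slice.
        if [ "$left" -gt "$slice" ]; then
            run=$slice
        else
            run=$left
        fi
        left=$((left - run))
        schedule="$schedule $user"
        echo "$user ran for $run unit(s), $left remaining"
        # An unfinished program goes back in the queue for the next round.
        if [ "$left" -gt 0 ]; then
            next="$next $user:$left"
        fi
    done
    queue=$next
done

schedule=${schedule# }
echo "CPU order: $schedule"
```

Each pass of the outer loop is one round in which every unfinished program receives one slice, which is why a long program is interrupted several times while short ones finish early.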
3. Memory Management
A program can run only if it is loaded into the internal memory. So when many users are running their programs, all of those programs have to be loaded into memory. The memory is therefore divided logically so that every user's program gets its share. Also, when a user's program finishes execution, it has to be removed from the internal memory so that that part of the memory can be used for storing another user's program.
4. Multi Tasking
Many users work in a multi-user environment, each running their own process. Thus there is more than one process executing together. But a single user can also run more than one process or program if the need arises. Running a number of processes for one user in this way is called multitasking.
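From the shell, one user can start several processes at once by ending a command with &. The following is a minimal sketch; the two "tasks" are stand-ins made of sleep and echo, and the file names are hypothetical.

```sh
#!/bin/sh
# One user, two processes running at the same time.
workdir=$(mktemp -d)

# '&' starts each task in the background and returns immediately.
( sleep 1; echo "report done" > "$workdir/report.txt" ) &
pid1=$!
( sleep 1; echo "backup done" > "$workdir/backup.txt" ) &
pid2=$!

echo "Started processes $pid1 and $pid2; the shell is free for other work."

# 'wait' blocks until the named background processes have finished.
wait "$pid1" "$pid2"
results=$(cat "$workdir/report.txt" "$workdir/backup.txt")
echo "$results"
rm -rf "$workdir"
```

Both tasks sleep during the same second, so the whole script takes about one second rather than two.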
Parts of an Operating System:
Any operating system consists of two parts:
o The Shell
o The Kernel
The Shell: The shell acts as an interface between the user and the machine. It interprets every command given by the user and advises the kernel to act accordingly. A single-user OS will have only one shell devoted entirely to the user, whereas in a multi-user OS every user will have a separate shell.
The Kernel: The kernel is the part of the OS that interacts directly with the hardware of the computer system.
Why is UNIX Important? During the past 25 years the UNIX Operating System has evolved into a powerful, flexible, and versatile operating system. It serves as the Operating System for all types of computers, including single user personal computers and engineering workstations, multi-user microcomputers, minicomputers, mainframes and supercomputers, as well as special purpose devices, with approximately 20 million computers now running UNIX and more than 100 million people using these systems. This rapid growth is expected to continue. The success of UNIX is due to many factors, including its portability to a wide range of machines, its adaptability and simplicity, the wide range of tasks that it can perform, its multi-user and multi tasking nature, and its suitability for networking, which has become increasingly important as the Internet has blossomed. What follows is a description of the features that have made the UNIX system so popular.
Understanding UNIX: The UNIX operating system was designed to let a number of programmers access the computer at the same time and share its resources. The operating system coordinates the use of the computer's resources, allowing one person, for example, to run a spell check program while another creates a document, another edits a document, another creates graphics, and yet another formats a document, all at the same time, with each user oblivious to the activities of the others. The operating system controls all of the commands from all of the keyboards and all of the data being generated, and permits each user to believe he or she is the only person working on the computer. This real-time sharing of resources makes UNIX one of the most powerful operating systems ever.
Although UNIX was developed by programmers for programmers, it provides an environment so powerful and flexible that it is found in businesses, sciences, academia, and industry. Many telecommunications switches and transmission systems also are controlled by administration and maintenance systems based on UNIX. While initially designed for medium-sized minicomputers, the operating system was soon moved to larger, more powerful mainframe computers. As personal computers grew in popularity, versions of UNIX found their way into these boxes, and a number of companies produce UNIX-based machines for the scientific and programming communities.
The uniqueness of UNIX
The features that made UNIX a hit from the start are:
o Multitasking capability
o Multi-user capability
o Portability
o Cooperative Tools and Utilities
o Excellent Networking capability
o Open Source Code
Multitasking
Many computers do just one thing at a time, as anyone who uses a PC or laptop can attest. Try logging onto your company's network while opening your browser while opening a word processing program. Chances are the processor will freeze for a few seconds while it sorts out the multiple instructions. UNIX, on the other hand, lets a computer do several things at once, such as printing out one file while the user edits another file. This is a major feature for users, since users don't have to wait for one application to end before starting another one.
Multi-user
The same design that permits multitasking permits multiple users to use the computer. The computer can take the commands of a number of users (how many is determined by the design of the computer) to run programs, access files, and print documents at the same time. The computer can't tell the printer to print all the requests at once, but it does prioritize the requests to keep everything orderly. It also lets several users access the same document by compartmentalizing the document so that the changes of one user don't override the changes of another user.
Portability
A major contribution of the UNIX system was its portability, permitting it to move from one brand of computer to another with a minimum of code changes. At a time when different computer lines of the same vendor didn't talk to each other, let alone machines of multiple vendors, that meant a great savings in both hardware and software upgrades. It also meant that the operating system could be upgraded without having all the customer's data entered again. And new versions of UNIX were backward compatible with older versions, making it easier for companies to upgrade in an orderly manner.
Cooperative Tools and Utilities UNIX comes with hundreds of programs that are divided into two classes:
Integral utilities that are absolutely necessary for the operation of the computer, such as the command interpreter, and
Tools that aren't necessary for the operation of UNIX but provide the user with additional capabilities, such as typesetting capabilities and email.
Fig 1.1 UNIX Tools (figure labels: man, dc, fsck, mail, nroff, vi, calendar)
Tools can be added or removed from a UNIX system, depending upon the applications required.
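The tools are called cooperative because small utilities can be chained together with pipes, each one doing a single job. The sketch below counts word frequencies using only standard utilities (tr, sort, uniq, head, awk); the sample text is made up for illustration.

```sh
#!/bin/sh
# Find the most frequent word in a piece of text by combining small tools.
# Pipeline stages: split into one word per line, group identical words,
# count each group, put the highest count first, keep the top entry,
# then print it as "word count".
text="unix shell unix kernel shell unix"

top=$(printf '%s\n' "$text" |
    tr ' ' '\n' |
    sort |
    uniq -c |
    sort -rn |
    head -n 1 |
    awk '{print $2 " " $1}')

echo "$top"
```

None of these utilities knows anything about word counting on its own; the result comes from the combination.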
Excellent Networking Capability: The UNIX system provides an excellent environment for networking. It offers programs and utilities that provide the services needed to build networked applications-the basis for distributed, networked computing. With networked computing, information and processing is shared among different computers in a network. The UNIX system has proved to be useful in client/server computing. The UNIX system also has been the base system for the development of Internet Services. UNIX provides an excellent platform for Web Servers.
Open Source Code: UNIX has provision for protecting data and communicating with other users. The source code (Open Source) for the UNIX system has been made available to users and programmers.
History of UNIX:
1965 Bell Laboratories joins with MIT and General Electric in the development effort for a new operating system, Multics, which would provide a multi-user, multi-processor, and multi-level (hierarchical) file system, among its many forward-looking features.
1969 AT&T is unhappy with the progress and drops out of the Multics project. Some of the Bell Labs programmers who had worked on the project, Ken Thompson, Dennis Ritchie, Rudd Canaday, and Doug McIlroy, design and implement the first version of the UNIX file system on a PDP-7, along with a few utilities. It is given the name UNIX by Brian Kernighan as a pun on Multics.
1971 The system now runs on a PDP-11, with 16 Kbytes of memory, including 8 Kbytes for user programs, and a 512 Kbyte disk. Its first real use is as a text processing tool for the patent department at Bell Labs. That utilization justified further research and development by the programming group. UNIX caught on among programmers because it was designed with these features:
o Programmers' environment
o Simple user interface
o Simple utilities that can be combined to perform powerful functions
o Hierarchical file system
o Simple interface to devices, consistent with file format
o Multi-user, multi-process system
o Architecture independent and transparent to the user
1973 UNIX is rewritten in C, a new language developed by Dennis Ritchie. Being written in this high-level language greatly decreases the effort needed to port it to new machines.
1974 Thompson and Ritchie publish a paper in the Communications of the ACM describing the new UNIX OS. This generates enthusiasm in the academic community, which sees a potentially great teaching tool for studying programming systems development. Since AT&T is prevented from marketing the product by the 1956 Consent Decree, it licenses UNIX to universities for educational purposes and to commercial entities.
By 1977, the fifth and sixth editions had been released; these contained many new tools and utilities. The number of machines running the UNIX System, primarily at Bell Laboratories and universities, increased to more than 600 by 1978. The seventh edition, the direct ancestor of the UNIX Operating System available today, was released in 1979. UNIX System III, based on the seventh edition, became AT&T's first commercial release of the UNIX System in 1982. However, after System III was released, AT&T, through its Western Electric manufacturing subsidiary, continued to sell versions of the UNIX system. UNIX System III, the various research editions, and experimental versions were distributed to colleagues at universities and other research laboratories.
A UNIX System Timeline
The following timeline summarizes the development of UNIX from its beginning.
Year | UNIX Variant or Standard | Comments
1969 | UNICS (later called UNIX) | A new operating system invented by Ken Thompson and Dennis Ritchie for the PDP-7
1973 | Fourth Edition | Written in C programming language; widely used inside Bell Laboratories
1975 | Sixth Edition | First version widely available outside of Bell Labs; more than 600 machines ran it
1978 | 3BSD | Virtual memory
1979 | Seventh Edition | Included the Bourne shell, UUCP, and C; the direct ancestor of modern UNIX
1980 | Xenix | Introduced by Microsoft
1980 | 4BSD | Introduced by UC Berkeley
1982 | System III | First public release outside of Bell Labs
1983 | System V Release 1 | First supported release
1983 | 4.1BSD | UC Berkeley release with performance enhancements
1984 | 4.2BSD | UC Berkeley release with many networking capabilities
1984 | System V Release 2 | Protection and locking of files, enhanced system administration, and job control features added
1986 | HP-UX | First version of HP-UX released for HP Precision Architecture
1987 | System V Release 3 | STREAMS, RFS, TLI added
1987 | 4.3BSD | Minor enhancements to 4.2BSD
1988 | POSIX | POSIX.1 published
1989 | System V Release 4 | Unified System V, BSD, and Xenix
1990 | XPG3 | X/Open specification set
1990 | OSF/1 | Open Software Foundation release designed to compete with SVR4
1991 | Linux 0.01 | Linus Torvalds started development of Linux
1992 | SVR4.2 | USL developed version of SVR4 for the desktop
1992 | HP-UX 9.0 | Supported workstations including a GUI
1993 | Solaris 2.3 | POSIX compliant
1993 | 4.4BSD | Final Berkeley release
1993 | SVR4.2MP | Last version of UNIX developed by USL
1994 | Linux 1.0 | First version of Linux not considered a "beta"
1994 | Solaris 2.4 | Motif supported
1995 | UNIX 95 | X/Open mark for systems registered under the Single UNIX Specification
1995 | Solaris 2.5 | CDE supported
1995 | HP-UX 10.0 | Conformed to the Single UNIX Specification and the Common Desktop Environment (CDE)
1996 | Linux 2.0 | Performance improvements and networking software added
1997 | Solaris 2.6 | UNIX 95 compliant, JAVA supported
1997 | Single UNIX Specification, Ver2 | Open Group specification set
1997 | System V Release 5 (SVR5) (SCO) | Enhanced SV kernel, including 64-bit support, increased reliability, and performance enhancements
1997 | UnixWare 7 | SCO UNIX based on SVR5 kernel
1997 | HP-UX 11.0 | 64-bit operating system
1998 | UNIX 98 | Open Group mark for systems registered under the Single UNIX Specification, Version 2
1998 | Solaris 7 | Support for 64-bit applications, free for noncommercial users
1999 | Linux 2.2 | Device drivers added
Versions of UNIX Today With most things in life, where there is active competition the best will ultimately survive and triumph. This is the case with several different versions or flavors of UNIX. Although many different versions exist, a common design and/or code base is present in most of them. Also, two major kinds of UNIX operating system software markets exist today. The commercial market is where customers generally have to pay for the operating system software and generally may not get any source code (well, not for free anyway!). The other market is also commercial, but is considered open source. Open source means that you get full access to the source code of the system or programs and can make changes or modifications to that source code as long as you maintain the rights of the original software owner. Today, the UNIX leaders include Solaris, Linux, HP-UX, AIX, and SCO.
Why Is UNIX Popular?
Many people ask why UNIX is so popular or why it is used so much, in so many different ways and in so many computing environments. The answer lies with the very nature of UNIX and the model that was used to design, build, and continuously improve the operating system.
Availability of Source Code
One of the most significant points of UNIX is the availability of source code for the system. (For those new to software, source code contains the programming elements that, when passed through a compiler, produce a binary program, which can be executed.) The binary program contains specific computer instructions, which tell the system "what to do." When the source code is available, the system (or any subcomponent) can be modified without consulting the original author of the program. Access to the source code is a very positive thing and can result in many benefits. For example, if software defects (bugs) are found within the source code, they can be fixed right away, without waiting for the author to do so. Another benefit is that new software functions can be integrated into the source code, thereby increasing the usefulness and the overall functionality of the software. Having the ability to extend the software to the user's requirements is a massive gain for the end user and the software industry as a whole. Over time, the software can become much more useful.
One downside to having access to the source code is that it can become hard to manage, because many different people could have modified the code in unpredictable (and perhaps negative) ways. However, this problem is typically addressed by having a "source code maintainer," who reviews the source code changes before the modifications are incorporated into the original version.
Another downside to source code access is that individuals may use this information with the goal of compromising system or component security. The Internet Worm of 1988 is one well-known example. The author, who was a graduate student at Cornell University at the time, was able to exploit known security problems within the UNIX system to launch a software program that gained unauthorized access to systems and was able to replicate itself to many networked computers. The Worm was so successful in attaching to and attacking systems that it caused many of the computers to crash due to the amount of resources needed to replicate. Although the Worm didn't actually cause significant permanent damage to the systems it infected, it opened the eyes of the UNIX community to the dangers of source code access and to security on the Internet as a whole.
Flexible Design
UNIX was designed to be modular, which makes it a very flexible architecture. The modularity helps provide a framework that makes it much easier to introduce new operating system tools, applications, and utilities, or to help in the migration of the operating system to new computer platforms or other devices. Although some might argue that UNIX isn't flexible enough for their needs, it is quite adaptable and can handle most requirements. This is evidenced by the fact that UNIX runs on more general computer platforms and devices than any other operating system.
GNU
The GNU project, started in the early 1980s, was intended to act as a counterbalance to the widespread activity of corporate greed and adoption of license agreements for computer software. The "GNU is not UNIX" project was responsible for producing some of the world's most popular UNIX software. This includes the Emacs editor and the gcc compiler. They are the cornerstones of the many tools that a significant number of developers use every day.
Open Software
UNIX is open, which basically means that no single company, institution, or individual owns UNIX, nor can it be controlled by a central authority. However, the UNIX name remains a trademark. Anyone using the Internet may obtain open source software, install it, modify it, and then redistribute the software without ever having to shell out any money in the process. The open source movement has made great advances and has clearly demonstrated that quality software can, in fact, be free. Granted, it is quite true that certain versions of UNIX are not open, and you do indeed need to pay to use these operating systems under an end-user licensing agreement. Generally speaking, vendors that charge for UNIX represent only a portion of the total number of UNIX releases available within the UNIX community.
Programming Environment
UNIX provides one of the best development environments available by providing many of the important tools software developers need. Also, there are software tools such as compilers and interpreters for just about every major programming language known in the world. Not only can one write programs in just about any computer language, UNIX also provides additional development tools such as text editors, debuggers, linkers, and related software. UNIX was conceived and developed by programmers for programmers, and it stands to reason that it will continue to be the programmer's development platform of choice now and in the future.
Availability of Many Tools
UNIX comes with a large number of useful applications, utilities, and programs, which many people consider to be one of UNIX's greatest strengths. They are collectively known or commonly referred to as UNIX "tools," and they cover a wide range of functions and purposes. One of the most significant aspects of UNIX is the availability of software to accomplish one or more very specific tasks. You will find throughout this text that the concept of tools is quite universal and is used repeatedly. This book not only discusses the subject of system administration but also provides detailed descriptions of UNIX-based tools. As a system administrator, you will come to depend on certain tools to help you do your job. Just as construction workers rely on the tools they use, so too will the administrator rely on the software that permits them to handle a wide range of functions, tasks, issues, and problems. There are tools to handle many system administration tasks that you might encounter. Also, there are tools for development, graphics manipulation, text processing, database operations, and just about any user- or system-related requirement. If the basic operating system version doesn't provide a particular tool that you need, chances are that someone has already developed the tool and it is available via the Internet.
System Libraries
A system library is a collection of software that programmers use to augment their applications. UNIX comes with quite a large collection of functions or routines that can be accessed from several different languages to aid the application writer with a variety of tasks. For example, should the need arise to sort data, UNIX provides several different sort functions.
Well Documented
UNIX is well documented with both online manuals and with many reference books and user guides from publishers. Unlike some operating systems, UNIX provides online man page documentation of all tools that ship with the system. Also, it is quite customary that open source tools provide good documentation. Further, the UNIX community provides journals and magazine articles about UNIX, tools, and related topics of interest.
ARCHITECTURE OF UNIX SYSTEM:
To understand how the UNIX System works, you need to understand its structure. The UNIX Operating System is made up of several major components. Those components include the kernel, the shell, the file system, and the commands or user programs.
UNIX is a layered operating system. The innermost layer is the hardware that provides the services for the OS. The operating system, referred to in UNIX as the kernel, interacts directly with the hardware and provides the services to the user programs. These user programs don't need to know anything about the hardware. They just need to know how to interact with the kernel, and it's up to the kernel to provide the desired service. One of the big appeals of UNIX to programmers has been that most well-written user programs are independent of the underlying hardware, making them readily portable to new systems.
Note: The core of the UNIX system is the kernel. The kernel controls the computer's resources, allotting them to different users and to different tasks.
User programs interact with the kernel through a set of standard system calls. These system calls request services to be provided by the kernel. Such services include accessing a file (open, close, read, write, link, or execute a file); starting or updating accounting records; changing ownership of a file or directory; changing to a new directory; creating, suspending, or killing a process; enabling access to hardware devices; and setting limits on system resources.
UNIX is a multi-user, multitasking operating system. You can have many users logged into a system simultaneously, each running many programs. It's the kernel's job to keep each process and user separate and to regulate access to system hardware, including CPU, memory, disk and other I/O devices.
UNIX utilities or commands are a collection of about 200 programs that service the day-to-day processing requirements. These programs are invoked through the shell, which is itself another utility. Apart from the utilities that are provided as part of the UNIX operating system, more than a thousand UNIX-based application programs, such as database management systems, word processors, and accounting software, are available.
The basic unit used to organize information in the UNIX System is called a file. The UNIX file system provides a logical method for organizing, storing, retrieving, manipulating, and managing information.
UNIX SHELLS
The shell reads your commands and interprets them as requests to execute a program or programs, which it then arranges to have carried out. Because the shell plays this role, it is called a command interpreter. Besides being a command interpreter, the shell is also a programming language. As a programming language, it permits you to control how and when commands are carried out. Each user working with UNIX at any time runs a separate shell program. There may be several shells running in memory, but only one kernel. UNIX offers several shells, including three major variants:
1. The Bourne shell
2. The C shell
3. The Korn shell
The original UNIX system shell, sh, was written by Steve Bourne, and as a result it is known as the Bourne shell. The C shell, csh, was originally developed as part of BSD UNIX. csh introduced a number of important enhancements to sh, including the concept of a command history list and job control. The Korn shell, ksh, builds on sh and extends it by adding many features from the C shell. Each of these shells has its own prompt. The Bourne shell has the $ prompt. So when you log in, it is the Bourne shell that is established for you, and the stage is set for you to work on the machine.
Features of Shell:
Interactive Processing: It acts as an interface and provides communication between the users and the system.
Background Processing: Time-consuming, non-interactive tasks can proceed while the user continues with other processing.
Input/Output redirection: Programs, which can interact with a user, can be made to take their input from another source, such as a file and send their output to another destination, such as printers.
Shell Scripts: A frequently used sequence of shell commands can be stored in a file. The name of the file can be later used to execute the stored sequence with a single command.
Shell Variables: The user can control the behavior of the shell, as well as other programs and utilities, by storing data in variables.
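The features listed above can be seen together in a few lines. This is a minimal sketch under assumed names: the variable greeting, the files out.txt and hello.sh, and the temporary directory are all hypothetical.

```sh
#!/bin/sh
workdir=$(mktemp -d)

# Shell variable: store data that later commands use.
greeting="hello from the shell"

# Output redirection: send output to a file instead of the terminal.
echo "$greeting" > "$workdir/out.txt"

# Input redirection: take a command's input from a file.
read -r line < "$workdir/out.txt"

# Shell script: store a command sequence in a file for later reuse.
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
echo "script says: $1"
EOF
# Here the file is passed to sh; after 'chmod +x' it could also be run by name.
script_output=$(sh "$workdir/hello.sh" "$line")

# Background processing: '&' starts the task and returns at once.
sleep 1 &
bg_pid=$!
wait "$bg_pid"

echo "$script_output"
rm -rf "$workdir"
```

Interactive processing is the one feature not shown, since it is simply what happens when these same commands are typed at the prompt.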
The File System The UNIX file system looks like an inverted tree structure. You start with the root directory, denoted by /, at the top and work down through sub-directories underneath it.
Each node is either a file or a directory of files, where the latter can contain other files and directories. You specify a file or directory by its path name, either the full (absolute) path name or one relative to a location. The full path name starts with the root, /, and follows the branches of the file system, each separated by /, until you reach the desired file, e.g.:
/home/Sreedhar/source/xntp
A relative path name specifies the path relative to another, usually the current working directory. Two special directory entries should be introduced now:
. the current directory
.. the parent of the current directory
So if I'm at /home/frank and wish to specify the path above in a relative fashion I could use:
../Sreedhar/source/xntp
This indicates that I should first go up one directory level, then come down through the Sreedhar directory, followed by the source directory and then to xntp.
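The equivalence of the two forms can be checked in a throwaway directory tree. The sketch below recreates the example layout under a temporary root (so it does not touch the real /home) and compares where the absolute and the relative path lead.

```sh
#!/bin/sh
# Recreate the example layout under a scratch directory.
root=$(mktemp -d)
mkdir -p "$root/home/frank" "$root/home/Sreedhar/source/xntp"

# Relative: start in frank's directory, go up one level with "..", then down.
relative_result=$(cd "$root/home/frank" && cd ../Sreedhar/source/xntp && pwd -P)

# Absolute: name every directory from the top of the scratch tree.
absolute_result=$(cd "$root/home/Sreedhar/source/xntp" && pwd -P)

echo "relative: $relative_result"
echo "absolute: $absolute_result"
rm -rf "$root"
```

Both commands print the same directory, which is the point of the example: a relative path is just a shorter route that depends on where you currently are.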
Unix Directories, Files and Inodes
Every directory and file is listed in its parent directory. In the case of the root directory, that parent is itself. A directory is a file that contains a table listing the files contained within it, mapping file names to inode numbers. An inode is a special file designed to be read by the kernel to learn the information about each file. It specifies the permissions on the file, ownership, date of creation and of last access and change, and the physical location of the data blocks on the disk containing the file. The system does not require any particular structure for the data in the file itself. The file can be ASCII or binary or a combination, and may represent text data, a shell script, compiled object code for a program, a directory table, junk, or anything you would like. There's no header, trailer, label information or EOF character as part of the file.
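The link between a file name and its inode number can be observed with ls -i. In this small sketch the file names original and alias are hypothetical; a hard link is used to show two directory entries pointing at the same inode.

```sh
#!/bin/sh
workdir=$(mktemp -d)
echo "some data" > "$workdir/original"

# A hard link adds a second name for the same inode; no data is copied.
ln "$workdir/original" "$workdir/alias"

# 'ls -i' prints the inode number in front of each name.
inode_original=$(ls -i "$workdir/original" | awk '{print $1}')
inode_alias=$(ls -i "$workdir/alias" | awk '{print $1}')
echo "original: $inode_original, alias: $inode_alias"

# 'ls -l' shows details kept in the inode; the second field is the
# number of names (links) that refer to it.
link_count=$(ls -l "$workdir/original" | awk '{print $2}')
echo "link count: $link_count"
rm -rf "$workdir"
```

The two names report the same inode number, which illustrates that the name lives in the directory table while everything else about the file lives in the inode.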
Unix Programs A program, or command, interacts with the kernel to provide the environment and perform the functions called for by the user. A program can be: an executable shell file, known as a shell script; a built-in shell command; or a source compiled, object code file. The shell is a command line interpreter. The user interacts with the kernel through the shell. You can write ASCII (text) scripts to be acted upon by a shell. System programs are usually binary, having been compiled from C source code. These are located in places like /bin, /usr/bin, /usr/local/bin, /usr/ucb, etc.
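To see which kind of program a given command is, the standard command -v utility can help. The sketch below assumes cd and ls as examples: for a built-in it prints just the name, and for a compiled program it prints the path of the file on disk (the exact path varies between systems).

```sh
#!/bin/sh
# 'cd' is built into the shell; 'ls' is normally a binary file on disk.
builtin_kind=$(command -v cd)
external_kind=$(command -v ls)

echo "cd -> $builtin_kind"
echo "ls -> $external_kind"
```

A shell script that has been made executable and placed in a directory on the search path would show up the same way as ls does, as a path name.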
19
Module 2 Exploring the UNIX Shell: The shell is a rather unique component of the UNIX operating system since it is one of the primary ways to interact with the system. It is typically through the shell that users execute other commands or invoke additional functions. The shell is commonly referred to as a command interpreter and is responsible for executing tasks on behalf of the user. Figure 2-1 shows a pictorial view of how the shell fits with the UNIX system. As you can see, the shell operates within the framework just like any other program. It provides an interface between the user, the operating system functions, and ultimately the system Kernel.
The UNIX Shell
Another powerful feature of the UNIX shell is the ability to support the development and execution of custom shell scripts. The shell contains a mini programming language that provides a lightweight way to develop new tools and utilities without having to be a heavyweight software programmer. A UNIX shell script is a combination of internal shell commands, regular UNIX commands, and some shell programming rules. UNIX supports a large number of different shells, and many of the popular ones are freely available on the Internet. Also, many versions of UNIX come with one or more shells and, as the system administrator, you can install additional shells when necessary and configure the users of the system to use different shells, depending on specific preferences or requirements. The table below lists many of the popular shells and a general description of each.

Shell Name   General Description
sh           Standard Bourne shell, which is one of the most popular shells around.
csh          Standard shell with C-like language support.
bash         GNU Bourne-Again shell that includes elements from the Korn shell and C shell.
tcsh         Standard C shell with command-line editing and filename completion capabilities.
ksh          The Korn shell combines the best features of the Bourne and C shells and includes powerful programming tools.
zsh          Korn shell like, but also provides many more features such as built-in spell correction and programmable command completion.

Once a user has logged into the system, the default shell prompt appears and the shell simply waits for input from the user. Thus, logging into a Solaris system as the root user for example, the standard Bourne shell prompt will be
#
The system echoes this prompt to signal that it is ready to receive input from the keyboard. At this point, this user is free to type in any standard UNIX command, application, or custom script name and the system will attempt to execute or run the command. The shell assumes that the first argument given
Accessing a UNIX System
The configuration you use to access your UNIX System can be based on one of two basic models: a multi-user computer or a single-user computer. On a multi-user system, you use your own terminal device to access the UNIX system. The computer you access can be a workstation, a microcomputer, a mainframe computer, or even a supercomputer. A single-user system is a personal computer on which you run the UNIX operating system directly. (UnixWare 7.1 by SCO, Solaris 7 from SunSoft, public domain versions of UNIX, and the popular variant of UNIX known as Linux can all be used on a single-user system.)
Your display can be character-based, or it can be bit mapped. It may display a single window or multiple windows, as in the X Window System.
Before You Start
UNIX System from a PC: Many different application packages, called terminal emulators, run on a PC and enable you to connect to a UNIX system. Terminal emulators all function in the same basic way, in that they act as a terminal attached to the UNIX machine. This allows you to enter commands the same way that you would if you were using a terminal.
UNIX System from a Terminal: If your terminal has not been set to work with a UNIX System, you must have its options set appropriately. Setting options is done in different ways on different terminals.
Selecting a LOGIN: Every UNIX System has at least one person, called the System Administrator, whose job is to maintain the system and make it available to its users. The system administrator is also responsible for adding new users to the system and setting up their initial work environment on the computer. Your login name is created by the system administrator. In general, a login name (logname) can be almost any combination of letters and numbers, but the UNIX System places some constraints on logname selections:
Login name must be more than two characters long, and if it is longer than eight, only the first eight characters are relevant.
It can contain any combination of lowercase letters and numbers and must begin with a lowercase letter. If you log in using uppercase letters, a UNIX system will assume that your terminal can only receive uppercase letters, and will only send uppercase letters for the entire session.
Your logname should not have any symbols or spaces in it, and it must be unique for each user. Some lognames are reserved customarily for certain uses. For example, the root normally refers to the system administrator or superuser who is responsible for the whole system.
Connecting to a UNIX System:
Direct Connect: With single user workstations and personal computers, and with the primary administration terminal on a multi-user system (console), a cable permanently connects the terminal with the computer. After booting your PC and invoking your terminal emulator or turning on your terminal, hit the carriage return and you should see the UNIX System prompt that says
login:
Dial in Access: You may have to dial into the computer using a modem before you are connected. Use your emulator or dial function to dial the UNIX System access number. When the system answers the call, you will hear a high-pitched tone and you should see some characters appear on screen. You then get the UNIX system login prompt.
Local Area Network: Another means of connecting your PC or terminal to the UNIX System is via a local area network. A local area network (LAN) is a set of communication devices and cables that connects several PCs or terminals and computers. A number of LAN environments are in use today, such as LAN Manager and NetWare. Each LAN environment provides a set of software that can be used in conjunction with a specialized hardware card at each end of the network, called a NIC (network interface card) or a LAN card, that enables you to connect a client machine to a server machine. The clients and servers may be running Windows or UNIX, or both. The protocol most frequently used to connect a client machine to a UNIX server is TCP/IP, with other protocols such as IPX and SPX also widely used on LANs. An example of this environment would be a group of Windows PCs connected to a common UNIX server running a UNIX operating system such as UnixWare 7, Solaris, or Linux. This type of environment usually is maintained by a LAN administrator, a person who knows how local area networks work. This is often the same person as the system administrator. In accessing a UNIX System on a LAN, you first need to configure your PC to be able to recognize the system you wish to connect to.
IP Network: If your PC is connected to an IP network, such as the Internet or an intranet, you can use the telnet command to access any computer on this network that allows such connections. The computer you access may be a UNIX computer, or a computer running some other operating system, and it may be a local computer or one located thousands of miles away.
A variety of telnet commands can help you manage a telnet session with the computer you are accessing.
Logging In: As a multi-user system, the UNIX System first requires that you identify yourself before you can access the system.
login:
Changing Your Password: When you first log into a UNIX System, you will have either no password at all (a null password) or an arbitrary password assigned by the system administrator. These are only intended for temporary use. Neither offers any real security. A null password gives anyone access to your account; one assigned by the system administrator is likely to be easily guessed by someone. Officially assigned passwords often consist of simple combinations of your initials and your student, employee, or social security number. If your password is simply your employee number and the letter X, anyone with access to this information has access to all of your computer files. Sometimes random combinations of letters and numbers are used. Such passwords are difficult to remember, and consequently users will be tempted to write them down in a convenient place. (Resist this temptation!)
The passwd Command: You change your password by using the passwd command. When you issue this command, the system checks to see if you are the owner of the login. This prevents someone from changing your password and locking you out of your own account. passwd first announces that it is changing the password, and then it asks for your (current) old password, like this:
$ passwd
passwd: changing password
Old password:
New password:
Re-enter new password:
$
The system asks for a new password and asks for the password to be verified (you do this by retyping it). The next time you log in, the new password is effective. Although you can ordinarily change your password whenever you want, on some systems after you change your password you must wait a specific period of time before you can change it again.
How to pick a password? When choosing a password, it is important that it be something that could not be guessed -- either by somebody unknown to you trying to break in, or by an acquaintance who knows you. Suggestions for choosing and using a password follow:
Don't
Use a word (or words) in any language
Use a proper name
Use information that can be found in your wallet
Use information commonly known about you (car license, pet name, etc)
Use control characters. Some systems can't handle them
Write your password anywhere
Ever give your password to *anybody*

Do
Use a mixture of character types (alphabetic, numeric, special)
Use a mixture of upper case and lower case
Use at least 6 characters
Choose a password you can remember
Change your password often
Make sure nobody is looking over your shoulder when you are entering your password
Caution: If you do forget your password, there is no way to retrieve it. Because it is encrypted, even your system administrator cannot look up your password. If you cannot remember it, the administrator will have to give you a new password.
Changing a Password at Initial Login
On some systems, you will be required to change your password the first time you log in. This will work as described previously and will look like this:
login: sreedhar
Password:
Your password has expired.
Choose a new one.
Old password:
New password:
Re-enter new password:
Password Aging
To ensure the secrecy of your password, you will not be allowed to use the same password for long stretches of time. On UNIX Systems, passwords age. When yours gets to the end of its lifespan, you will be asked to change it. The length of time your password will be valid is determined by your system administrator. However, you can view the status of your password on most UNIX systems. Generally, the -s option to the passwd command shows you the status of your password, like this:
$ passwd -s
rayjay  PW  04/01/99  7  30  5

Field      Meaning
rayjay     name
PW         passwd status
04/01/99   date last changed
7          min days between changes
30         max days between changes
5          days before user will be warned to change password

The first field contains your login name; the next fields list the status of your password, the date it was last changed, and the minimum and maximum days allowed between password changes; and the last field is the number of days of warning you receive before your password will need to be changed. Note that this is simply an example; on your system, you may not be allowed to read all of these fields.
An Incorrect Login
If you make a mistake in typing either your login or your password, the UNIX System will respond this way:
login: sreedhar
Password:
Login Incorrect
login:
You will receive the "Password:" prompt even if you type an incorrect or nonexistent login name. This prevents someone from guessing login names and learning which one is valid by discovering one that yields the "Password:" prompt. Because any login results in "Password:", an intruder cannot guess login names in this way. If you repeatedly type your login or password incorrectly (three to five times, depending on how your system administrator has set the default), the UNIX System will disconnect your terminal if it is connected via modem or LAN. On some systems, the system administrator will be notified of erroneous login attempts as a security measure. If you do not successfully log in within some time interval (usually a minute), you will be disconnected. If you have problems logging in, you might also check to make sure that your CAPS LOCK key has not been set. If it has been set, you will inadvertently enter an incorrect logname or password, because in UNIX uppercase and lowercase letters are treated differently. (Note that unlike in some other environments, your account will not get locked if you enter your password incorrectly some number of times; you will just get disconnected.)
When you successfully enter your login and password, the UNIX System responds with a set of messages, similar to this:
login: sreedhar
Password:
UNIX System V/386/486 Release 4.0 Version 3.0
minnie
Copyright (c) 1984, 1986, 1987, 1988, 1989, 1990 AT&T
Copyright (C) 1987, 1988 Microsoft Corp.
Copyright (C) 1990, NCR Corp.
All Rights Reserved
Last login: Mon January 29 19:55:17 on term/17
You first see the UNIX System announcement that tells you the particular version of UNIX you are using. Next you see the name of your system, minnie in this case. This is followed by the copyright notice. Finally, you see a line that tells you when you logged in last. This is a security feature. If the time of your last login does not agree with when you remember logging in, call your system administrator. This discrepancy could be an indication that someone has broken into your system and is using your login. After this initial announcement, the UNIX System presents system messages and news.
Message of the Day (MOTD)
Because every user has to log in, the login sequence is the natural place to put messages that need to be seen by all users. When you log in, you will first see a message of the day (MOTD). Because every user must see this MOTD, the system administrator (or root) usually reserves these messages for comments of general interest, such as this:
Attention ALL Users !!!
minnie will be coming down on Sunday Feb. 5, 2007 from 8:00am until 12:00pm (noon) for system maintenance.
Please schedule your work accordingly.
Thank you.
The UNIX System Prompt
After you log in, you will see the UNIX System command prompt at the far left side of the current line. The default system prompt (for most UNIX Systems) is the dollar sign:
$
This $ is the indication that the UNIX System is waiting for you to enter a command. In the examples in this book, you will see the $ at the beginning of a line as it would be seen on the screen, but you are not supposed to type it. The command prompt is frequently changed by users. Users who have accounts on different machines may use a different prompt on each one to remind them which computer they are using. Some users change their prompt to tell them where they are in the UNIX file system, or you may simply find the $ symbol unappealing and wish to use a different symbol or set of symbols that you find more attractive. It is simple to do this. The UNIX System enables you to define a prompt string, PS1, which is used as a command prompt. The symbol PS1 is a shell variable (see Module 7) that contains the string you want to use as your prompt. To change the command prompt, set PS1 to some new string. For example,
$ PS1="UNIX:> "
changes your primary prompt string from whatever it currently is to the string "UNIX:> ". Note that there must be no spaces on either side of the = sign. From that point, whenever the UNIX System is waiting for you to enter a command, it will display this new prompt at the beginning of the line. You can change your prompt to any string of characters you want. You can use it to remind yourself which system you are on, like this:
$ PS1="MyUnix-> "
MyUnix->
or simply to give yourself a reminder:
$ PS1="Leave at 4:30 PM> "
Leave at 4:30 PM>
If you redefine your prompt, it stays effective until you change it or until you log off. Later in this chapter, you will learn how to make these changes automatically when you first log in.
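A prompt assignment can also be tried as a short sketch. The prompt strings here are only illustrations, and the use of uname -n to obtain the machine name is an assumption not taken from the text. A shell assignment takes no spaces around the = sign; with spaces, the shell would try to run a command named PS1.

```shell
#!/bin/sh
# Set the primary prompt string. No spaces are allowed around "=".
PS1="UNIX:> "
first_prompt=$PS1

# Brackets make the trailing space in the prompt visible.
echo "prompt is now [$PS1]"

# Build a prompt that names the machine (uname -n prints the node name).
PS1="$(uname -n)-> "
echo "prompt is now [$PS1]"
```

In an interactive session the new prompt appears on the very next line; in a script, as here, the assignment only changes the variable.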
Some Basic UNIX Commands
Entering Commands on UNIX Systems
The UNIX System makes a large number of programs available to the user. To run one of these programs you issue a command. For example, when you type news or passwd, you are really instructing the UNIX System command interpreter to execute a program with the name news or passwd, and to display the results on your screen.
Some commands simply provide information to you; news works this way. An often-used command is date, which prints out the current day, date, and time. There are hundreds of other commands, and you will learn about many of them in this book. Different variants of the UNIX system share a large common set of commands (sometimes different names are used for the same command in different UNIX variants) and provide other commands that are unique for that particular version of UNIX.
Unix Command Line Structure
The UNIX system offers several file and directory related commands which users can apply as their work requires. A command is a program that tells the Unix system to do something. It has the form:
command [options] [arguments]
where an argument indicates on what the command is to perform its action, usually a file or series of files. An option modifies the command, changing the way it performs. Commands are case sensitive: command and Command are not the same. Options are generally preceded by a hyphen (-), and for most commands, more than one option can be strung together, in the form:
command -[option][option][option]
e.g.:
ls -alR
will perform a long list on all files in the current directory and recursively perform the list through all sub-directories. For most commands you can separate the options, preceding each with a hyphen, e.g.:
command -option1 -option2 -option3
as in:
ls -a -l -R
Some commands have options that require parameters. Options requiring parameters are usually specified separately, e.g.:
lpr -P printer3 -#2 file
will send 2 copies of file to printer3.
These are the standard conventions for commands. However, not all Unix commands will follow the standard. Some don't require the hyphen before options and some won't let you group options together, i.e. they may require that each option be preceded by a hyphen and separated by white space from other options and arguments. Options and syntax for a command are listed in the man page for the command.
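The two ways of writing options can be compared directly. The sketch below is illustrative; the scratch directory and its file names are assumptions made so the listing is small and predictable.

```shell
#!/bin/sh
# Create a small tree: one hidden file and one sub-directory.
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/.hidden" "$dir/sub/file"

# Options strung together behind a single hyphen ...
grouped=$(ls -alR "$dir")

# ... and the same options given one at a time.
separate=$(ls -a -l -R "$dir")

echo "$grouped"
if [ "$grouped" = "$separate" ]; then
    echo "both forms produced the same listing"
fi
```

For ls the two spellings are equivalent; as the text notes, this is a convention and some commands do not accept the grouped form.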
UNIX Commands: UNIX comes with a large number of commands that fall under each of the categories listed above for both the generic user and the system administrator. It is quite hard to list and explain all of the available UNIX functions and/or commands in a single book. Therefore, a review of some of the more important user-level commands and functions has been provided and subsequent modules provide a more in-depth look at system-level commands. All of the commands discussed below can be run by generic users and of course by the system administrator. However, one or more subfunctions of a command may be available only to the system administrator. The standard commands are listed below, which are available across many different versions of UNIX. For example, if we wanted to get a listing of all the users that are currently logged into the system, the who command can be used.

UNIX Command   Meaning
cat            Show the content of file.
date           Show system date and time.
hostname       Display name of system.
find           Search for a specific file.
grep           Search a file for specified pattern.
ls             List files in a directory.
more           Another command to show content of file.
ps             Show status of processes.
who            Show current users on the system.
Metacharacters and Wildcards
The metacharacters have special meaning to the shell; they should not normally be used as any part of a file name. The "-" symbol can usually be used in a filename provided it is not the first character. For example, if we had a file called -l then issuing the command ls -l would give you a long listing of the current directory, because the ls command would treat -l as an option rather than as a file name argument. Some UNIX commands provide facilities to overcome this problem. The shell offers certain special characters, called wild card characters, that help us to specify patterns. The shell matches the pattern against the file names, selects all the files whose names match, and applies the specified command to them. The wild card characters are as follows:
* This wild card character matches any number of characters. Therefore, in any pattern which contains the symbol *, it will be replaced by any number of any characters.
The wildcard ? is expanded by the shell to match any single character in a file name. The exception is that the ? will NOT match a dot "." as the first character of a file name (for example, in a hidden file). The wildcard * is expanded by the shell to match zero to any number of characters in a file name. The single * will be expanded to mean all files in the current directory except those beginning with a dot. Beware of the command rm * which could cause serious damage by removing all files!
Specifying Multiple File Names
Multiple filenames can be specified using special pattern-matching characters. The rules are:
'?' matches any single character in that position in the filename.
'*' matches zero or more characters in the filename. A '*' on its own will match all files. '*.*' matches all files containing a '.'.
Characters enclosed in square brackets ('[' and ']') will match any filename that has one of those characters in that position.
A list of comma separated strings enclosed in curly braces ("{" and "}") will be expanded as a Cartesian product with the surrounding characters.
For example:
1. ??? matches all three-character filenames.
2. ?ell? matches any five-character filenames with 'ell' in the middle.
3. he* matches any filename beginning with 'he'.
4. [m-z]*[a-l] matches any filename that begins with a letter from 'm' to 'z' and ends in a letter from 'a' to 'l'.
5. {/usr,}{/bin,/lib}/file expands to /usr/bin/file /usr/lib/file /bin/file and /lib/file.
Note that the UNIX shell performs these expansions (including any filename matching) on a command's arguments before the command is executed.
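The pattern rules can be checked with echo, which simply prints whatever the shell expanded its arguments to. This is a hedged sketch: the file names are invented for the purpose, and brace expansion is left out because not every sh implements it.

```shell
#!/bin/sh
# Populate a scratch directory with a known set of names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch abc hello help.c new.c zebra .profile

# Exactly three characters.
three=$(echo ???)
# Five characters with "ell" in the middle.
ell=$(echo ?ell?)
# Any name beginning with "he".
he=$(echo he*)
# Any name ending in ".c".
c_files=$(echo *.c)
# Begins with m-z and ends with a-l.
range=$(echo [m-z]*[a-l])
# A bare * skips names that begin with a dot, such as .profile.
all=$(echo *)

echo "???          : $three"
echo "?ell?        : $ell"
echo "he*          : $he"
echo "*.c          : $c_files"
echo "[m-z]*[a-l]  : $range"
echo "*            : $all"
```

Using echo in place of a destructive command such as rm is a safe way to preview what a pattern will match before acting on it.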
Example
*.c includes all files ending with '.c' because * stands for any number of any characters, e.g. new.c, ptr.c, str.c etc. A command like rm *.c will therefore delete all files ending with '.c'. The other files, which do not end with '.c', will be retained. The pattern specifies that the files must necessarily end with '.c'.
? This wild card specifies any one character. Therefore, if the wild card ? appears in a pattern, it will be replaced by any one character.
Example
cat ab?xy
The above command will display the contents of all files whose name starts with ab followed by any one character followed by xy.
[ ] This wild card specifies any one of the characters listed within the [ ].
Example
rm ab[efg]yz
The above command will delete all the files that begin with ab followed by either e, f, or g followed by yz.
PIPES
UNIX offers a provision whereby the output of one program can be made the input of another program. The two programs are separated by the | symbol.
Example
$ cat fil.c | pg
The above command will display the contents of the file fil.c page by page because the output is piped to a program called pg which displays the output only one screenful at a time.
UNIX Standard Files: Three files are automatically opened for each process in the system. These files are referred to as standard input, standard output and standard error. Standard input, sometimes abbreviated to stdin, is where a command expects to find its input, usually the keyboard. Standard output (stdout) and standard error (stderr) are where the command expects to put its output, usually the screen. These defaults can be changed using redirection.
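Redirection of the three standard files can be sketched as follows. The file names are hypothetical, and the file descriptor numbers 0, 1 and 2 are the usual shell convention for standard input, output and error rather than something stated in the text.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Standard output (descriptor 1) goes to a file instead of the screen.
echo "normal output" > "$dir/out.txt"

# Standard error (descriptor 2) goes to a different file.
# Listing a name that does not exist makes ls write an error message;
# "|| true" only keeps the sketch running after that expected failure.
ls "$dir/no-such-file" 2> "$dir/err.txt" || true

# Standard input (descriptor 0) is taken from a file instead of the keyboard.
lines=$(wc -l < "$dir/out.txt")

echo "out.txt holds: $(cat "$dir/out.txt")"
echo "err.txt holds: $(cat "$dir/err.txt")"
echo "line count read from standard input: $lines"
```

Keeping errors in a separate file is a common reason to redirect descriptor 2 on its own.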
Note: Remember that in AIX, not all file names refer to real data files! Some files may be "special files" which in reality are a pointer to some of the devices on the system (for example /dev/tty0).
Two or more commands can be separated by a pipe on a single command line. The requirement is that any command to the left of a pipe must send output to standard output. Any command to the right of the pipe must take its input from standard input. For example, in the command who | wc -l the output of who is passed as input to wc -l, which gives us the number of active users on the system.
A command is referred to as a filter if it can read its input from standard input, alter it in some way, and write its output to standard output. A filter can be used as an intermediate command between pipes. A filter is commonly used with a string of piped commands, such as ls -l | grep ^d | wc -l. The ls -l command lists all the files in the current directory and then pipes this information to the grep command. The grep command will be covered in more detail later in the course, but in this example, the grep command is used to find all lines beginning with a d (directories). The output of the grep command is then piped to the wc -l command. The result is that the command is counting the number of directories. In this example, the grep command is acting as a filter.
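Both pipelines described here can be run as a short sketch. The scratch directory and its contents are assumptions made so that the directory count is known in advance.

```shell
#!/bin/sh
# A scratch directory holding two directories and one ordinary file.
dir=$(mktemp -d)
mkdir "$dir/docs" "$dir/src"
touch "$dir/readme"

# ls -l writes one line per entry; lines for directories begin with "d".
# grep acts as the filter that keeps those lines, and wc -l counts them.
ndirs=$(ls -l "$dir" | grep '^d' | wc -l)
echo "directories found: $ndirs"

# who writes one line per logged-in user; wc -l counts the lines.
nusers=$(who | wc -l)
echo "users logged in: $nusers"
```

Each command in the pipeline reads standard input and writes standard output, which is what allows them to be chained in any order that makes sense.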
Placing multiple commands separated by a ";" on a single line produces the same result as entering each command on a separate command line. There need be no association between the two commands.
The \ must be the last character on the line and immediately followed by pressing Enter. Do not confuse the continuation prompt > with the redirection character >. The secondary prompt will not form part of the completed command line. If you require a redirection character you must type it explicitly.
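The command separator and line continuation can be illustrated together. This is a minimal sketch; the commands and strings chosen are arbitrary.

```shell
#!/bin/sh
# Two unrelated commands on one line, separated by ";".
# They run one after the other, exactly as if typed on two lines.
date ; pwd

# One command split over two lines. The backslash must be the very
# last character on the first line, with nothing after it.
msg=$(echo "first half," \
      "second half")
echo "$msg"
```

When typed interactively, the second line of the continued command would be entered at the secondary prompt (>), which is not part of the command itself.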
Module 3 Processes:
A program or a command that is actually running on a system is referred to as a process. UNIX can run a number of different processes at the same time as well as many occurrences of a program (such as vi) existing simultaneously in the system. The process ID (PID) is extracted from a process table. In a shell environment, the process ID is stored in the variable $$. To identify the running processes, execute the command ps, which will be covered later in this course. For example, ps -u team01 shows all running processes from user team01.
ps prints information only about processes started from your current terminal. Only the Process ID, Terminal, Elapsed Time and Command are displayed. The -e option displays information about EVERY process running in the system. The -f option in addition to the default information provided by ps, displays the User Name, PPID, start time for each process (that is, a FULL listing). The -l option displays the User ID, PPID and priorities for each process in addition to the information provided by ps (that is, a LONG listing)
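The $$ variable and the ps options described above can be tried as follows. This is an illustrative sketch; the exact columns printed by ps vary between UNIX variants, so no particular output is assumed.

```shell
#!/bin/sh
# $$ holds the process ID of the shell running this code.
mypid=$$
echo "this shell has PID $mypid"

# -f gives the full listing (user name, PID, PPID, start time, command)
# for processes started from the current terminal.
ps -f | head -n 5

# -e selects every process on the system; count the lines printed
# (the count includes the header line).
ps -e | wc -l
```

Piping ps -e through wc -l is a quick way to see roughly how many processes the system is running.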
Processes that are started from and require interaction with the terminal are called foreground processes. Processes that are run independently of the initiating terminal are referred to as background processes. Background processes are most useful with commands that take a long time to run. A process can only be run in the background if: 1. It doesn't require keyboard input, and 2. It is invoked with an ampersand & as the last character in the command line.
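Starting a background process can be sketched like this. The sleep command stands in for any long-running job, and the $! variable and wait command are standard shell features that the text has not introduced, so treat them as assumptions of this example.

```shell
#!/bin/sh
# The trailing "&" starts the command in the background.
sleep 1 &

# $! holds the PID of the most recently started background process.
bgpid=$!
echo "background process started with PID $bgpid"

# The shell does not wait; other work can continue immediately.
echo "foreground work continues"

# wait pauses until the background process ends and reports its exit status.
wait "$bgpid"
status=$?
echo "background process finished with status $status"
```

A command suited to the background, like this one, needs no keyboard input while it runs.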
Notes: The interrupt key (typically <Ctrl-c>) may not always work. A shell script or program can trap the signal that the key generates and ignore its meaning.
You can stop a foreground process by pressing <Ctrl-z>. This does not terminate the process; it suspends it so that you can subsequently restart it. To restart a suspended process in the background, use the bg command. To bring a suspended or background process into the foreground, use the fg command. To find out what suspended/background jobs you have, issue the jobs command. The bg, fg, and kill commands can be used with a job number. For instance, to kill job number 3, you can issue the command:
kill %3
The jobs command does not list jobs that were started with the nohup command if the user has logged off and then logged back into the system. On the other hand, if a user invokes a job with the nohup command and then issues the jobs command without logging off, the job will be listed.
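Terminating a background process with kill can be sketched as below. Job numbers such as %3 are mainly a convenience of an interactive session, so this example uses the process ID from $! instead, which is an assumption of the sketch rather than the form shown in the text.

```shell
#!/bin/sh
# Start a job that would otherwise run for half a minute.
sleep 30 &
pid=$!
echo "started PID $pid"

# kill sends the default terminate signal to the process.
kill "$pid"

# wait collects the exit status; a process ended by a signal
# reports a status greater than 128.
status=0
wait "$pid" || status=$?
echo "process ended with status $status"
```

In an interactive shell the same job could be named by its job number, as reported by the jobs command.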
Module 4 Shell Script:
A shell script is a simple text file that contains UNIX commands. When a shell script is executed, the shell reads the file one line at a time and processes the commands in sequence. Any UNIX command can be run from within a shell script. There are also a number of built-in shell facilities which allow more complicated functions to be performed. These will be illustrated later. Any UNIX editor can be used to create a shell script.
A shell script is a collection of commands in a file. In the example a shell script hello is shown. To execute this script, start the program ksh and pass the name of the shell script as argument: $ ksh hello This shell reads the commands from the script and executes all commands line by line.
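A script of this kind can be created and run as shown in the sketch below. The contents of hello are invented for illustration, and sh is used to run it in case ksh is not installed; on a system with the Korn shell, ksh hello behaves the same way for a script this simple.

```shell
#!/bin/sh
# Write a two-line script named hello into a scratch directory.
dir=$(mktemp -d)
cat > "$dir/hello" <<'EOF'
echo "Hello from a shell script"
date
EOF

# Run it by passing the file name to a shell as an argument.
# The shell reads the file and executes the commands line by line.
sh "$dir/hello"

# Capture the first line of output for inspection.
first_line=$(sh "$dir/hello" | head -n 1)
```

Because the file is passed to the shell as an argument, it does not need execute permission.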
The .profile file
After a user logs in and as part of starting up the user's shell, two profile files are executed. The first is the system profile /etc/profile, which is run by every user, and the second is the .profile in the user home directory, which is only run by the user who owns it. The .profile contains a sequence of commands that help you customize your environment. Because the .profile is read each time you log in, the commands you put in this file to customize your environment will be executed at the start of every login session. These commands can include, but are certainly not limited to, the following:
1. aliases
2. terminal control characteristics
3. creation/definition of shell environment variables (including your prompt)
The first file that the operating system uses at login is the /etc/environment file. This file contains variables specifying the basic environment for all processes and can only be changed by the system administrator. The second file that the operating system uses at login time is the /etc/profile file. This file controls system-wide default variables such as the mail messages and terminal types. /etc/profile can only be changed by the administrator. The .profile file is the third file read at login time. It resides in a user's login directory and enables a user to customize their individual working environment. The .profile file overrides commands run and variables set and exported by the /etc/profile file. Ensure that newly created variables do not conflict with standard variables such as MAIL, PS1, PS2 and so forth.
At startup time the shell checks to see if there is any new mail in /usr/spool/mail/$LOGNAME. If there is then MAILMSG is echoed back. In normal operation, the shell checks periodically. The ENV="$HOME/.kshrc" variable will cause the file $HOME/.kshrc to be run every time a new Korn shell is explicitly started. This file will usually contain Korn shell specifics. The .profile file is read only when the user logs in. Be aware that your .profile file may not be read if you are accessing the system through CDE (the Common Desktop Environment). By default, CDE instead uses a file called .dtprofile. In the CDE environment, if you wish to use the .profile file, it is necessary to uncomment the DTSOURCEPROFILE variable assignment at the end of the .dtprofile file.
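A .profile along the lines described might look like the following sketch. Every value here is illustrative: the prompt string is an assumption, while the ENV and EDITOR settings use the names and paths mentioned in this module.

```shell
# Sample $HOME/.profile (illustrative values only)

# Prompt that names the machine; uname -n prints the node name.
PS1="$(uname -n)> "

# Preferred editor.
EDITOR=/usr/bin/vi

# File for the Korn shell to run each time a new ksh is started.
ENV="$HOME/.kshrc"

# Export the variables so that programs started from the shell see them.
export PS1 EDITOR ENV
```

Settings that should apply only at login belong here, while Korn shell specifics such as aliases are usually placed in the file named by ENV.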
Module 5 Overview
The tilde (~) Expansion:
The C shell provides an easy way to abbreviate the pathname of your home directory; the Korn shell supports the same abbreviation. When the tilde symbol (~) appears at the beginning of a word in your command line, the shell replaces it with the full pathname of your login directory.
Example:
    % mv file ~/newfile
is the abbreviated way of typing
    % mv file $home/newfile
The whence Command
The whence command can be used to determine exactly where the command you specify is located. For instance, it may be a command located on the disk drive, it may be an alias, or it may be built in to the Korn shell. whence reports the proper location.
    $ whence ls
    /bin/ls
    $ whence dir
    /bin/ls -al | more
    $ whence echo
    echo
Aliases
Aliases in the Korn shell allow you to create your own commands. You can simply rename existing commands, or you can group commands together to create entirely new commands. This feature is also available in the C shell, but the command syntax is slightly different. The ksh syntax for alias commands:
    alias name='value'
The ENV variable specifies a Korn shell script to be invoked every time a new shell is created. The shell script in this example is .kshrc (which is the standard name used), but any other filename can also be used. The difference between .profile and .kshrc is that .kshrc is read each time a subshell is spawned, whereas .profile is read once at login.
You can also set the following variable in $HOME/.profile:
    EDITOR=/usr/bin/vi
    export EDITOR
It will do the same thing that the set -o vi command does as shown in the example.
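The settings discussed so far could be collected in a $HOME/.profile along the following lines. This is only a minimal sketch: the prompt string and the extra $HOME/bin directory are assumptions chosen for illustration, not values prescribed by this course.

```shell
# Sketch of a $HOME/.profile; it is read once, at login.
# The prompt text and $HOME/bin are illustrative assumptions.

PATH=$PATH:$HOME/bin      # also search a personal bin directory
PS1='$ '                  # primary prompt
ENV=$HOME/.kshrc          # file ksh runs each time a new Korn shell starts
EDITOR=/usr/bin/vi        # selects vi-style command-line editing in ksh

export PATH PS1 ENV EDITOR
```

Aliases and other Korn shell specifics would then go into the file named by ENV rather than into .profile, so that subshells pick them up as well.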
The alias command invoked with no arguments prints the list of aliases in the form name=value on standard output.
The Korn shell sets up a number of aliases by default. Notice that the history and r commands are in fact aliases of the fc command. Because r is one of these aliases, typing an r will re-execute the previously entered command. To carry down the value of an alias to subsequent subshells, the ENV variable has to be modified. The ENV variable is normally set to $HOME/.kshrc in the .profile file (although you can set ENV to any shell script). By adding the alias definition to the .kshrc file (by using one of the editors) and invoking the .profile file, the value of the alias will be carried down to all subshells, because the .kshrc file is run every time a Korn shell is explicitly invoked. The file pointed to by the ENV variable should contain Korn shell specifics.
The unalias command will cancel the alias named. The names of the aliases specified with the unalias command will be removed from the alias list.
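A short sketch of the alias life cycle described above: define, display, remove. The alias name ll is a made-up example; the exact format in which the shell prints a definition differs between shells.

```shell
# Define an alias; the name "ll" is only an example.
alias ll='ls -l'

# Ask the shell for the definition of that one alias and keep it.
definition=$(alias ll)
echo "$definition"        # exact format varies from shell to shell

# Remove the alias again; it disappears from the alias list.
unalias ll
```

To make such an alias available in every Korn shell, the alias line would be placed in the file named by ENV, as explained above.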
The /etc/environment file contains default variables set for each process. Only the system administrator can change this file. PATH is the sequence of directories that is searched when looking for a command whose path name is incomplete. TZ is the time zone information. LANG is the locale name currently in effect. LOCPATH is the full path name of the location of National Language Support information, part of this being the National Language Support Table. NLSPATH is the full path name for messages.
Module 6 The vi Editor
It is important to know vi for the following reasons:
• It is the only editor available in maintenance mode on RISC System/6000
• Standard editor across all UNIX systems
• Command-line editing feature
• Used as default editor for some programs
This unit covers only a subset of the vi functions. It is a very powerful editor. Refer to the online documentation for additional functions.
vi does its editing in a buffer. When a session is initiated, one of two things happens:
• If the file to be edited exists, a copy of the file is put into a buffer in /tmp by default.
• If the file does not exist, an empty buffer is opened for this session.
Tildes represent empty lines in the editor.
The editor starts in command mode.
Module 7 The Variables:
There are a number of variables automatically set by the shell when it starts. These allow you to reference arguments on the command line.
User Variables
A variable name may consist of letters, digits and underscores and must not begin with a digit; any sequence of non-blank characters may be assigned as its value. The sample session below creates a variable called person and initializes it with the string Sreedhar. It is important to note that you must NOT precede or follow the equal sign with a space or TAB character.
Sample Session:
    $ person=Sreedhar
    $
The next sample session shows that the word person by itself does not represent the string Sreedhar: the string person is echoed as person. The BourneShell will only do the substitution of the value of the variable when the name of the variable is preceded with a dollar sign ($).
Sample Session:
    $ echo person
    person
    $ echo $person
    Sreedhar
    $
If you want to have embedded spaces in a variable, it is necessary to quote the string.
Sample Session:
    $ person='Sreedhar and Venkatesh'
    $ echo $person
    Sreedhar and Venkatesh
    $
Shell variables are an integral part of shell programming. They provide the ability to store and manipulate information within a shell program. All shell variable names are case sensitive. For example, HOME and home are not the same. As a convention, uppercase names are used for the standard variables set by the system and lowercase names are used for the variables set by the user.
The set command displays your current option settings for all the variables. The set command is a built-in command of the shell, and therefore gives a different output depending on the shell being run, for instance a Bourne or a Korn shell.
The echo command displays the string of text to standard out (by default to the screen). To set a variable, use the = with NO SPACES on either side. Once the variable has been set, to refer to the value of that variable precede the variable name with a $. There must be NO SPACE between the $ and the variable name.
Notice there need not be a space BEFORE the $ of the variable in order for the shell to do variable substitution. Note, though, what happened when there was no space AFTER the variable name. The shell searched for a variable whose name was xylong, which did not exist. When a variable that has not been defined is referenced, the user does not get an error. Rather a null string is returned. To eliminate the need for a space after the variable name, the curly braces { } are used. Note that the $ is OUTSIDE of the braces.
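The behaviour described above can be sketched as follows. The variable name xy matches the discussion; its value and the surrounding words are made up for illustration, and xylong is assumed never to have been set.

```shell
xy=very                    # hypothetical value, for illustration only

first="is${xy}"            # no space is needed BEFORE the $
second="$xylong"           # the shell looks up xylong, which is not set: null string
third="${xy}long"          # the braces end the variable name at xy

echo "first=$first second=$second third=$third"
```

Here first becomes isvery, second is empty, and third becomes verylong.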
A variable can be set to the output of some command or group of commands by using the backquotes (also referred to as grave accents). They should not be mistaken for single quotes. In the examples the output of the date and who commands is stored in variables. The backquotes are supported by the Bourne shell, C shell and Korn shell. The $(command) form comes from the Korn shell and is not understood by the original Bourne shell or the C shell.
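Both forms of command substitution can be tried side by side. This is a minimal sketch; the variable names now and here are arbitrary, and pwd is used in place of who so that the result does not depend on who is logged in.

```shell
now=`date`                 # backquote form
here=$(pwd)                # $(command) form, same idea

echo "It is $now"
echo "Working directory: $here"
```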
Read-Only User Variables
The contents of the user variables and the shell variables can be modified by the user. It is possible to assign a new value to them. The new value can be assigned from the dollar ($) prompt or from inside a BourneShell script. Read-only variables are different. The value of a read-only variable cannot be changed. The variable must first be initialized to some value; then, by entering the following command, it can be made read only.
Command format:
readonly variable_name
variable_name = name of the variable to be made read only
Sample Session:
    $ person=Sreedhar
    $ readonly person
    $ echo $person
    Sreedhar
    $ person=Venkatesh
    person: is read only
    $
The readonly command given without any arguments will display a list of all the read-only variables.
Sample Session:
    $ person=Sreedhar
    $ readonly person
    $ example=Venkatesh
    $ readonly example
    $ readonly
    readonly person
    readonly example
    $
Read-Only Shell Variables
The read-only shell variables are similar to the read-only user variables, except that the value of these variables is assigned by the shell, and the user CANNOT modify them.
Name of the Calling Program
The shell will store the name of the command you used to call a program in the variable named $0. It has the number zero because it appears before the first argument on the command line.
Sample Session:
    $ cat name_ex
    echo 'The name of the command used'
    echo 'to execute this script was' $0
    $ name_ex
    The name of the command used
    to execute this script was name_ex
    $
Arguments
The BourneShell will store the first nine command line arguments in the variables named $1, $2, ..., $9. These variables appear in this section because you cannot change them using the equal sign. It is possible to modify them using the set command.
Sample Session:
    $ cat arg_ex
    echo 'The first five command line'
    echo 'arguments are' $1 $2 $3 $4 $5
    $ arg_ex Sreedhar Venkatesh Santhosh
    The first five command line
    arguments are Sreedhar Venkatesh Santhosh
    $
The script arg_ex will display the first five command-line arguments. The variables representing $4 and $5 have a null value. The BourneShell variable $* represents all of the command-line arguments as shown in the following example.
Sample Session:
    $ cat display_all
    echo $*
    $ display_all Sreedhar Venkatesh Santhosh
    Sreedhar Venkatesh Santhosh
    $
The BourneShell variable $# contains the number of arguments on the command line. This is a string variable that represents a decimal number. You can use the expr utility to perform calculations with that number and test to perform logical tests on it.
Sample Session:
    $ cat num_args
    echo 'This script was called with'
    echo $# 'arguments'
    $ num_args Sreedhar Venkatesh Santhosh
    This script was called with
    3 arguments
    $
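The remark about expr and test can be sketched as follows. The set command stands in for three real command-line arguments; the variable names doubled and verdict are made up for this illustration.

```shell
# Stand-in for three command-line arguments.
set -- Sreedhar Venkatesh Santhosh

doubled=`expr $# \* 2`     # expr does the arithmetic on the argument count

if test $# -ge 2           # test does the logical comparison
then
    verdict="at least two arguments"
else
    verdict="fewer than two arguments"
fi

echo "$# arguments, doubled: $doubled, $verdict"
```

The asterisk is escaped so that the shell passes it to expr instead of expanding it as a filename pattern.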
BourneShell Environment - Exporting Variables
Within a process, you can declare, initialize, read, and modify variables. The variable is local to that process. When a process forks a child process, the parent process does not automatically pass the value of the variable to the child process. Here is an example of the variables not being exported.
Sample Session:
    $ cat no_export
    car=mercedes          # set the variable
    echo $0 $car $$       # $0 = name of file executed
                          # $car = value of variable car
                          # $$ = PID number (process id)
    inner                 # execute another BourneShell script
    echo $0 $car $$       # display same as above
    $ cat inner
    echo $0 $car $$       # display variables for this process
    $ chmod a+x no_export
    $ chmod a+x inner
    $ no_export
    no_export mercedes 4790
    inner 4792
    no_export mercedes 4790
    $
When no_export was executed, it, of course, assigned a value of mercedes to the variable car and printed it out. The call to inner created a child process. Its PID is 4792, while the parent PID is 4790. Notice, when inner tried to print the value of car, it printed nothing. The reason is that the value of car was not passed by the parent. Can the value be passed from parent to child process? Yes, by using the export command. Let's look at an example.
Sample Session:
    $ cat export_it
    car=mercedes
    export car
    echo $0 $car $$
    inner1
    echo $0 $car $$
    $ cat inner1
    echo $0 $car $$
    car=chevy
    echo $0 $car $$
    $ chmod a+x export_it
    $ chmod a+x inner1
    $ export_it
    export_it mercedes 4798
    inner1 mercedes 4800
    inner1 chevy 4800
    export_it mercedes 4798
    $
In the export_it BourneShell script, the variable car was initialized to mercedes; and then it was exported. This means that the value of car is now available to a child process. When inner1 prints out the value of car it has the value of mercedes. This is as we expect because the value of car was exported from the parent. The next line of inner1 changes the value of car to chevy. This is shown in the next line of the sample session. The last line of the session shows the return to the parent process and the value is still mercedes. How is this possible? Exporting variables is only valid from the parent to the child process. The child process cannot change the parent's variable.
Reading Input Into a Shell Variable
The BourneShell script can read user input from standard input. The read command will read one line from standard input and assign the line to one or more variables. The following example shows how this works.
Sample Session:
    $ cat read_script
    echo "Please enter a string of your choice"
    read a
    echo $a
    $
This simple script will read one line from standard input (keyboard) and assign it to the variable a.
Sample Session:
    $ read_script
    Please enter a string of your choice
    Here it is
    Here it is
    $
The line read from standard input can also be assigned to several variables as shown in the following example.
Sample Session:
    $ cat reads
    echo "Please enter three strings"
    read a b c
    echo $a $b $c
    echo $c
    echo $b
    echo $a
    $
This time, we will turn on the trace mechanism and follow the execution of this BourneShell script.
Sample Session:
    $ sh -x reads
    + echo Please enter three strings
    Please enter three strings
    + read a b c
    this is more than three strings
    + echo this is more than three strings
    this is more than three strings
    + echo more than three strings
    more than three strings
    + echo is
    is
    + echo this
    this
    $
It is interesting to note that the spaces separate the values for the variables a,b, and c. For example, the variable a was assigned the string this, the variable b was assigned the string is, and the remainder of the line was assigned to c (including the spaces). Sample Session:
    $ cat read_ex
    echo 'Enter line: \c'
    read line
    echo "The line was: $line"
    $
In this example, the \c escape suppresses the newline that echo would normally print, so the cursor stays on the prompt line. The single quote marks protect the backslash from being interpreted by the shell. Also notice that the double quote marks do not prevent the substitution of the variable line.
Sample Session:
    $ read_ex
    Enter line: All's well that ends well
    The line was: All's well that ends well
    $
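The way read splits a line among several variables can be reproduced without typing at the keyboard. In this sketch the input line is supplied through a here-document, which is an assumption made so that the example runs unattended; the variable names are the ones used in the reads script above.

```shell
# Supply one input line to read from a here-document instead of the keyboard.
read a b c <<EOF
this is more than three strings
EOF

echo "a=$a"     # first word
echo "b=$b"     # second word
echo "c=$c"     # everything that is left, embedded spaces included
```

Not every echo understands \c. Where it does not, printf 'Enter line: ' prints a prompt without a trailing newline.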
Module 8 Parameters:
A shell script is invoked by typing its name. Parameters are passed to the script by appending them to the script name, with spaces as separators.
POSITIONAL PARAMETERS
A BourneShell script can also read in command-line arguments. The first argument is referred to as $1, the second is $2, and so on. Command-line arguments are referred to as positional parameters. Let's look at an example BourneShell script to see how these are used.
Sample Session:
    $ cat neat_shell
    echo $1 $2 $3
    echo $0 is the name of the shell script
    echo "There were $# arguments."
    echo $*
    $
Ensure that the BourneShell script is executable by issuing this command:
Sample Session:
    $ chmod a+x neat_shell
    $
Now, if we type the name of the BourneShell script with no arguments, we get the following results.
Sample Session:
    $ neat_shell

    neat_shell is the name of the shell script
    There were 0 arguments.

    $
In this sample session, there were no arguments given so none were printed. $0 is the positional parameter that refers to the name of the script. Since there were no arguments given with this invocation of neat_shell, there were zero arguments listed.
$0: The Name of the Invoking Command
The special variable $0 represents the name of the executing program. The following script, if called script.sh, would output "This program is called script.sh.":
    #!/bin/sh
    echo This program is called $0.
    exit 0
$1 $2 $3 ... $9, $*: Shell Parameters
The first parameter to the shell is known as $1, the second as $2, etc. The collection of ALL parameters is known as $*. Consider the following as an example (file prog):
    #!/bin/sh
    echo the first parameter is $1
    echo the second parameter is $2
    echo the collection of ALL parameters is $*
    exit 0
The output of that program could be:
    sh_prompt> prog first second
    the first parameter is first
    the second parameter is second
    the collection of ALL parameters is first second
    sh_prompt>
$#: Number of Parameters
The number of parameters used can be obtained by looking at the value of $#.
Setting values of positional Parameters
Though we have compared the positional parameters with variables, they are in essence quite different. For instance, you can't assign values to $1, $2, etc. as we do to any other user-defined variables, or system variables for that matter. Saying a=10 or b=alpha is fine but $1=dollar or $2=100 is simply not done. There is one way to assign values to the positional parameters: using the set command.
    $ set Friends come and go, but enemies accumulate
The above command sets $1 to 'Friends', $2 to 'come' and so on. To verify, we use the echo statement to display their values.
    $ echo $1 $2 $3 $4 $5 $6 $7
    Friends come and go, but enemies accumulate
Using shift: Shifts Parameters
When a large number of parameters (more than 9) are passed to the shell, shift can be used to read those parameters. If the number of parameters to be read is known, say three, a program similar to the following could be written:
    #!/bin/sh
    echo The first parameter is $1.
    shift
    echo The second parameter is $1.
    shift
    echo The third parameter is $1.
    exit 0
Obviously the above example contains redundancy, especially if there are a large number of parameters. To solve this problem, use a for or while loop.
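A while loop version of the same idea might look like the sketch below. The eleven sample arguments are supplied with set so the example runs on its own; the variable names position and last are made up for this illustration.

```shell
# Sample arguments; in a real script these come from the command line.
set -- one two three four five six seven eight nine ten eleven

position=1
while test $# -gt 0            # loop until no parameters remain
do
    echo "Parameter $position is $1"
    last=$1                    # remember the most recent parameter
    shift                      # drop $1; the old $2 becomes the new $1
    position=`expr $position + 1`
done
```

Because the loop always reads $1, it works for any number of parameters, including more than nine.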
Module 9 Regular Expression:
What is a Regular Expression?
A regular expression is a set of characters that specify a pattern. The term "regular" has nothing to do with a high-fiber diet. It comes from a term used to describe grammars and formal languages. Regular expressions are used when you want to search for specific lines of text containing a particular pattern. Most of the UNIX utilities operate on ASCII files a line at a time. Regular expressions search for patterns on a single line, and not for patterns that start on one line and end on another.
It is simple to search for a specific word or string of characters. Almost every editor on every computer system can do this. Regular expressions are more powerful and flexible. You can search for words of a certain size. You can search for a word with four or more vowels that ends with an "s." Numbers, punctuation characters, you name it, a regular expression can find it. What happens once the program you are using finds it is another matter. Some just search for the pattern. Others print out the line containing the pattern. Editors can replace the string with a new pattern. It all depends on the utility.
Regular expressions confuse people because they look a lot like the file matching patterns the shell uses. They even act the same way, almost. The square brackets are similar, and the shell's asterisk acts similarly to, but not identically to, the asterisk in a regular expression. In particular, the Bourne shell, C shell, find, and cpio use file name matching patterns and not regular expressions.
The Structure of a Regular Expression
There are three important parts to a regular expression. Anchors are used to specify the position of the pattern in relation to a line of text. Character Sets match one or more characters in a single position. Modifiers specify how many times the previous character set is repeated. A simple example that demonstrates all three parts is the regular expression "^#*".
The up arrow is an anchor that indicates the beginning of the line. The character "#" is a simple character set that matches the single character "#". The asterisk is a modifier. In a regular expression it specifies that the previous character set can appear any number of times, including zero. This is a useless regular expression, as you will see shortly.
There are also two types of regular expressions: the "Basic" regular expression, and the "extended" regular expression. A few utilities like awk and egrep use the extended expression. Most use the "regular" regular expression. From now on, if I talk about a "regular expression," it describes a feature in both types. Here is a table of the Solaris (around 1991) commands that allow you to specify regular expressions:
    Utility    Regular Expression Type
    vi         Basic
    sed        Basic
    grep       Basic
    csplit     Basic
    dbx        Basic
    dbxtool    Basic
    more       Basic
    ed         Basic
    expr       Basic
    lex        Basic
    pg         Basic
    nl         Basic
    rdist      Basic
    awk        Extended
    nawk       Extended
    egrep      Extended
    EMACS      EMACS Regular Expressions
    PERL       PERL Regular Expressions
The Anchor Characters: ^ and $
Most UNIX text facilities are line oriented. Searching for patterns that span several lines is not easy to do. You see, the end of line character is not included in the block of text that is searched. It is a separator. Regular expressions examine the text between the separators. If you want to search for a pattern that is at one end or the other, you use anchors. The character "^" is the starting anchor, and the character "$" is the end anchor. The regular expression "^A" will match all lines that start with a capital A. The expression "A$" will match all lines that end with the capital A. If the anchor characters are not used at the proper end of the pattern, then they no longer act as anchors. That is, the "^" is only an anchor if it is the first character in a regular expression. The "$" is only an anchor if it is the last character. The expression "$1" does not have an anchor. Neither does "1^". If you need to match a "^" at the beginning of the line, or a "$" at the end of a line, you must escape the special characters with a back slash. Here is a summary:
    Pattern    Matches
    ^A         "A" at the beginning of a line
    A$         "A" at the end of a line
    A^         "A^" anywhere on a line
    $A         "$A" anywhere on a line
    ^^         "^" at the beginning of a line
    $$         "$" at the end of a line
The use of "^" and "$" as indicators of the beginning or end of a line is a convention other utilities use. The vi editor uses these two characters as commands to go to the beginning or end of a line. The C shell uses "!^" to specify the first argument of the previous line, and "!$" is the last argument on the previous line. It is one of those choices that other utilities go along with to maintain consistency. For instance, "$" can refer to the last line of a file when using ed and sed. cat -e marks the end of each line with a "$". You might see it in other programs as well.
Matching a character with a character set
The simplest character set is a character. The regular expression "the" contains three character sets: "t," "h" and "e". It will match any line with the string "the" inside it. This would also match the word "other". To prevent this, put spaces before and after the pattern: " the ". You can combine the string with an anchor. The pattern "^From: " will match the lines of a mail message that identify the sender. Use this pattern with grep to print every address in your incoming mail box:
    grep '^From: ' /usr/spool/mail/$USER
Some characters have a special meaning in regular expressions. If you want to search for such a character, escape it with a back slash.
Match any character with .
The character "." is one of those special meta-characters. By itself it will match any character, except the end-of-line character. The pattern that will match a line with a single character is
    ^.$
Specifying a Range of Characters with [...]
If you want to match specific characters, you can use the square brackets to identify the exact characters you are searching for. The pattern that will match any line of text that consists of exactly one digit is
    ^[0123456789]$
This is verbose. You can use the hyphen between two characters to specify a range:
    ^[0-9]$
You can intermix explicit characters with character ranges. This pattern will match a single character that is a letter, number, or underscore:
    [A-Za-z0-9_]
Character sets can be combined by placing them next to each other. If you wanted to search for a word that
1. Started with a capital letter "T"
2. Was the first word on a line
3. The second letter was a lower case letter
4. Was exactly three letters long, and
5. The third letter was a vowel
the regular expression would be "^T[a-z][aeiou] ".
Exceptions in a character set
You can easily search for all characters except those in square brackets by putting a "^" as the first character after the "[". To match all characters except vowels use "[^aeiou]". Like the anchors in places that can't be considered an anchor, the characters "]" and "-" do not have a special meaning if they directly follow "[". Here are some examples:
    Regular Expression    Matches
    []                    The characters "[]"
    [0]                   The character "0"
    [0-9]                 Any number
    [^0-9]                Any character other than a number
    [-0-9]                Any number or a "-"
    [0-9-]                Any number or a "-"
    [^-0-9]               Any character except a number or a "-"
    []0-9]                Any number or a "]"
    [0-9]]                Any number followed by a "]"
    [0-9-z]               Any number, or any character between "9" and "z"
    [0-9\-a\]]            Any number, or a "-", an "a", or a "]"
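A few of the character-set patterns above can be tried with grep. This is a small sketch; the sample lines and the variable names are made up for the illustration.

```shell
# Made-up sample data, one item per line.
sample='The end
Tip top
Then what
7
42
x'

# First word has three letters: "T", a lower case letter, then a vowel.
three_letter=$(printf '%s\n' "$sample" | grep '^T[a-z][aeiou] ')

# Lines that consist of exactly one digit.
one_digit=$(printf '%s\n' "$sample" | grep '^[0-9]$')

# Lines that consist of exactly one character that is not a digit.
one_other=$(printf '%s\n' "$sample" | grep '^[^0-9]$')

echo "$three_letter"
echo "$one_digit"
echo "$one_other"
```

Only "The end" satisfies the first pattern: "Tip" ends in a consonant and "Then" is four letters long.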
Repeating character sets with *
The third part of a regular expression is the modifier. It is used to specify how many times you expect to see the previous character set. The special character "*" matches zero or more copies. That is, the regular expression "0*" matches zero or more zeros, while the expression "[0-9]*" matches zero or more numbers. This explains why the pattern "^#*" is useless, as it matches any number of "#"s at the beginning of the line, including zero. Therefore this will match every line, because every line starts with zero or more "#"s. At first glance, it might seem that starting the count at zero is stupid. Not so. Looking for an unknown number of characters is very important. Suppose you wanted to look for a number at the beginning of a line, and there may or may not be spaces before the number. Just use "^ *" to match zero or more spaces at the beginning of the line. If you need to match one or more, just repeat the character set. That is, "[0-9]*" matches zero or more numbers, and "[0-9][0-9]*" matches one or more numbers.
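The two points made above, that "^#*" matches every line and that repeating the character set gives "one or more", can be checked with grep. The sample lines and variable names below are invented for the illustration.

```shell
# Made-up sample data.
sample='42 apples
   7 pears
no number here
  x9'

# Optional leading spaces, then one or more digits.
numbered=$(printf '%s\n' "$sample" | grep '^ *[0-9][0-9]*')

# "^#*" matches zero or more "#" at the start, so every line qualifies.
useless_count=$(printf '%s\n' "$sample" | grep -c '^#*')

echo "$numbered"
echo "Lines matched by ^#*: $useless_count"
```

The first pattern selects the apples and pears lines; the count for the second equals the number of lines in the sample.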
Matching a specific number of sets with \{ and \}
You can continue the above technique if you want to specify a minimum number of character sets. You cannot specify a maximum number of sets with the "*" modifier. There is a special pattern you can use to specify the minimum and maximum number of repeats. This is done by putting those two numbers between "\{" and "\}". The back slashes deserve a special discussion. Normally a backslash turns off the special meaning for a character. A period is matched by a "\." and an asterisk is matched by a "\*". If a backslash is placed before a "<," ">," "{," "}," "(," ")," or before a digit, the back slash turns on a special meaning. This was done because these special functions were added late in the life of regular expressions. Changing the meaning of "{" would have broken old expressions. This is a horrible crime punishable by a year of hard labor writing COBOL programs. Instead, adding a back slash added functionality without breaking old programs. Rather than complain about the asymmetry, view it as evolution.
Having convinced you that "\{" isn't a plot to confuse you, an example is in order. The regular expression to match 4, 5, 6, 7 or 8 lower case letters is
    [a-z]\{4,8\}
Any numbers between 0 and 255 can be used. The second number may be omitted, which removes the upper limit. If the comma and the second number are omitted, the pattern must be duplicated the exact number of times specified by the first number.
You must remember that modifiers like "*" and "\{1,5\}" only act as modifiers if they follow a character set. If they were at the beginning of a pattern, they would not be a modifier. Here is a list of examples, and the exceptions:
    Regular Expression    Matches
    *                     Any line with an asterisk
    \*                    Any line with an asterisk
    \\                    Any line with a back slash
    ^*                    Any line starting with an asterisk
    ^A*                   Any line
    ^A\*                  Any line starting with an "A*"
    ^AA*                  Any line if it starts with one "A"
    ^AA*B                 Any line starting with one or more "A"s followed by a "B"
    ^A\{4,8\}B            Any line starting with 4, 5, 6, 7 or 8 "A"s followed by a "B"
    ^A\{4,\}B             Any line starting with 4 or more "A"s followed by a "B"
    ^A\{4\}B              Any line starting with "AAAAB"
    \{4,8\}               Any line with "{4,8}"
    A{4,8}                Any line with "A{4,8}"
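The repeat counts can be tried out with grep, which uses basic regular expressions. The sample lines below are made up; the anchors "^" and "$" are added to the first pattern so that a longer run of letters cannot match through a shorter part of itself.

```shell
# Made-up sample data.
sample='abc
abcd
abcdefgh
abcdefghi
AAAAB
AAB'

# Whole line is 4 to 8 lower case letters.
four_to_eight=$(printf '%s\n' "$sample" | grep '^[a-z]\{4,8\}$')

# Line starts with exactly four "A"s followed by a "B".
four_a=$(printf '%s\n' "$sample" | grep '^A\{4\}B')

echo "$four_to_eight"
echo "$four_a"
```

Without the anchors, "[a-z]\{4,8\}" would also report the nine-letter line, because eight of its letters are enough to satisfy the pattern.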
Matching words with \< and \>
Searching for a word isn't quite as simple as it at first appears. The string "the" will match the word "other". You can put spaces before and after the letters and use this regular expression: " the ". However, this does not match words at the beginning or end of the line. And it does not match the case where there is a punctuation mark after the word. There is an easy solution. The characters "\<" and "\>" are similar to the "^" and "$" anchors, as they don't occupy a position of a character. They do "anchor" the expression between them to only match if it is on a word boundary. The pattern to search for the word "the" would be "\<the\>". The character before the "t" must be either a new line character, or anything except a letter, number, or underscore. The character after the "e" must also be a character other than a number, letter, or underscore or it could be the end of line character.
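The difference between the three ways of searching for "the" can be counted with grep -c. This sketch assumes a grep that understands the word anchors, such as GNU grep; they are not available in every implementation. The sample lines are made up.

```shell
# Made-up sample data.
sample='the start
at the end is the
other things
bathe
the, with a comma'

loose=$(printf '%s\n' "$sample" | grep -c 'the')        # any line containing "the"
spaced=$(printf '%s\n' "$sample" | grep -c ' the ')     # "the" with a space on each side
words=$(printf '%s\n' "$sample" | grep -c '\<the\>')    # "the" as a whole word

echo "loose=$loose spaced=$spaced words=$words"
```

The plain string also counts "other" and "bathe", the spaced version misses the word at the start of a line and before the comma, and the word anchors find exactly the three lines where "the" stands alone.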
Backreferences - Remembering patterns with \(, \) and \1
Another pattern that requires a special mechanism is searching for repeated words. The expression "[a-z][a-z]" will match any two lower case letters. If you wanted to search for lines that had two adjoining identical letters, the above pattern wouldn't help. You need a way of remembering what you found, and seeing if the same pattern occurred again. You can mark part of a pattern using "\(" and "\)". You can recall the remembered pattern with "\" followed by a single digit. Therefore, to search for two identical letters, use "\([a-z]\)\1". You can have 9 different remembered patterns. Each occurrence of "\(" starts a new pattern. The regular expression that would match a 5 letter palindrome (e.g. "radar") would be
    \([a-z]\)\([a-z]\)[a-z]\2\1
Potential Problems
That completes a discussion of the Basic regular expression. Before I discuss the extensions the extended expressions offer, I wanted to mention two potential problem areas.
The "\<" and "\>" characters were introduced in the vi editor. The other programs didn't have this ability at that time. Also the "\{min,max\}" modifier is new and earlier utilities didn't have this ability. This made it difficult for the novice user of regular expressions, because it seemed each utility has a different convention. Sun has retrofitted the newest regular expression library to all of their programs, so they all have the same ability. If you try to use these newer features on other vendors' machines, you might find they don't work the same way.
The other potential point of confusion is the extent of the pattern matches. Regular expressions match the longest possible pattern. That is, the regular expression
    A.*B
matches "AAB" as well as "AAAABBBBABCCCCBBBAAAB". This doesn't cause many problems using grep, because an oversight in a regular expression will just match more lines than desired. If you use sed, and your patterns get carried away, you may end up deleting more than you wanted to.
Extended Regular Expressions
Two programs use the extended regular expression: egrep and awk. With these extensions, those special characters preceded by a back slash no longer have the special meaning: "\{," "\}," "\<," "\>," "\(," "\)" as well as the "\digit". There is a very good reason for this, which I will delay explaining to build up suspense.
The character "?" matches 0 or 1 instances of the character set before, and the character "+" matches one or more copies of the character set. You can't use the \{ and \} in the extended regular expressions, but if you could, you might consider the "?" to be the same as "\{0,1\}" and the "+" to be the same as "\{1,\}".
By now, you are wondering why the extended regular expressions are even worth using. Except for two abbreviations, there are no advantages, and a lot of disadvantages. Therefore, examples would be useful.
The three important characters in the expanded regular expressions are "(," "|," and ")". Together, they let you match a choice of patterns. As an example, you can use egrep to print all From: and Subject: lines from your incoming mail:
    egrep '^(From|Subject): ' /usr/spool/mail/$USER
All lines starting with "From:" or "Subject:" will be printed. There is no easy way to do this with the Basic regular expressions. You could try "^[FS][ru][ob][mj]e*c*t*: " and hope you don't have any lines that start with "Sromeet:". Extended expressions don't have the "\<" and "\>" characters. You can compensate by using the alternation mechanism. Matching the word "the" in the beginning, middle, end of a sentence, or end of a line can be done with the extended regular expression:
    (^| )the([^a-z]|$)
There are two choices before the word, a space or the beginning of a line. After the word, there must be something besides a lower case letter or else the end of the line. One extra bonus with extended regular expressions is the ability to use the "*," "+," and "?" modifiers after a "(...)" grouping. The following will match "a simple problem," "an easy problem," as well as "a problem"; the space after the adjective is placed inside the optional group so that "a problem" needs only one space:
    egrep "an? ((simple|easy) )?problem" data
I promised to explain why the back slash characters don't work in extended regular expressions. Well, perhaps the "\{...\}" and "\<...\>" could be added to the extended expressions. These are the newest addition to the regular expression family.
They could be added, but this might confuse people if those characters are added and "\(...\)" is not. And there is no way to add that functionality to the extended expressions without changing the current usage. Do you see why? It's quite simple. If "(" has a special meaning, then "\(" must be the ordinary character. This is the opposite of the basic regular expressions, where "(" is ordinary and "\(" is special. The usage of the parentheses is incompatible, and any change could break old programs.

If the extended expressions used "(...|...)" as regular characters and "\(...\|...\)" for specifying alternate patterns, then it would be possible to have one set of regular expressions with full functionality. This is exactly what GNU emacs does, by the way.
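The alternation and grouping features discussed above can be tried out with a small sketch. It assumes grep -E, the POSIX spelling of egrep, and the sample text is invented for illustration. The "problem" pattern is written so that the optional adjective carries its own trailing space, which lets "a problem" match with a single space.

```shell
# Sketch: extended regular expressions with grep -E on invented sample text.
sample='From: alice
Subject: hello
To: bob
a problem
an easy problem
the end'

# Alternation: lines starting with "From: " or "Subject: ".
headers=$(printf '%s\n' "$sample" | grep -E '^(From|Subject): ')

# "?" applied to a single character and to a whole group.
problems=$(printf '%s\n' "$sample" | grep -E 'an? ((simple|easy) )?problem')

# Count lines containing the word "the", using alternation instead of \< and \>.
the_count=$(printf '%s\n' "$sample" | grep -E -c '(^| )the([^a-z]|$)')

printf '%s\n' "$headers" "$problems" "$the_count"
```

Only "the end" contains "the" as a separate word here, so the count is taken over one matching line.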
The rest of this is random notes.

Regular Expression   Class      Type            Meaning
.                    all        Character Set   A single character (except newline)
^                    all        Anchor          Beginning of line
$                    all        Anchor          End of line
[...]                all        Character Set   Range of characters
*                    all        Modifier        Zero or more duplicates
\<                   Basic      Anchor          Beginning of word
\>                   Basic      Anchor          End of word
\(..\)               Basic      Backreference   Remembers pattern
\1..\9               Basic      Reference       Recalls pattern
+                    Extended   Modifier        One or more duplicates
?                    Extended   Modifier        Zero or one duplicate
\{M,N\}              Basic      Modifier        M to N duplicates
(...|...)            Extended   Anchor          Shows alternation
\(...\|...\)         EMACS      Anchor          Shows alternation
\w                   EMACS      Character set   Matches a letter in a word
\W                   EMACS      Character set   Opposite of \w
Module 10 A Sample Shell Script
This visual shows another way of invoking a shell script. This method relies on the user first making the script an executable file with the chmod command. After this step, the script can be invoked by its name. Note that the shell uses the PATH variable to find executable files. If you get an error message like the following,

$ hello
ksh: hello: not found

check your PATH variable. The directory in which the shell script is stored must be listed in the PATH variable.
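The steps above (make the script executable, make sure its directory is on PATH, invoke it by name) can be sketched as follows. The scratch directory and the script name hello_demo are hypothetical, chosen only for this illustration.

```shell
# Sketch: create a script, make it executable, and run it by name.
workdir=$(mktemp -d)               # hypothetical scratch directory
cat > "$workdir/hello_demo" <<'EOF'
#!/bin/sh
echo "Hello from a script"
EOF

chmod +x "$workdir/hello_demo"     # mark the file as executable

PATH=$PATH:$workdir                # add the script's directory to the search path
export PATH

greeting=$(hello_demo)             # the shell now finds hello_demo through PATH
echo "$greeting"
rm -rf "$workdir"
```

Without the PATH change, the same call would typically fail with a "not found" message like the one shown above.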
Each shell script is executed in a subshell. Variables defined in a shell script cannot be passed back to the parent shell. If you invoke a shell script with a . (dot), it runs in the current shell. Variables defined in this script (dir1, dir2) are therefore defined in the current shell.
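A minimal sketch of the difference between running a script in a subshell and running it with . (dot). The file name setdirs and the directory values are assumptions; dir1 and dir2 are the variable names mentioned above.

```shell
# Sketch: the same file run as a child process and then sourced with ".".
workdir=$(mktemp -d)
cat > "$workdir/setdirs" <<'EOF'
dir1=/tmp
dir2=/usr
EOF

unset dir1 dir2
sh "$workdir/setdirs"               # runs in a subshell
after_subshell=${dir1:-unset}       # dir1 is still unset in the current shell

. "$workdir/setdirs"                # runs in the current shell
after_dot=${dir1:-unset}            # dir1 is now /tmp

echo "after subshell: $after_subshell, after dot: $after_dot"
rm -rf "$workdir"
```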
Every process returns an exit status to its parent process. By convention, 0 is returned when the process ended successfully, and a nonzero value in all other cases.
To find out the exit code of a completed command, use echo $?:

$ date
$ echo $?
0
$ _

This shows successful execution of the date command. The visual shows an example of an unsuccessful execution of a command.

CONTROL CONSTRUCTS

The Bourne shell control constructs can alter the flow of control within the script. The Bourne shell provides simple two-way branch if statements and multiple-branch case statements, plus for, while, and until statements. In discussing these control structures, the Bourne shell keywords appear in bold type, and the user-supplied items appear in normal type in the command format boxes.
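The exit status convention can be checked directly. This is a small sketch; the path /no/such/directory is an assumed name for something that does not exist.

```shell
# Sketch: $? must be read right after the command, before it is overwritten.
true                                  # a command that always succeeds
status_ok=$?

status_fail=0
ls /no/such/directory 2>/dev/null || status_fail=$?   # capture the failing status

echo "success: $status_ok, failure: $status_fail"
```

The exact nonzero value reported for the failing ls varies between systems; only "zero or not zero" is portable.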
Types of Tests Used with Control Constructs

The test utility evaluates an expression and returns an exit status indicating whether the expression is true (zero) or false (nonzero). There are no options with this utility. The format for this utility is as follows:

Command Format: test expression

expression - composed of constants, variables, and operators
Expressions will be looked at in greater detail later with some examples. There are a few items that need to be mentioned that apply to expressions. Expressions can contain one or more evaluation criteria that test will evaluate. A -a that separates two criteria is a logical AND operator. In this case, both criteria must evaluate to true in order for test to return a value of true. The -o is the logical OR operator. When this operator separates two criteria, one or the other (or both) must be true for test to return a true condition. You can negate any criterion by preceding it with an exclamation mark (!). Parentheses can be used to group criteria. If there are no parentheses, the -a (logical AND operator) takes precedence over the -o (logical OR operator). The test utility will evaluate operators of equal precedence from left to right.
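The logical operators described above can be sketched with a few test calls. The variables age and name and their values are hypothetical.

```shell
# Sketch: combining criteria with -a (AND), -o (OR), and ! (NOT).
age=25
name=henry

if test "$age" -ge 18 -a "$name" = henry     # both criteria must be true
then
    both=true
else
    both=false
fi

if test "$age" -lt 18 -o "$name" = henry     # one true criterion is enough
then
    either=true
else
    either=false
fi

if test ! "$age" -eq 30                      # negates the numeric comparison
then
    negated=true
else
    negated=false
fi

echo "both=$both either=$either negated=$negated"
```

Quoting the variables keeps each one a single argument even if it is empty, which matters because test sees every element as a separate argument.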
Within the expression itself, you must put special characters, such as parentheses, in quote marks so the Bourne shell will not evaluate them but will pass them to test. Since each element (evaluation criterion, string, or variable) in an expression is a separate argument, each must be separated by a space. The test utility will work from the command line, but it is more often used in a script to test input or verify access to a file.

Another way to do the test evaluation is to surround the expression with left and right brackets. A space character must appear after the left bracket and before the right bracket.

test expression    is equivalent to    [ expression ]
Test on Numeric Values

Test expressions can be in many different forms. The expressions can appear as a set of evaluation criteria. The general form for testing numeric values is:

int1 op int2

This criterion is true if the integer int1 has the specified algebraic relationship to integer int2. The valid operators (op) are:

-eq   equal
-ne   not equal
-gt   greater than
-lt   less than
-ge   greater than or equal
-le   less than or equal
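A short sketch of the numeric operators, using the bracket form of test. The variable names and values are invented; the script records which comparisons hold.

```shell
# Sketch: try several numeric operators on two hypothetical integers.
low=5
high=12
result=""

if [ "$low" -lt "$high" ]
then
    result="$result lt"
fi
if [ "$low" -le "$high" ]
then
    result="$result le"
fi
if [ "$low" -gt "$high" ]
then
    result="$result gt"
fi
if [ "$low" -ne "$high" ]
then
    result="$result ne"
fi

echo "true for:$result"
```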
Test on Character Strings

The evaluation criterion for character strings is similar to numeric comparisons. The general form is:

string1 op string2

The operators (op) are:

string1 = string2    true if string1 and string2 are equal
string1 != string2   true if string1 and string2 are not equal
string1              true if string1 is not the null string
Sample Session:

$ cat test_string
number=1
numero=0001
if test $number = $numero
then
    echo "String vals for $number and $numero are ="
else
    echo "String vals for $number and $numero not ="
fi
if test $number -eq $numero
then
    echo "Numeric vals for $number and $numero are ="
else
    echo "Numeric vals for $number and $numero not ="
fi
$ chmod 755 test_string
$ sh -x test_string
number=1
numero=0001
+ test 1 = 0001
+ echo String vals for 1 and 0001 not =
String vals for 1 and 0001 not =
+ test 1 -eq 0001
+ echo Numeric vals for 1 and 0001 are =
Numeric vals for 1 and 0001 are =
$ test_string
String vals for 1 and 0001 not =
Numeric vals for 1 and 0001 are =
$

Test on File Types

The test utility can be used to determine information about file types. All of the criteria can be found in Appendix B. A few of them are listed here:

-r filename   true if filename exists and is readable
-w filename   true if filename exists and is writable
-x filename   true if filename exists and is executable
-f filename   true if filename exists and is a plain file
-d filename   true if filename exists and is a directory
-s filename   true if filename exists and contains information (has a size greater than 0 bytes)

Example:

$ test -d new_dir

If new_dir is a directory, this criterion will evaluate to true. If it does not exist, or exists but is not a directory, it will be false.
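The file criteria can be exercised against a scratch directory. This is a sketch only; the directory and the file names empty_file and data_file are hypothetical.

```shell
# Sketch: file tests against files created just for the demonstration.
workdir=$(mktemp -d)
: > "$workdir/empty_file"                  # zero-length file
echo "some data" > "$workdir/data_file"    # file with contents

report=""
if [ -d "$workdir" ]; then report="$report dir"; fi
if [ -f "$workdir/data_file" ]; then report="$report plain"; fi
if [ -r "$workdir/data_file" ]; then report="$report readable"; fi
if [ -s "$workdir/data_file" ]; then report="$report nonempty"; fi
if [ ! -s "$workdir/empty_file" ]; then report="$report empty"; fi

echo "checks passed:$report"
rm -rf "$workdir"
```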
Taking Decisions using if then

The format for this construct is:

Command Format:
if expression
then
    commands
fi

The if statement evaluates the expression and then transfers control based on its exit status. The fi statement marks the end of the if; notice that fi is if spelled backward. The if statement executes the commands immediately following then if the expression returns a true status. If the return status is false, control transfers to the statement following the fi.

Sample Session:

$ cat check_args
if (test $# = 0)
then
    echo 'Please supply at least 1 argument'
    exit
fi
echo 'Program is running'
$
This little script checks to ensure that you are giving at least one argument. If none is given, it displays the error message and exits. If one or more arguments are given, it displays "Program is running" and runs the rest of the script, if any.

Sample Session:

$ check_args
Please supply at least 1 argument
$ check_args xyz
Program is running
$
Taking Decision using if then else

The format for this construct is:

Command Format:
if expression
then
    commands
else
    commands
fi

The else part of this structure makes the single-branch if statement into a two-way branch. If the expression returns a true status, the commands between the then and the else statement will be executed. After these have been executed, control resumes at the statement after the fi. If the expression returns false, the commands following the else statement will be executed.

Sample Session:

$ cat test_string
number=1
numero=0001
if test $number = $numero
then
    echo "String values of $number and $numero are equal"
else
    echo "String values of $number and $numero not equal"
fi
if test $number -eq $numero
then
    echo "Numeric values of $number and $numero are equal"
else
    echo "Numeric values of $number and $numero not equal"
fi
$
Taking Decision using if then elif

The format for this construct is:

Command Format:
if expression
then
    commands
elif expression
then
    commands
else
    commands
fi
The elif construct combines the else and if statements and allows you to construct a nested set of if then else structures.
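As a minimal sketch of an elif chain, the following grades a score. The variable names and thresholds are invented for illustration.

```shell
# Sketch: the first expression that is true decides which branch runs.
score=72

if [ "$score" -ge 90 ]
then
    grade=A
elif [ "$score" -ge 70 ]
then
    grade=B
else
    grade=C
fi

echo "score $score earns grade $grade"
```

Written with nested if then else statements, the same logic would need a second fi and one more level of indentation.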
The case control Structure

The format for this construct is:

Command Format:
case test-string in
    pattern-1 ) commands-1 ;;
    pattern-2 ) commands-2 ;;
    pattern-3 ) commands-3 ;;
    . . .
    *) commands ;;
esac

The case structure allows a multiple-branch decision mechanism. The path that is taken depends on a match between the test-string and one of the patterns.

Sample Session:

$ cat case_ex
echo 'Enter A, B, or C: \c'
read letter
case $letter in
    A) echo 'You entered A' ;;
    B) echo 'You entered B' ;;
    C) echo 'You entered C' ;;
    *) echo 'You did not enter A, B, or C' ;;
esac
$ chmod a+x case_ex
$ case_ex
Enter A, B, or C: B
You entered B
$ case_ex
Enter A, B, or C: b
You did not enter A, B, or C
$
This example uses the value of a character that the user entered as the test string. The value is represented by the variable letter. If letter has the value of A, the structure will execute the command following A. If letter has a value of B or C, then the appropriate commands will be executed. The asterisk indicates any string of characters; and it, therefore, functions as a catchall for a no-match condition. The lowercase b in the second sample session is an example of a no match condition.
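Because case patterns use the same wildcard rules as file name matching, one branch can accept several spellings. The sketch below wraps a case statement in a hypothetical function named classify so that several inputs can be tried without typing them in.

```shell
# Sketch: character classes, alternatives with |, and the * catchall.
classify() {
    case $1 in
        [Aa])   echo "You entered A" ;;          # either case of the letter
        b|B)    echo "You entered B" ;;          # alternatives separated by |
        [0-9]*) echo "You entered a number" ;;   # anything starting with a digit
        *)      echo "No match" ;;               # catchall
    esac
}

first=$(classify a)
second=$(classify B)
third=$(classify 42)
fourth=$(classify xyz)
printf '%s\n' "$first" "$second" "$third" "$fourth"
```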
The Loop Control Structure

The for Loop

The format for this construct is:

Command Format:
for loop-index in argument-list
do
    commands
done

This structure assigns the value of the first item in the argument list to the loop index and executes the commands between the do and done statements. The do and done statements indicate the beginning and end of the for loop. After the structure passes control to the done statement, it assigns the value of the second item in the argument list to the loop index and repeats the commands. The structure repeats the commands between the do and done statements once for each argument in the argument list. When the argument list has been exhausted, control passes to the statement following the done.

Sample Session:

$ cat find_henry1
for x in project1 project2 project3
do
    grep henry $x
done
Sample Session:

$ head project?
==> project1 project2 project3 project4

UNIX> awk program [ file ]
or
UNIX> awk -f program-file [ file ]

Like sed, awk can work on standard input or on a file. Like the shell, if you start an awk program with

#!/bin/awk -f

then you can execute the program directly from the shell. Most systems also have nawk, which stands for ``new awk.'' Nawk has many more features than awk and is generally more useful. I am just going to cover awk, but you should check out nawk too in your own time. Nawk has some nice things, like a random number generator, that awk doesn't have.

awk programs are composed of ``pattern-action'' statements of the form:

pattern { action }

What such a statement does is apply the action to all lines that match the pattern. If there is no pattern, then it applies the action to all lines. If there is no action, then the default action is to copy the line to standard output. Patterns can be regular expressions enclosed in slashes (they can be more than that, but for now, just assume that they are regular expressions). So, for example, the program awkgrep works just like ``grep Jim'':

UNIX> cat awkgrep
#!/bin/awk -f
/Jim/
UNIX> cat input
Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX> awkgrep input
Jimmy Carter
UNIX> awkgrep < input
Jimmy Carter
UNIX>

Actions basically look like C programs. There are some big differences, but for the most part, you can do most basic things that you can do in C. Awk breaks up each line into fields, which are basically whitespace-separated words. You can get at word i by specifying $i. The variable NF contains the number of words on the line. The variable $0 is the line itself. So, to print out the first and last words on each line, you can do:

UNIX> cat input
Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX> awk '{ print $1, $NF }' input
Which belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX>
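The field variables can also be explored without a separate input file. In this sketch the two input lines are invented and fed to awk through a pipe.

```shell
# Sketch: $1, $NF, NF and $0 on two invented input lines.
first_last=$(printf 'one two three\nalpha beta\n' | awk '{ print $1, $NF }')
field_counts=$(printf 'one two three\nalpha beta\n' | awk '{ print NF ": " $0 }')

printf '%s\n' "$first_last" "$field_counts"
```

The first program prints the first and last word of each line; the second prefixes each line with its word count.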
An alternative awkgrep prints out $0 when it finds the pattern:

UNIX> cat awkgrep2
#!/bin/awk -f
/Jim/ { print $0 }
UNIX> awkgrep2 input
Jimmy Carter
UNIX>

Awk has a printf just like C. You don't have to use parentheses when you call it (although you can if you'd like). Unlike print, printf will not print a newline if you don't want it to. So, for example, awkrev reverses the words on each line of a file:

UNIX> cat awkrev
#!/bin/awk -f
{ for (i = NF; i > 0; i--) printf "%s ", $i
  printf "\n" }
UNIX> awkrev input
belong: doesn't lines these of Which

Clinton Bill
Bush George
Reagan Ronald
Carter Jimmy
Stallone Sylvester
UNIX>

A few things that you'll notice about awkrev: Actions can be multiline. You don't need semicolons to separate lines like in C. However, you can specify multiple commands on a line and separate them with semicolons as in C. And you can block commands with curly braces as in C. If you want a command to span two lines (this often happens with complex printf statements), you need to end the first line with a backslash. Also, you'll notice that awkrev didn't declare the variable i. Awk just figured out that it's an integer.
Type casting

Awk lets you convert variables from one type to another on the fly. For example, to convert an integer to a string, you simply use it as a string. String construction can be done with concatenation, which is often very convenient. These principles are used in awkcast:

UNIX> echo "4 Jim" | awkcast
Word 1: as a number: 4, as a string: 4.
        0 appended: number: 40, string 40
Word 2: as a number: 0, as a string: Jim.
        0 appended: number: 0, string Jim0
UNIX>

Casting a string to an integer gives it its atoi() value.

BEGIN and END

There are two special patterns, BEGIN and END, which cause the corresponding actions to be executed before and after any lines are processed, respectively. Therefore, the following program (awkwc) counts the number of lines and words in the input file.

UNIX> cat awkwc
#!/bin/awk -f

BEGIN { nl = 0; nw = 0 }
{ nl++ ; nw += NF }
END { print "Lines:", nl, "words:", nw }
UNIX> awkwc awkwc
Lines: 5 words: 26
UNIX> wc awkwc
5 26 103 awkwc
UNIX>

next and exit

Awk tries to process each statement on each line. Unlike sed, there is no ``hold space.'' Instead, each statement is processed on the original version of each line. Two special commands in awk are next and exit. Next tells awk to stop processing the current input line and go directly to the next one, skipping all the rest of the statements. Exit tells awk to exit immediately. Here are some simple examples. awkpo prints out only the odd-numbered lines (note that this is an awkward way to do this, but it works):

UNIX> cat awkpo
#!/bin/awk -f
BEGIN { ln=0 }
{ ln++
  if (ln%2 == 0) next
  print $0
}
UNIX> cat -n input
     1  Which of these lines doesn't belong:
     2
     3  Bill Clinton
     4  George Bush
     5  Ronald Reagan
     6  Jimmy Carter
     7  Sylvester Stallone
UNIX> cat -n input | awkpo
     1  Which of these lines doesn't belong:
     3  Bill Clinton
     5  Ronald Reagan
     7  Sylvester Stallone
UNIX>

awkptR prints out all lines until it reaches a line with a capital R:

UNIX> cat awkptR
#!/bin/awk -f
/R/ { exit }
    { print $0 }
UNIX> awkptR input
Which of these lines doesn't belong:

Bill Clinton
George Bush
UNIX>

Arrays

Arrays in awk are a little odd. First, you don't have to malloc() any storage: just use it and there it is. Second, arrays can have any indices: integers, floating point numbers, or strings. This is called ``associative'' indexing, and can be very convenient. You cannot have multidimensional arrays or arrays of arrays, though. To simulate multidimensional arrays, you can just concatenate the indices.

Take a look at awkgolf. This is typical of quick-and-dirty awk programs that you sometimes write to look at data. This one processes golf scores. Suppose you have some score files, as in the files usopen, masters, kemper and memorial. These files first have the name of the tournament in all caps, and then scores for a bunch of golfers. Suppose you'd like to see all the golfers with scores for each tournament in a readable form. This is what awkgolf does. Let's break it into its four parts. The first part is the BEGIN line:

BEGIN { nt = 0 ; np = 0 }
This simply initializes two variables: nt is the number of tournaments, and np is the number of players. The next line looks a little cryptic:

/^[A-Z]*$/ { this = $0; tourn[nt] = $0 ; nt++; next }

This only works on lines that are all capital letters. These are the lines that identify tournaments. On these lines, it does the following:

- Sets the this variable to be the tournament name.
- Puts the tournament's name into the tourn array.
- Increments the nt variable.
- Skips the rest of the program and goes on to the next line.

The next part works on all lines that contain the pattern '--'. These are the lines with golfers' scores:

/--/ { golfer = $1
       for (i = 2; $i != "--" ; i++) golfer = golfer" "$i
       if (isgolfer[golfer] != "yes") {
         isgolfer[golfer] = "yes"
         g[np] = golfer
         np++;
       }
       score[golfer" "this] = $(i+1)
     }

The first two lines of this action set the golfer variable to be the golfer's name. Note that you can do string comparison in awk using standard boolean operators, unlike in C where you would have to use strcmp(). The next 5 lines use awk's associative arrays: The array isgolfer is checked to see if it contains the string ``yes'' under the golfer's name. If so, we have processed this golfer before. If not, we set the golfer's entry in isgolfer to ``yes,'' set the np-th entry of the array g to be the golfer, and increment np. Finally, we set the golfer's score for the tournament in the score array. Note that we don't use double indirection. Instead, we simply concatenate the golfer's name and the tournament's name, and use that as the index for the array.

The last part of the program does the final formatting:

END { printf("%-25s", " ");
      for (j = 0; j < nt; j++) printf("%9s", tourn[j])
      printf("\n")
      for (i = 0; i < np; i++) {
        printf("%-25s", g[i])
        for (j = 0; j < nt; j++) printf("%9s", score[g[i]" "tourn[j]])
        printf("\n")
      }
    }

The first three lines print out 25 spaces, and then the names of the tournaments as held in the tourn array. Then we loop through each golfer, and print the golfer's name, padded to 25 characters, and then his score in each tournament. Note that if the golfer didn't play in the tournament, that entry of the score array will be the null string. This is quite convenient, because we don't have to test for whether the golfer played the tournament: we can just use awk's default values.

OK, let's try awkgolf:

UNIX> awkgolf kemper   # Note that the output is only sorted because it's
                       # sorted in the input file
KEMPER
Justin Leonard -10
Greg Norman -7
Nick Faldo -7
Nick Price -7
Loren Roberts -6
Jay Haas -5
Paul Stankowski -5
Lee Janzen -4
Phil Mickelson -4
Davis Love III -3
Tom Lehman 0
Vijay Singh 0
Kirk Triplett 1
Steve Jones 2
Mark O'Meara 5
Don Pooley missed
Ernie Els missed
Fred Couples missed
Hal Sutton missed
Jesper Parnevik missed
Scott McCarron missed
Steve Stricker missed
UNIX> cat masters usopen kemper memorial | awkgolf
MASTERS USOPEN KEMPER MEMORIAL
Tiger Woods 281 6 5
Tommy Tolles 283 2 -11
Tom Watson 284 16 0
Paul Stankowski 285 6 -5 -3
Fred Couples 286 13 missed
Davis Love III 286 5 -3 -7
Justin Leonard 286 9 -10 0
Steve Elkington 287 7
Tom Lehman 287 -2 0 -3
Ernie Els 288 -4 missed -1
Vijay Singh 288 21 0 -14
Jesper Parnevik 289 11 missed -4
Lee Westwood 291 6
Nick Price 291 6 -7
Lee Janzen 292 13 -4 -11
Jim Furyk 293 2 -12
Mark O'Meara 294 9 5 -2
Scott McCarron 294 3 missed missed
Scott Hoch 298 3 -11
Jumbo Ozaki 300 missed
Frank Nobilo 303 9 -10
Bob Tway missed 2 -7
Brad Faxon missed 17 2
David Duval missed 11 -5
Greg Norman missed missed -7 -12
Loren Roberts missed 4 -6
Nick Faldo missed 11 -7
Phil Mickelson missed 10 -4
Steve Jones missed 15 2 3
Steve Stricker 9 missed -1
Jay Haas 2 -5 -4
Billy Andrade 4 -7
Hal Sutton 6 missed -1
Kirk Triplett 1 -2
Don Pooley missed -4
UNIX>
File indirection

You can specify that the output of print and printf go to a file with indirection. For example, to copy standard input to the file f1 you could do:

UNIX> awk '{print $0 > "f1"}' < input
UNIX> cat f1
Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX>

Awk without standard input

Sometimes you just want to write a program that doesn't use standard input. To do this, you just write the whole program as a BEGIN statement, exiting at the end.
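A program that lives entirely in a BEGIN statement might look like the sketch below. The table of squares is an invented task; the point is only that awk never waits for input.

```shell
# Sketch: an awk program that reads no input; everything runs in BEGIN.
squares=$(awk 'BEGIN {
    for (i = 1; i <= 5; i++)
        printf "%d squared is %d\n", i, i * i
    exit
}')

printf '%s\n' "$squares"
```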
Multiline awk programs in the Bourne shell

The Bourne shell lets you define multiline strings simply by putting newlines in the string (within single or double quotes, of course). This means that you can embed simple multiline awk scripts in a sh program without having to use cumbersome backslashes or intermediate files. For example, shwc works just like awkwc, but works as a shell script rather than an awk program.

UNIX> shwc awkwc
Lines: 5 words: 26
UNIX> shwc < awkwc
Lines: 5 words: 26
UNIX> shwc awkwc awkwc
usage: shwc [ file ]
UNIX>

Awk's limitations

Awk is useful for simple data processing. It is not useful when things get more complex, for a few reasons. First, if your data file is huge, you'll do better to write a C program (using for example the fields library from CS302/360) because it will be more efficient, sometimes by a factor of 60 or more. Second, once you start writing procedure calls in awk, it seems to me you may as well be writing C code. Third, you often find awk's lack of double indirection and string processing cumbersome and inefficient.

Awk is not a good language for string processing. Irritatingly, it doesn't let you get at string elements with array operations. I.e. the following will fail:

UNIX> cat sp.awk
{ s = $1 ; s[0] = 'a' ; print s }
UNIX> awk -f sp.awk input
awk: syntax error near line 1
awk: illegal statement near line 1
UNIX>

Of course, sed is ideal for string processing, so often you can get what you want with a combination of sed and awk.
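The source of shwc is not shown above, so the following is only an assumed reconstruction of the idea of embedding a multiline awk program in a shell script. It is written as a shell function so that it can be tried in one piece; the usage message follows the sample session.

```shell
# Sketch of a shwc-like command: the awk program is a multiline
# single-quoted string, so no backslashes or extra files are needed.
shwc() {
    if [ $# -gt 1 ]
    then
        echo "usage: shwc [ file ]" >&2
        return 1
    fi
    awk '
        BEGIN { nl = 0; nw = 0 }
              { nl++; nw += NF }
        END   { print "Lines:", nl, "words:", nw }
    ' "$@"
}

counts=$(printf 'one two\nthree\n' | shwc)
echo "$counts"
```

With no argument, "$@" expands to nothing and awk reads standard input; with one argument, awk reads that file.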
Module 15 Database Using Shell Scripts

There are one or two facts about databases. If you know anything at all about databases, you'll know everything that follows.

1. A database consists of one (or more) tables, each of which consists of a sequence of identically structured rows or records. The rows (records) are subdivided into fields or columns. A schema is a table that describes a table or tables.
2. The data in a database is manipulated (updated, queried, etc.) using commands written in SQL (Structured Query Language). Many people seem to associate SQL with one particular database package. This is wrong: all well-known database packages (Oracle, MySQL, MS Access, Postgres, MS SQL Server, etc.) support SQL, although there may be minor differences.
3. Most database packages operate in a client/server fashion. The database server receives SQL requests via the net and returns results via the net. The results of such queries will, in general, be sets of rows or records. The database server is a permanently running programme, in principle similar to a WWW server. One exception to this rule is MS Access, which operates by direct manipulation of the host operating system files that hold the database tables.
4. How databases actually store their tables, schemas, etc. varies from package to package and is, almost always, of no concern to the user. For information, MS Access stores all the tables and schemas of a database in a single file whose name conventionally ends in the letters ".mdb". For each table, MySQL maintains several Unix file system files, typically one for the data, one for the schema, and one for the index. Oracle stores everything for all its databases in a group of 4-10 files that are built on top of the local file system.
A Shell Script (CGI Backend)

#!/bin/sh
PATH=$PATH:/usr/local/mysql/bin
export PATH
echo "Content-type: text/html"
echo
PLACE=`echo $QUERY_STRING | cut -d= -f2`
echo "Shell Example #3"
echo "Shell Example #3Results of database query for"
echo $PLACE
echo ""
echo "use mydatabase;" > /tmp/$$.sql
echo "select latitude,longitude,easting,northing from gazetteer where feature = '$PLACE';" >> /tmp/$$.sql
mysql -u demo < /tmp/$$.sql > /tmp/$$.res
ROWS=`cat /tmp/$$.res | wc -l`
if [ $ROWS -eq 0 ]
then
    echo "No information for" $PLACE
else
    echo ""
    tail +2 /tmp/$$.res | sed -e 's/^//
s/ //g'
    echo ""
fi
echo ""
rm /tmp/$$.*

Actual database access is performed using the command line MySQL client programme. To ensure that this can be found, the search path is modified by the second and third lines of the script.

PATH=$PATH:/usr/local/mysql/bin
export PATH

The name of the location being queried is then extracted from the QUERY_STRING environment variable. The MySQL command line client can be used non-interactively by arranging for it to read SQL from its standard input, in this case using redirection from a file. The required SQL is constructed in a temporary file.
On a normal Unix system, any user can create files in the directory /tmp. The symbol $$ in the file name is replaced by the current process identification number, which is unique among the processes running at that moment, so two instances of the back end running simultaneously will not interfere with each other. Here is a typical example of the contents of the SQL file.

use mydatabase;
select latitude,longitude,easting,northing from gazetteer where feature = 'Prague';

The output from the MySQL client is also written to a temporary file. Typical text is shown below (for a different query).

latitude  longitude  easting  northing
195180    -21240     145      487
190860    -8040      384      346
188820    -11160     325      284
197880    -5820      424      563
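The temporary-file naming described above can be sketched without a database. The query text is invented, and a plain cp stands in for the mysql call, so this only illustrates the $$ naming and the cleanup.

```shell
# Sketch: per-process temporary files named with $$, removed when done.
sqlfile=/tmp/$$.sql
resfile=/tmp/$$.res

echo "use mydatabase;" > "$sqlfile"
echo "select 1;" >> "$sqlfile"          # hypothetical query
cp "$sqlfile" "$resfile"                # stand-in for: mysql < $sqlfile > $resfile

sql_lines=$(wc -l < "$sqlfile" | tr -d ' ')
echo "process $$ wrote $sql_lines lines to $sqlfile"

rm -f "$sqlfile" "$resfile"             # remove both files by name
```

Names built from $$ are predictable, so where the mktemp utility is available it is a commonly used alternative for creating the file.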
It will be noted that the output file includes column names and that columns are separated by TAB characters. The next step is to determine the number of lines in the output file; this will be zero if no matches have been found. This is done by arranging for the standard Unix command wc to read the file and write the number of lines to its standard output. The code

if [ $ROWS -eq 0 ]
then
    echo "No information for" $PLACE
else
    echo ""
    tail +2 /tmp/$$.res | sed -e 's/^//
s/ //g'
    echo ""
fi

operates conditionally on the number of rows. The interesting case arises when the number of rows is non-zero. In this case, the standard Unix command tail is used to transfer the file, less its first line, to the standard input of the standard Unix command sed. sed is the Unix non-interactive editor, used here to modify the MySQL command line client output by
- Inserting at the start of every line. Remember that the metacharacter ^ matches the start of a line in the regular expressions used by all Unix editors.
- Replacing all occurrences of TAB characters by the string . The final g on the sed sub-command ensures that the substitution is global.
Note that the sed edit script, introduced by the sed command line argument -e spreads over two lines.
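The counting and reformatting steps can be sketched on invented data. Here "ROW: " and " | " are stand-ins for whatever strings the real script inserts, and tail -n +2 is the POSIX spelling of tail +2.

```shell
# Sketch: count result lines, drop the header row, rewrite TAB separators.
tab=$(printf '\t')
resfile=$(mktemp)                       # hypothetical result file
printf 'latitude\tlongitude\n195180\t-21240\n190860\t-8040\n' > "$resfile"

rows=$(wc -l < "$resfile" | tr -d ' ')

if [ "$rows" -eq 0 ]
then
    formatted="No information"
else
    # The sed script spans two lines: one substitution per line.
    formatted=$(tail -n +2 "$resfile" | sed -e 's/^/ROW: /
s/'"$tab"'/ | /g')
fi

printf '%s\n' "$formatted"
rm -f "$resfile"
```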
Simple File Creation

There are two simple ways to create another file: one uses the cat command in conjunction with the redirect symbol, the other uses the echo command in conjunction with the redirect symbol. The example Indented Cat in the Pipes and Redirects section is a good example of the cat method. That example contains only literal text, however. It is more appropriate to see something like the example below, which shows a variable being used in the source data block.
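A minimal sketch of the cat method with a variable in the source data block, written as a here-document. The names sql0 and db_name and the query text are assumptions made for this illustration.

```shell
# Sketch: cat with a here-document; $db_name is expanded, \$database is not.
sql0=$(mktemp)                          # hypothetical output file
db_name=ORCL                            # hypothetical database name

cat > "$sql0" <<EOF
SET HEADING OFF
SELECT name FROM v\$database
 WHERE name = '$db_name';
EXIT
EOF

generated=$(cat "$sql0")
printf '%s\n' "$generated"
rm -f "$sql0"
```

Because the EOF delimiter is unquoted, the shell substitutes variables inside the block, and a backslash keeps a literal dollar sign.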
Example cat and variables

cat >> $sql0 > $sql0
echo "SET FEEDBACK OFF" >> $sql0
echo "SET HEADING OFF" >> $sql0
echo "SELECT my_package.my_function($column)" >> $sql0
echo " FROM v\$database" >> $sql0
if [ "$db_type" = "m" ]
then
    echo " WHERE name = '$db_name';" >> $sql0
else
    echo " WHERE name LIKE '%&1%';" >> $sql0
fi
echo "EXIT" >> $sql0
sqlplus -s $uid/$password@database @$sql0 $sql_arg_1 > $log0

This is basically the same block, except the WHERE clause has been hidden inside an if statement. Now, depending on the Database Type in the $db_type variable, the WHERE clause can take one of two forms. Conveniently, the additional argument, which is not required by SQL*Plus in the first form, is ignored at execution time, even though it is still available on the last line. This is common with all scripts: arguments are only used if they are referenced from within the script.

So there you have the first two ways of creating another file from a script. The version using cat can only cope with a single output form; the version using echo can output a multitude of forms depending on the complex command forms you use. The choice is yours.

There are, however, other ways to create output files. You can use direct generation, as in the example List, to create a list of files. Or the indirect method shown in the example Counted List, where lines are built inside a loop construct and then appended to the file to create a menu file. Or the example Sorted List, where a list of words is sorted into alphabetic order, duplicates are removed, and the rest is stored in a file.

Example list

ls -1 *.log > $lst0

Example counted list

count=1
for file in `ls -1 *.log`
do
    echo "$count: $file" >> $mnu0
    count=`expr $count + 1`
done

Example sorted list

echo $@ | tr ' ' '\n' | sort -u > $lst0
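The counted and sorted lists can be tried in a scratch directory. In this sketch the directory, the .log file names, and the word list are invented; it loops over the wildcard directly instead of the output of ls, and uses the shell's own arithmetic in place of expr.

```shell
# Sketch: build a numbered menu file, then a sorted list without duplicates.
workdir=$(mktemp -d)
: > "$workdir/alpha.log"
: > "$workdir/beta.log"
mnu0=$workdir/menu.txt                  # hypothetical menu file

count=1
for file in "$workdir"/*.log
do
    echo "$count: $(basename "$file")" >> "$mnu0"
    count=$((count + 1))
done

menu=$(cat "$mnu0")
words=$(echo pear apple pear fig | tr ' ' '\n' | sort -u)

printf '%s\n' "$menu" "$words"
rm -rf "$workdir"
```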
Module 16 OVERVIEW OF PERL

What is perl?

Perl, sometimes referred to as Practical Extraction and Report Language, is an interpreted programming language with a huge number of uses, libraries, and resources. Arguably one of the most discussed and used languages on the internet, it is often referred to as the Swiss army knife, or duct tape, of the web. Perl was first brought into being by Larry Wall circa 1987 as a general purpose Unix scripting language to make his programming work simpler. Although it has far surpassed his original creation, Larry Wall still oversees development of the core language, and the newest version, Perl 6.

Running Perl

The simplest way to run a Perl program is to invoke the Perl interpreter with the name of the Perl program as an argument:

perl sample.pl

The name of the Perl file is sample.pl, and perl is the name of the Perl interpreter. This example assumes that Perl is in the execution path; if not, you will need to supply the full path to Perl too:

/usr/local/bin/perl sample.pl

This is the preferred way of invoking Perl because it eliminates the possibility that you might accidentally invoke a copy of Perl other than the one you intended. We will use the full path from now on to avoid any confusion. This type of invocation is the same on all systems with a command-line interface. The following line will do the trick on Windows NT, for example:

c:\NTperl\perl sample.pl
Invoking Perl on UNIX

UNIX systems have another way to invoke an interpreter on a script file. Place a line like

    #!/usr/local/bin/perl

at the start of the Perl file. This tells UNIX that the rest of this script file is to be interpreted by /usr/local/bin/perl. Then make the script itself executable:

    chmod +x sample.pl

You can then "execute" the script file directly and let the script file tell the operating system what interpreter to use while running it.

You can supply Perl command-line arguments on the interpreter invocation line in UNIX scripts. The following line is a good start to any Perl script:

    #!/usr/local/bin/perl -w -t

A Perl Script

A Perl program consists of an ordinary text file containing a series of Perl commands. Commands are written in what looks like a bastardized amalgam of C, shell script, and English. In fact, that's pretty much what it is.

Perl code can be quite free-flowing. The broad syntactic rules governing where a statement starts and ends are:

- Leading white space is ignored. You can start a Perl statement anywhere you want: at the beginning of the line, indented for clarity (recommended), or even right-justified (definitely frowned on) if you like.
- Commands are terminated with a semicolon.
- White space outside of string literals is irrelevant; one space is as good as a hundred. That means you can split statements over several lines for clarity.
- Anything after a pound sign (#) is ignored. Use this to pepper your code with useful comments.
Here's a Perl statement:

    print "My name is Sreedhar\n";

No prizes for guessing what happens when Perl runs this code; it prints

    My name is Sreedhar
If the \n doesn't look familiar, don't worry; it simply means that Perl should print a newline character after the text; in other words, Perl should go to the start of the next line.

Printing more text is a matter of either stringing together statements or giving multiple arguments to the print function:

    print "My name is Sreedhar,\n";
    print "I live in Bangalore,\n",
          "I work in a Wipro there.\n";

That's right, print is a function. It may not look like it in any of the examples so far, where there are no parentheses to delimit the function arguments, but it is a function, and it takes arguments. You can use parentheses in Perl functions if you like; it sometimes helps to make an argument list clearer. More accurately, in this example the function takes a single argument consisting of an arbitrarily long list. We'll have much more to say about lists and arrays later, in the "Data Types" section. There will be a few more examples of the more common functions in the remainder of this chapter, but refer to the "Functions" chapter for a complete run-down on all of Perl's built-in functions.

So what does a complete Perl program look like? Here's a trivial UNIX example, complete with the invocation line at the top and a few comments:

    #!/usr/local/bin/perl -w                 # Show warnings
    print "My name is Sreedhar,\n";          # Let's introduce ourselves
    print "I live in Bangalore,\n",
          "I work in a Wipro there.\n";      # Remember the line breaks
That's not at all typical of a Perl program though; it's just a linear sequence of commands with no structural complexity. The "Flow Control" section later in this overview introduces some of the constructs that make Perl what it is. For now, we'll stick to simple examples like the preceding for the sake of clarity.
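The invocation line at the top of the complete program relies on the mechanism described under "Invoking Perl on UNIX": an interpreter line, chmod +x, and direct execution. The following is a minimal shell sketch of that mechanism, not a definitive recipe. It assumes mktemp -d is available (it is common but not required by POSIX), the helper name make_script is hypothetical, and /bin/sh stands in for the Perl interpreter so that the sketch runs on systems without Perl. For a real Perl script the first line would name the Perl interpreter instead.

```shell
#!/bin/sh
# Sketch: write a script with an interpreter line, mark it executable,
# then run it directly. /bin/sh is a stand-in for /usr/local/bin/perl.

make_script() {
    # $1 = file to create, $2 = interpreter path, $3 = one line of script body
    printf '#!%s\n%s\n' "$2" "$3" > "$1"
    chmod +x "$1"                 # without this, direct execution is refused
}

workdir=$(mktemp -d)              # scratch directory for the demonstration
make_script "$workdir/sample" /bin/sh 'echo "My name is Sreedhar"'

# The system reads the #! line and starts the named interpreter on the file.
"$workdir/sample"
```

Running the file by name works only because of the two steps in make_script: the first line names the interpreter, and chmod +x grants execute permission.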
Exercise:
1. Write a shell script to modify all files in a directory.
2. Write a shell script that presents a user screen allowing the user to enter data into a file, add a record, delete a record, and update or query the file.
Appendix A List of basic UNIX Commands: The basic UNIX commands include some of the most commonly used commands for users, and constructs for building shell scripts. The following charts offer a summary of some simple UNIX commands. These are certainly not all of the commands available in this robust operating system, but these will help you get started.
Ten ESSENTIAL UNIX Commands: These are ten commands that you really need to know in order to get started with UNIX. They are probably similar to commands you already know for another operating system.

    Command   Example                 Description
 1. ls        ls                      List files in current directory
              ls -alF                 List in long format
 2. cd        cd tempdir              Change directory to tempdir
              cd ..                   Move up one directory
              cd ~dhyatt/web-docs     Move into dhyatt's web-docs directory
 3. mkdir     mkdir graphics          Make a directory called graphics
 4. rmdir     rmdir emptydir          Remove directory (must be empty)
 5. cp        cp file1 web-docs       Copy file1 into the web-docs directory
              cp file1 file1.bak      Make a backup of file1
 6. rm        rm file1.bak            Remove (delete) a file
              rm *.tmp                Remove all files ending in .tmp
 7. mv        mv old.html new.html    Move or rename files
 8. more      more index.html         Look at a file, one page at a time
 9. lpr       lpr index.html          Send file to printer
10. man       man ls                  Online manual (help) for a command
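The file-handling commands in this table can be tried safely in a scratch directory. The sketch below is one way to do that. It assumes mktemp -d is available, and the variable names scratch and listing are introduced only for illustration; the file and directory names follow the examples in the table.

```shell
#!/bin/sh
# Sketch: exercise mkdir, cp, mv, ls, rm and rmdir in a throwaway directory.
scratch=$(mktemp -d)           # hypothetical scratch location
cd "$scratch" || exit 1

mkdir graphics                 # make a directory called graphics
echo "hello" > file1           # create a small file to work with
cp file1 file1.bak             # make a backup of file1
cp file1 graphics              # copy file1 into the directory
mv file1 renamed.txt           # rename file1
listing=$(ls)                  # names in the current directory, one per line
echo "$listing"

rm file1.bak renamed.txt graphics/file1   # delete the files
rmdir graphics                 # succeeds only because graphics is now empty
cd / && rmdir "$scratch"       # leave and remove the scratch directory
```

Note that rmdir refuses to remove a directory that still contains files, which is why the rm step comes first.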
Ten VALUABLE UNIX Commands: Once you have mastered the basic UNIX commands, these will be quite valuable in managing your own account.

    Command        Example                           Description
 1. grep           grep "bad word" *                 Find which files contain a certain word
 2. chmod          chmod 644 *.html                  Owner may read and write; everyone else may only read
                   chmod 755 file.exe                Make file executable by everyone (only the owner may write)
 3. passwd         passwd                            Change your password
 4. ps             ps aux                            List all running processes with their process IDs
                   ps aux | grep dhyatt              List processes being run by dhyatt
 5. kill           kill -9 8453                      Kill process with ID 8453
 6. gcc (g++)      gcc file.c -o file                Compile a program written in C
                   g++ fil2.cpp -o fil2              Compile a program written in C++
 7. gzip           gzip bigfile                      Compress file
                   gunzip bigfile.gz                 Uncompress file
 8. mail (pine)    mail [email protected] < file1    Send file1 by email to someone
                   pine                              Read mail using pine
 9. telnet, ssh    telnet vortex.tjhsst.edu          Open a connection to vortex
                   ssh -l dhyatt jazz.tjhsst.edu     Open a secure connection to jazz as user dhyatt
10. ftp, ncftp     ftp station1.tjhsst.edu           Upload or download files to station1
                   ncftp metalab.unc.edu             Connect to archives at UNC
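Three of these commands (grep, chmod and kill) are easy to try without touching anything important. The following is a minimal sketch under stated assumptions: mktemp -d is available, the file names a.txt and b.txt and all variable names are invented for the demonstration, and a background sleep stands in for a process you might want to stop. It uses a plain kill rather than kill -9, which asks the process to terminate instead of forcing it.

```shell
#!/bin/sh
# Sketch: grep for a phrase, set permissions, and stop a background process.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

printf 'one good line\n' > a.txt
printf 'one bad word here\n' > b.txt
matches=$(grep -l "bad word" *)        # -l prints only the names of matching files
echo "Files containing the phrase: $matches"

chmod 644 a.txt                        # rw for owner, read only for everyone else
chmod 755 b.txt                        # rwx for owner, read and execute for others
perms_a=$(ls -l a.txt | cut -c1-10)    # first ten characters are the mode string
perms_b=$(ls -l b.txt | cut -c1-10)
echo "a.txt $perms_a"
echo "b.txt $perms_b"

sleep 60 &                             # start a process in the background
pid=$!                                 # $! holds the process ID of that job
kill "$pid"                            # send the default terminate signal
wait "$pid" 2>/dev/null || true        # collect the exit status of the stopped job
```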
Ten FUN UNIX Commands: These are ten commands that you might find interesting or amusing. They are actually quite helpful at times, and should not be considered idle entertainment.

    Command   Example                 Description
 1. who       who                     List who is logged on to your machine
 2. finger    finger                  List who is on computers in the lab
 3. ytalk     ytalk dhyatt@threat     Talk online with dhyatt, who is on threat
 4. history   history                 List commands you have run recently
 5. fortune   fortune                 Print a random humorous message
 6. date      date                    Print the current date
 7. cal       cal 9 2000              Print calendar for September 2000
 8. xeyes     xeyes &                 Keep track of cursor (in "background")
 9. xcalc     xcalc &                 Calculator ("background" process)
10. mpage     mpage -8 file1 | lpr    Print 8 pages on a single sheet and send to printer (the font will be small!)
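Of the commands in this table, date is the one that most often ends up inside shell scripts, typically to stamp file names or log lines. A minimal sketch, assuming a date command that accepts a + format string; the variable names stamp and logfile and the file name pattern are illustrative only.

```shell
#!/bin/sh
# Sketch: use date with a format string to build a dated file name.
stamp=$(date +%Y-%m-%d)          # year-month-day, zero padded
logfile="backup-$stamp.log"      # hypothetical file name
echo "Today's log would be written to $logfile"
```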
Ten HELPFUL UNIX Commands: These ten commands are very helpful, especially with graphics and word processing type applications.

    Command                  Example              Description
 1. netscape                 netscape &           Run Netscape browser
 2. xv                       xv &                 Run graphics file converter
 3. xfig / xpaint            xfig & (xpaint &)    Run drawing program
 4. gimp                     gimp &               Run a Photoshop-type image editor
 5. ispell                   ispell file1         Spell check file1
 6. latex                    latex file.tex       Run LaTeX, a scientific document tool
 7. xemacs / pico            xemacs (or pico)     Different editors
 8. soffice                  soffice &            Run StarOffice, a full word processor
 9. m-tools (mdir, mcopy,    mdir a:              DOS commands from UNIX (dir A:)
    mdel, mformat, etc.)     mcopy file1 a:       Copy file1 to A:
10. gnuplot                  gnuplot              Plot data graphically
Ten USEFUL UNIX Commands: These ten commands are useful for monitoring system access, or simplifying your own environment.

    Command              Example                      Description
 1. df                   df                           See how much free disk space there is
 2. du                   du -b subdir                 Estimate disk usage of directory in bytes
 3. alias                alias lls="ls -alF"          Create new command "lls" for long format of ls
 4. xhost                xhost +threat.tjhsst.edu     Allow X programs running on threat to display on your screen
                         xhost -                      Turn access control back on so that only listed hosts may connect
 5. fold                 fold -s file1 | lpr          Break long lines at spaces (80 columns by default) and send to printer
 6. tar                  tar -cf subdir.tar subdir    Create an archive called subdir.tar of a directory
                         tar -xvf subdir.tar          Extract files from an archive file
 7. ghostview (gv)       gv filename.ps               View a PostScript file
 8. ping (traceroute)    ping threat.tjhsst.edu       See if machine is alive
                         traceroute www.yahoo.com     Print data path to a machine
 9. top                  top                          Print system usage and top resource hogs
10. logout (exit)        logout or exit               How to quit a UNIX shell
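The tar and fold entries are the two in this table most likely to appear in scripts. The sketch below shows an archive round trip and a fold with an explicit width. It assumes mktemp -d is available; the file name notes.txt, the sample text and the variable names are invented for the demonstration, and the width of 20 is chosen only to make the effect visible on a short line.

```shell
#!/bin/sh
# Sketch: archive a directory with tar, restore it, and wrap a long line with fold.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir subdir
echo "some data" > subdir/notes.txt

tar -cf subdir.tar subdir            # create the archive
contents=$(tar -tf subdir.tar)       # list the archive without extracting
echo "$contents"
rm -r subdir                         # remove the original directory
tar -xf subdir.tar                   # extracting recreates subdir
restored=$(cat subdir/notes.txt)
echo "Restored: $restored"

text="the quick brown fox jumps over the lazy dog"
# -s breaks at spaces; -w sets the maximum width instead of the default 80
wrapped=$(printf '%s\n' "$text" | fold -s -w 20)
echo "$wrapped"
```

Listing with tar -tf before extracting is a cheap way to check what an archive will create in the current directory.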