January 27, 2017 | Author: Juan Luis Leiva Torres | Category: N/A
Calculus J.M. Ward MT1174, 2790174
2011
Undergraduate study in Economics, Management, Finance and the Social Sciences This subject guide is for a 100 course offered as part of the University of London International Programmes in Economics, Management, Finance and the Social Sciences. This is equivalent to Level 4 within the Framework for Higher Education Qualifications in England, Wales and Northern Ireland (FHEQ). For more information about the University of London International Programmes undergraduate study in Economics, Management, Finance and the Social Sciences, see: www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by: J.M. Ward, Department of Mathematics, London School of Economics and Political Science. This is one of a series of subject guides published by the University. We regret that due to pressure of work the author is unable to enter into any correspondence relating to, or arising from, the guide. If you have any comments on this subject guide, favourable or unfavourable, please use the form at the back of this guide.
University of London International Programmes Publications Office Stewart House 32 Russell Square London WC1B 5DN United Kingdom Website: www.londoninternational.ac.uk Published by: University of London © University of London 2011 The University of London asserts copyright over all material in this subject guide except where otherwise indicated. All rights reserved. No part of this work may be reproduced in any form, or by any means, without permission in writing from the publisher. We make every effort to contact copyright holders. If you think we have inadvertently used your copyright material, please let us know.
Contents
Preface
1 Introduction
  1.1 This subject
  1.2 Reading
  1.3 Online study resources
    1.3.1 The VLE
    1.3.2 Making use of the Online Library
  1.4 Using this guide
  1.5 Examination advice
  1.6 The use of calculators
2 Functions
  2.1 Introduction: What is a function?
    2.1.1 Some elementary functions and their graphs
    2.1.2 Combinations of functions
    2.1.3 Inverse functions
    2.1.4 Identities
    2.1.5 Applications of functions
  2.2 Conic sections
    2.2.1 Parabolae
    2.2.2 Circles
    2.2.3 Ellipses
    2.2.4 Hyperbolae
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
3 Differentiation
  3.1 Introduction: What is differentiation?
  3.2 How to find derivatives
    3.2.1 Standard derivatives
    3.2.2 The rules of differentiation
    3.2.3 Higher-order derivatives
  3.3 Using derivatives
    3.3.1 The meaning of the derivative
    3.3.2 Tangent lines and linear approximations
    3.3.3 Applications of derivatives
    3.3.4 Existence of derivatives
  3.4 Using higher-order derivatives
    3.4.1 Maclaurin series
    3.4.2 Taylor series
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
4 One-variable optimisation
  4.1 Introduction: What is optimisation?
  4.2 Using first-order derivatives
    4.2.1 Increasing and decreasing functions
    4.2.2 Stationary points
    4.2.3 An application: Elasticities revisited
  4.3 Using second-order derivatives
    4.3.1 Second-derivatives and stationary points
    4.3.2 Convex and concave functions
    4.3.3 Points of inflection
  4.4 Curve sketching
    4.4.1 Sketching curves defined by polynomials
    4.4.2 Sketching curves defined using other elementary functions
    4.4.3 Asymptotes and cusps
  4.5 Optimisation
    4.5.1 Constrained optimisation
    4.5.2 What happens when differentiability fails?
    4.5.3 Applications of optimisation
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
5 Integration
  5.1 Introduction: What is integration?
  5.2 How to find indefinite integrals
    5.2.1 Standard integrals
    5.2.2 The basic rules of integration
    5.2.3 Integration by substitution
    5.2.4 Integration by parts
    5.2.5 Using partial fractions to simplify integrands
    5.2.6 Using trigonometric identities to simplify integrands
  5.3 Definite integrals and areas
    5.3.1 Definite integrals and what they represent
    5.3.2 Definite integrals and the other rules of integration
  5.4 Applications of integrals
    5.4.1 Marginal functions revisited
    5.4.2 Consumer and producer surpluses
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
6 Functions of several variables
  6.1 Introduction
  6.2 Surfaces
    6.2.1 Planes
    6.2.2 Contours and sections
  6.3 Partial differentiation
    6.3.1 Sections and partial derivatives
    6.3.2 Finding partial derivatives
    6.3.3 The chain rule
    6.3.4 An application: Homogeneous functions
    6.3.5 Second-order partial derivatives
  6.4 Using partial derivatives
    6.4.1 Tangent planes
    6.4.2 Gradient vectors
    6.4.3 Directional derivatives
    6.4.4 Implicitly defined functions of two variables
    6.4.5 Taylor series
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
7 Two-variable optimisation
  7.1 Introduction
  7.2 Unconstrained optimisation
    7.2.1 Stationary points
    7.2.2 Classifying stationary points
    7.2.3 Convex and concave functions
    7.2.4 Applications
  7.3 Constrained optimisation
    7.3.1 Finding optimal points on the boundary of a region
    7.3.2 The method of Lagrange multipliers
    7.3.3 The meaning of the Lagrange multiplier
    7.3.4 Applications
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
8 Differential equations
  8.1 Introduction: What is a differential equation?
  8.2 First-order ODEs
    8.2.1 Separable first-order ODEs
    8.2.2 Linear first-order ODEs
    8.2.3 Homogeneous first-order ODEs
  8.3 Second-order ODEs
    8.3.1 Homogeneous second-order ODEs
    8.3.2 Non-homogeneous second-order ODEs
  8.4 Systems of first-order ODEs
    8.4.1 Simple systems of first-order ODEs
    8.4.2 Other systems of first-order ODEs
  8.5 Applications of ODEs
    8.5.1 Determining demand functions from elasticities
    8.5.2 Continuous price adjustment
    8.5.3 Continuous cash flows
    8.5.4 Market trends
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
A Sample examination paper
B Solutions to the sample examination paper
Preface
This subject guide is not a course text. It sets out a logical sequence in which to study the topics in this subject. Where coverage in the main texts is weak, it provides some additional background material. I am grateful to Mark Baltovic for his careful reading of a draft of this guide and for his many helpful comments.
Chapter 1 Introduction
In this very brief introduction, we aim to give you an idea of the nature of this subject and to advise you on how best to approach it. We give general information about the contents and use of this subject guide, and on recommended reading and how to use the textbooks.
1.1 This subject
Calculus, as studied in this Level 1 course, is primarily the study of derivatives and integrals of functions of one variable and partial derivatives of functions of several variables. Our approach here is not just to help you acquire proficiency in techniques and methods, but also to help you understand some of the theoretical ideas behind these. For example, after completing this course, you will hopefully understand why the derivatives of a function allow you to determine where a function of one variable is optimised. In addition to this, we try to indicate the uses of some of the methods in applications to economics, finance and related disciplines.
Aims of the course
The broad aims of this course are as follows:
+ to enable students to acquire skills in the methods of calculus (including multivariate calculus), as required for their use in further mathematics subjects and economics-based subjects;
+ to prepare students for further courses in mathematics and/or related disciplines.
As emphasised above, however, we do also want you to understand why certain methods work: this is one of the ‘skills’ that you should acquire. Indeed, the examination will not simply test your ability to perform routine calculations; it will also probe your knowledge and understanding of the principles that underlie the material.
Learning outcomes
We now state the broad learning outcomes of this course as a whole. At the end of this course, and having completed the essential reading and activities, you should be able to:
+ use the concepts, terminology, methods and conventions covered in the course to solve mathematical problems in this subject;
+ solve unseen mathematical problems involving understanding of these concepts and application of these methods;
+ see how calculus can be used to solve problems in economics and related subjects;
+ demonstrate knowledge and understanding of the underlying principles of calculus.
There are a couple of things that we should stress at this point. Firstly, note the intention that you will be able to solve unseen problems. This means simply that you will be expected to be able to use your knowledge and understanding of the material to solve problems that are not completely standard. This is not something you should worry unduly about: all topics in mathematics expect this, and you will never be expected to do anything that cannot be done using the material of this course. Secondly, we expect you to be able to ‘demonstrate knowledge and understanding’ and you might well wonder how you would demonstrate this in the examination. Well, it is precisely by being able to grapple successfully with unseen, non-routine, questions that you will indicate that you have a proper understanding of the topic.
Topics covered
Descriptions of the topics to be covered appear in the relevant chapters. However, it is useful to give a brief overview at this stage. We start by revising some of the basic ideas that are needed for the study of this course and, in particular, the idea of a function of one variable. We then introduce derivatives of such functions and how to find them using the techniques of differentiation. This enables us to see how such functions are behaving and, in particular, enables us to see where such functions are optimised. We then introduce integrals of such functions and how to find them using the techniques of integration. In particular, this will enable us to see how to relate functions to areas.
We then introduce functions of several variables and develop techniques for finding their partial derivatives. In particular, we will see how we can use these ideas to see where these slightly more complicated functions are optimised. Lastly, we introduce the idea of a differential equation and examine methods for solving such equations. Throughout this subject guide, the emphasis will be on the theory as much as on the methods. That is to say, our aim in this subject is not only to provide you with some useful techniques and methods from calculus, but also to enable you to understand why these techniques work.
1.2 Reading
There are many books that would be useful for this subject. We recommend two in particular, and a couple of others for additional, further reading. (You should note, however, that there are very many books suitable for this course. Indeed, almost any text on first-year university calculus will cover the majority of the material.)
Textbook reading is essential as textbooks will provide you with more in-depth explanations than you will find in this subject guide, and they will also provide many more examples to study and exercises to work through. The books listed are the ones we have referred to in this guide.
Essential reading
Detailed reading references in this subject guide refer to the editions of the set textbooks listed below. New editions of one or more of these textbooks may have been published by the time you study this course. You can use a more recent edition of any of the books; use the detailed chapter and section headings and the index to identify relevant readings. Also check the virtual learning environment (VLE) regularly for updated guidance on readings.
+ Binmore, K. and J. Davies Calculus: Concepts and Methods. (Cambridge: Cambridge University Press, 2002, second revised edition) [ISBN 9780521775410].
+ Anthony, M. and N. Biggs Mathematics for economics and finance: methods and modelling. (Cambridge: Cambridge University Press, 1996) [ISBN 9780521559133].
By and large we will be following Binmore and Davies but, sometimes, we will follow the simpler treatment found in Anthony and Biggs. Both texts, when used wisely, will provide you with a large number of examples for you to study and exercises for you to attempt. It is recommended that you purchase both of these. Another thing you might like to bear in mind is that some of the material from Binmore and Davies that we omit here will be useful if you go on to study 176 Further Calculus.
Further reading
Once you have covered the essential reading you are then free to read around the subject area in any text, paper or online resource. You will need to support your learning by reading as widely as possible and by thinking about how these principles apply in the real world. To help you read extensively, you have free access to the VLE and University of London Online Library (see Section 1.3.2).
However, two useful textbooks that we have referred to in this subject guide are the following.
+ Simon, C.P. and L. Blume Mathematics for economists. (New York and London: W.W. Norton and Company, 1994) [ISBN 9780393957334].
+ Adams, R.A. and C. Essex Calculus: a complete course. (Toronto: Pearson, 2010, seventh edition) [ISBN 9780321549280].
Simon and Blume is a useful supplementary text with an emphasis on applications of the material to economics; whereas Adams and Essex (which is merely an example from a large range of very similar calculus textbooks) is a detailed calculus textbook which contains much material which is beyond the scope of this course. Both of these texts are suitable as sources of additional explanation, examples and exercises, but they are probably not worth purchasing.
1.3 Online study resources
In addition to the subject guide and the essential reading, it is crucial that you take advantage of the study resources that are available online for this course, including the virtual learning environment (VLE) and the Online Library. You can access the VLE, the Online Library and your University of London email account via the Student Portal at http://my.londoninternational.ac.uk
You should receive your login details in your study pack. If you have not, or you have forgotten your login details, please email [email protected] quoting your student number.
1.3.1 The VLE
The VLE, which complements this subject guide, has been designed to enhance your learning experience, providing additional support and a sense of community. It forms an important part of your study experience with the University of London and you should access it regularly. The VLE provides a range of resources for EMFSS courses:
+ Self-testing activities: Doing these allows you to test your own understanding of subject material.
+ Electronic study materials: The printed materials that you receive from the University of London are available to download, including updated reading lists and references.
+ Past examination papers and Examiners’ commentaries: These provide advice on how each examination question might best be answered.
+ A student discussion forum: This is an open space for you to discuss interests and experiences, seek support from your peers, work collaboratively to solve problems and discuss subject material.
+ Videos: There are recorded academic introductions to the subject, interviews and debates and, for some courses, audio-visual tutorials and conclusions.
+ Recorded lectures: For some courses, where appropriate, the sessions from previous years’ Study Weekends have been recorded and made available.
+ Study skills: Expert advice on preparing for examinations and developing your digital literacy skills.
+ Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our provision all the time and you should check the VLE regularly for updates.
1.3.2 Making use of the Online Library
The Online Library contains a huge array of journal articles and other resources to help you read widely and extensively. To access the majority of resources via the Online Library at http://tinyurl.com/ollathens you will either need to use your University of London Student Portal login details, or you will be required to register and use an Athens login. The easiest way to locate relevant content and journal articles in the Online Library is to use the Summon search engine. If you are having trouble finding an article listed in a reading list, try removing any punctuation from the title, such as single quotation marks, question marks and colons. For further advice, please see the online help pages at www.external.shl.lon.ac.uk/summon/about.php
1.4 Using this guide
We have already mentioned that this guide is not a textbook. It is important that you read textbooks in conjunction with the guide and that you try problems from the textbooks. The exercises at the end of the main chapters of this subject guide are a very useful resource and you should try them once you think you have mastered the material from the chapter. You should really try these exercises before consulting the solutions, as simply reading the solutions provided will not help you at all. Sometimes, the solutions we provide will just be an overview of what is required, i.e. an indication of how you should answer the questions, but in the examination, you must always show all of your calculations. It is vital that you develop and enhance your problem-solving skills and the only way to do this is to try lots of exercises.
1.5 Examination advice
Important: the information and advice given here are based on the examination structure used at the time this guide was written. Please note that subject guides may be used for several years. Because of this we strongly advise you to always check both the current Regulations for relevant information about the examination, and the virtual learning environment (VLE) where you should be advised of any forthcoming changes. You should also carefully check the rubric/instructions on the paper you actually sit and follow those instructions.
Remember, it is important to check the VLE for:
+ Up-to-date information on examination and assessment arrangements for this course.
+ Where available, past examination papers and Examiners’ commentaries for the course which give advice on how each question might best be answered.
This course is assessed by a three-hour unseen written examination. There are no optional topics in this subject: you should study them all and this is reflected in the structure of the examination paper. There are five questions (each worth 20 marks) and all questions are compulsory. A sample examination paper may be found in an appendix to this subject guide.
Please do not think that the questions in your real examination will necessarily be very similar to the exercises in this subject guide or those in the sample examination paper. The examination is designed to test you. You will get examination questions unlike the questions in this subject guide. The whole point of examining is to see whether you can apply your knowledge in familiar and unfamiliar settings. The Examiners (nice people though they are) have an obligation to surprise you! For this reason, it is important that you try as many examples as possible, from the subject guide and from the textbooks. This is not so that you can cover any possible type of question the Examiners can think of! It is so that you get used to confronting unfamiliar questions, grappling with them, and finally coming up with the solution.
Do not panic if you cannot completely solve an examination question. There are many marks to be awarded for using the correct approach or method.
1.6 The use of calculators
You will not be permitted to use calculators of any type in the examination. This is not something that you should worry about: the Examiners are interested in assessing that you understand the key concepts, ideas, methods and techniques, and will set questions which do not require the use of a calculator.
Chapter 2 Functions
Essential reading
(For full publication details, see Chapter 1.)
+ Binmore and Davies (2002) Sections 2.1–2.6, 2.14 and part of 7.1.2.
+ Anthony and Biggs (1996) Chapters 1, 2 and parts of 7.
Further reading
+ Simon and Blume (1994) Sections 2.1, part of 2.2, 5.1, 5.3, and 5.4, Appendices A1.1, parts of A1.2 and A2.1–6.
+ Adams and Essex (2010) Preliminaries parts of P.1–P.7, parts of Sections 3.1–3.3 and 3.5.
Aims and objectives
The objectives of this chapter are as follows.
+ To introduce functions in general and the elementary functions and their graphs in particular.
+ To see how to find combinations of functions and the inverse of a function (if it exists).
+ To see how functions can be used in economics-based subjects.
+ To introduce conic sections and see how to draw them.
Specific learning outcomes can be found near the end of this chapter.
2.1 Introduction: What is a function?
NOTE: Before you start this chapter, you should make sure that you have covered the background material in Chapter 1 of 173 Algebra.
Given two sets A and B, a function, f, from A to B is a rule which takes each element of A and gives us a unique (or exactly one) element of B. We often express the fact that ‘the function f takes elements from A and gives us elements of B’ by writing
‘f : A → B’. In such cases, we call the sets A and B the domain and co-domain of the function respectively.
One way of visualising a function f : A → B is to think of it as a ‘black box’ that takes any x ∈ A, the domain, and applies the rule given by f to it to get the unique output f(x) ∈ B, the co-domain, i.e. x ∈ A −→ f −→ f(x) ∈ B. Here, we have used x to denote the independent variable as we are free to choose any element x ∈ A from the domain. But, of course, the choice of x here is not essential as it is just a ‘dummy variable’: we could have used any other letter instead and said that the function f : A → B is a ‘black box’ that takes any p ∈ A and applies the rule given by f to it to get the unique output f(p) ∈ B.
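As a minimal sketch of the ‘black box’ picture, the Python snippet below treats a function as a rule applied to an input. The rule 2x + 1 and the names `f` and `g` are assumptions chosen purely for illustration; they do not come from this guide. The snippet also shows that renaming the dummy variable does not change the function.

```python
def f(x):
    """Illustrative rule: take the input x and return 2x + 1."""
    return 2 * x + 1


def g(p):
    """The same rule written with a different dummy variable, p."""
    return 2 * p + 1


# Feed a few inputs into the 'black box' and compare the two definitions.
inputs = [-2, 0, 1.5, 10]
print([f(x) for x in inputs])
print(all(f(t) == g(t) for t in inputs))
```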
It is often convenient to introduce another variable, called the dependent variable, to stand for the elements of B that the function f : A → B gives us. For instance, we could say that this function takes any x ∈ A, the domain, and applies the rule given by f to it to get the unique output y ∈ B, the co-domain, where y = f(x). Of course, here the independent variable, x, via the rule given by f, will determine the value of y and this is why we think of y as the dependent variable.
For now, we will only be interested in functions whose domain and co-domain are certain sets of real numbers. In particular, they will either be ℝ itself or certain subsets of ℝ called intervals. Typically, we think of ℝ as the points on a line so that intervals are described by line segments. Indeed, for a, b ∈ ℝ, we will have finite intervals like
(a, b) = {x ∈ ℝ | a < x < b} and [a, b] = {x ∈ ℝ | a ≤ x ≤ b},
which only differ according to whether the end-points, i.e. the elements a and b, are in the set. Of course, we can also have finite intervals where one end-point, but not the other, is in the set and we denote these by
(a, b] = {x ∈ ℝ | a < x ≤ b} and [a, b) = {x ∈ ℝ | a ≤ x < b}.
There are also infinite intervals which will have one finite end-point, say a ∈ ℝ, and we denote these by
(−∞, a] = {x ∈ ℝ | x ≤ a} and [a, ∞) = {x ∈ ℝ | a ≤ x},
if this finite end-point is in the set, or by
(−∞, a) = {x ∈ ℝ | x < a} and (a, ∞) = {x ∈ ℝ | a < x},
if it isn’t. Of course, as we can see by looking at the sets involved when writing these infinite intervals, the symbols ‘∞’ and ‘−∞’ are not end-points as they are not real numbers; they are just a notational convenience.
Putting these ideas together, we find that another way of visualising a function f : A → B is its graph which is the set of all points (x, y) ∈ ℝ² such that y = f(x). Indeed, as a function f : A → B must give a unique output y ∈ B for each x ∈ A, its graph could look like the one illustrated in Figure 2.1(a) but not like the one in Figure 2.1(b).
[Figure: two panels, (a) and (b), each drawn on x- and y-axes with origin O, with the values a, b and c marked on the axes and a curve labelled y = f(x).]
Figure 2.1: In (a) we have the graph of a function f : [0, a] → [b, c] as each input, x ∈ [0, a], gives a unique output y ∈ [b, c]. In (b), we do not have the graph of a function from [0, a] to [b, c] as each input, x ∈ [0, a], gives two outputs y ∈ [b, c].
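The two ideas in this section, interval membership and the requirement that each input has a unique output, can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `Interval`, the helper `is_graph_of_function` and the sample points are hypothetical names and data introduced here, not part of the guide, and a finite list of points can only suggest, not prove, that a curve is the graph of a function.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Interval:
    """An interval of real numbers with end-points lo and hi.

    closed_lo / closed_hi record whether each end-point belongs to the set.
    Infinite intervals use -math.inf or math.inf with an open end.
    """
    lo: float
    hi: float
    closed_lo: bool = False
    closed_hi: bool = False

    def contains(self, x: float) -> bool:
        above = self.lo <= x if self.closed_lo else self.lo < x
        below = x <= self.hi if self.closed_hi else x < self.hi
        return above and below


def is_graph_of_function(points) -> bool:
    """Return True if no x-value is paired with two different y-values."""
    seen = {}
    for x, y in points:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


open_ab = Interval(1, 3)                    # (1, 3)
closed_ab = Interval(1, 3, True, True)      # [1, 3]
ray = Interval(-math.inf, 3, False, True)   # (-inf, 3]

# The end-point 1 lies in [1, 3] but not in (1, 3).
print(open_ab.contains(1), closed_ab.contains(1), ray.contains(-1e9))

# Points sampled from y = x^2 pass the test; points on the circle
# x^2 + y^2 = 1 fail because x = 0 is paired with both y = 1 and y = -1.
print(is_graph_of_function([(x, x * x) for x in (-1, 0, 1)]))
print(is_graph_of_function([(0, 1), (0, -1), (1, 0)]))
```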
2.1.1 Some elementary functions and their graphs
We now revise some elementary functions that will be useful in this course and look at their graphs.

Power functions

A power function is a function f : R → R given by $f(x) = x^n$, where n ∈ N. Depending on the value of n, the graphs of these functions look very much like the ones illustrated in Figure 2.2. In addition to this, we also include the power function $f(x) = x^0 = 1$ as the function whose graph is a horizontal straight line that goes through the point (0, 1).

Figure 2.2: (a) When n = 1, the graph of the function $f(x) = x^n$ is just the straight line y = x. (b) The graph of the function $f(x) = x^n$ when n is even. (c) The graph of the function $f(x) = x^n$ when n ≥ 3 is odd.

Of course, in (b) and (c) we are only looking at the shape of the graph for different values of n without any regard to the scales on the axes. In particular, if we let x → ∞ mean that x is positive and getting arbitrarily large (i.e. we are considering what happens as x takes values far to the right on the x-axis) and x → −∞ mean that x is negative but getting arbitrarily large in magnitude (i.e. we are considering what happens as x takes values far to the left on the x-axis), we see that:

If n is even, $x^n \to \infty$ as x → ∞ and as x → −∞.
If n is odd, $x^n \to \infty$ as x → ∞ whereas $x^n \to -\infty$ as x → −∞.
This insight will be important in Section 4.4 when we consider how to sketch the graphs of more complicated functions.

Exponential functions

An exponential function with base a is a function f : R → (0, ∞) given by $f(x) = a^x$, where a ≠ 1 is a positive real number. Depending on the value of a, the graphs of these functions look very much like the ones illustrated in Figure 2.3.

Figure 2.3: (a) The graph of the function $f(x) = a^x$ when 0 < a < 1. (b) The graph of the function $f(x) = a^x$ when a > 1.

Of course, in both of these graphs we are only looking at the shape of the graph for different values of a without any regard to the scales on the axes. Indeed, looking at these graphs we see that:

If 0 < a < 1, $a^x \to 0$ as x → ∞ and $a^x \to \infty$ as x → −∞.
If a > 1, $a^x \to \infty$ as x → ∞ and $a^x \to 0$ as x → −∞.
And, as $a^0 = 1$ for any positive a ≠ 1, the graphs of these functions always go through the point (0, 1).

Trigonometric functions

The two elementary trigonometric functions that we will be using are the sine and cosine functions but, unlike what you may have seen before, we will always be using them for angles that are given in radians instead of degrees. As you may know, we can easily convert between these two units by using the formula

\[ \text{angle in radians} = \frac{2\pi}{360} \times \text{angle in degrees}, \]

so, in particular, 360° is 2π radians, 180° is π radians and 90° is π/2 radians. Then, measuring θ in radians, we can define the sine and cosine functions for 0 ≤ θ ≤ π/2 by using the right-angled triangle in Figure 2.4 to get

\[ \sin\theta = \frac{\text{opposite}}{\text{hypotenuse}} \quad\text{and}\quad \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}. \]

Figure 2.4: Defining the sine and cosine functions, sin θ and cos θ, for 0 ≤ θ ≤ π/2.
In particular, by considering the two special triangles in Figure 2.5, we can see that the values of these functions for some common angles (in radians) are

θ        π/6      π/4      π/3
sin θ    1/2      1/√2     √3/2
cos θ    √3/2     1/√2     1/2

Activity 2.1 Recall that we also have the tangent function which, for 0 ≤ θ < π/2, can be defined by using the right-angled triangle in Figure 2.4 to get

\[ \tan\theta = \frac{\text{opposite}}{\text{adjacent}}. \]

Use the triangles in Figure 2.5 to find the values of tan θ when θ is π/6, π/4 and π/3 radians. Incidentally, what are these three angles in degrees?
Figure 2.5: Finding sin θ and cos θ when (a) θ = π/4 radians and (b) when θ = π/6 or θ = π/3 radians. (The triangle in (a) is right-angled with two angles of π/4 and sides 1, 1 and √2; the triangle in (b) is right-angled with angles π/6 and π/3 and sides 1, √3 and 2.)

At this point, we’ll stop saying that an angle is in radians as, unless explicitly stated otherwise, this will always be the case.
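As a quick numerical check of the conversion formula and the table of special values, here is a minimal Python sketch. The function name `to_radians` is our own (the standard library also provides `math.radians`), and floating-point values are only approximations to the exact surds.

```python
import math


def to_radians(degrees):
    """Convert an angle in degrees to radians using 2*pi/360."""
    return 2 * math.pi * degrees / 360


# Print sin and cos for the three special angles 30, 45 and 60 degrees.
for degrees in (30, 45, 60):
    theta = to_radians(degrees)
    print(degrees, round(math.sin(theta), 4), round(math.cos(theta), 4))

# Compare with the exact values 1/2, 1/sqrt(2) and sqrt(3)/2.
print(math.isclose(math.sin(to_radians(30)), 1 / 2))
print(math.isclose(math.sin(to_radians(45)), 1 / math.sqrt(2)))
print(math.isclose(math.sin(to_radians(60)), math.sqrt(3) / 2))
```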
If we want to extend the definition of the sine and cosine functions to 0 ≤ θ ≤ 2π, we think of a unit circle and a triangle with a hypotenuse of 1 as illustrated in Figure 2.6(a) which, for 0 ≤ θ ≤ π/2, gives us a point (x, y) with

\[ x = \cos\theta \quad\text{and}\quad y = \sin\theta, \]

which can be found as before. But, if we now have π/2 ≤ θ ≤ 2π, we get the situation illustrated in Figure 2.6(b), where we can find the magnitude of x and y using our original triangle and their sign by considering where the point lies in the (x, y)-plane. For instance, in Figure 2.6(b), the angle θ could be 5π/4 and so the angle in the triangle would be π/4 (i.e. 5π/4 − π = π/4 as the angle subtended by a straight line, in this case the x-axis, is π). This gives x and y a magnitude of 1/√2 and their signs would be negative as x, y < 0 so we see that

\[ \sin\frac{5\pi}{4} = -\frac{1}{\sqrt{2}} \quad\text{and}\quad \cos\frac{5\pi}{4} = -\frac{1}{\sqrt{2}}, \]

using the unit circle method.

Figure 2.6: Finding sin θ and cos θ when 0 ≤ θ ≤ 2π by considering a unit circle.

Activity 2.2 Use the unit circle method to find sin θ and cos θ if θ = 2π/3.

Activity 2.3 Use the unit circle method to find the values of sin θ and cos θ when θ = 0 and θ = π/2. Hence deduce the values of these functions when θ = π, θ = 3π/2 and θ = 2π.

If we want to extend the definition of the sine and cosine functions to all θ ∈ R, we can see from the unit circle method that both of these functions are periodic with a period of 2π, i.e.

\[ \sin(\theta + 2\pi) = \sin\theta \quad\text{and}\quad \cos(\theta + 2\pi) = \cos\theta, \]

and their graphs are illustrated in Figure 2.7. In particular, we observe that $\cos\theta = \sin(\theta + \frac{\pi}{2})$, i.e. the graph of the cosine function is what we get when we shift the graph of the sine function to the left by π/2.
Figure 2.7: The graphs of the sine and cosine functions, sin θ (solid line) and cos θ (dashed line), for −π ≤ θ ≤ 4π.
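The unit circle method, the periodicity of sine and cosine, and the shift relating their graphs can all be spot-checked numerically. The sketch below is illustrative only: `unit_circle_point` is a name we have made up, and checking a handful of sample angles is evidence for, not a proof of, the identities.

```python
import math


def unit_circle_point(theta):
    """Return the point (x, y) = (cos(theta), sin(theta)) on the unit circle."""
    return math.cos(theta), math.sin(theta)


theta = 5 * math.pi / 4
x, y = unit_circle_point(theta)
print(x, y)  # both coordinates are approximately -1/sqrt(2)

# Going once more round the circle returns us to the same point.
same_after_full_turn = all(
    math.isclose(u, v, abs_tol=1e-12)
    for u, v in zip(unit_circle_point(theta), unit_circle_point(theta + 2 * math.pi))
)

# cos(t) agrees with sin(t + pi/2) at each sample angle.
samples = (-math.pi, 0.0, 1.0, 2.5, 4 * math.pi)
shift_holds = all(
    math.isclose(math.cos(t), math.sin(t + math.pi / 2), abs_tol=1e-12)
    for t in samples
)
print(same_after_full_turn, shift_holds)
```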
2.1.2 Combinations of functions
The elementary functions we have seen can be combined in various ways to make more complicated functions. Generally, this is straightforward and works in the way you would expect, but sometimes there are slight complications and so we revise these different types of combination here.

Linear combinations of functions

If we have two functions with the same domain and co-domain, say f : A → B and g : A → B, we can define a new function which is a linear combination of these two functions. For instance, if k and l are constants, we would have the new function kf + lg : A → B defined by

\[ (kf + lg)(x) = kf(x) + lg(x), \]

for all x ∈ A. In particular, this gives us polynomials, i.e. functions $p_n$ : R → R which are a linear combination of power functions of the form

\[ p_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \]

where the $a_i$ for 0 ≤ i ≤ n are real constants. Indeed, if $a_n \neq 0$, we say that this is a polynomial of degree n. Of course, you have seen polynomials before as, in Chapter 1 of 173 Algebra, you saw how to solve polynomial equations of the form $p_n(x) = 0$ where n = 1 (a linear equation), n = 2 (a quadratic equation) and n = 3 (a cubic equation). The information we get from solving these equations is useful when we come to draw the graphs of polynomial functions as the next example shows.
Example 2.1 Draw the graphs of the functions f : R → R and g : R → R given by f(x) = 5 and g(x) = x + 2 on the same axes. At what point(s) do these graphs intersect?

When we draw graphs, we will often do this by doing a sketch. Indeed, for a sketch of the simple functions given here, it suffices to indicate their shape (they are both straight lines) and where they are relative to the x and y-axes (by indicating where they intersect these axes). So, as we saw in Section 2.1.1, we should expect the graph of f(x) to be a horizontal line that goes through the point (0, 5) as f(x) = 5 for all x ∈ R whereas for g(x), we would expect a straight line that has an x-intercept that occurs when g(x) = 0, i.e. when x = −2, and a y-intercept that occurs when x = 0, i.e. when g(0) = 2. This information allows us to obtain the sketch illustrated in Figure 2.8.

To find the point(s) at which these two graphs intersect, we are looking for the value(s) of x that make f(x) = g(x), i.e. where 5 = x + 2. This gives x = 3 and we know that the values of the functions here must satisfy f(3) = g(3) = 5 which gives (3, 5) as the required point of intersection.¹

¹ Of course, thinking about the graphs of these functions as the points, (x, y), satisfying the equations y = f(x) and y = g(x), all we have done here is solve the equations y = 5 and y = x + 2 simultaneously.

Figure 2.8: The graphs of the functions f(x) = 5 and g(x) = x + 2. Notice that these graphs intersect at the point (3, 5) which we found in Example 2.1.

We will see how to draw the graphs of polynomial functions where n = 2 in Section 2.2.1 and we will develop a more general method for dealing with the case where n ≥ 3 in Section 4.4.

Products and quotients of functions

If we have two functions with the same domain but possibly different co-domains, say f : A → B and g : A → C, we can define a new function which is the product of these two functions. For instance, here we would have the new function f · g : A → D where D is the possibly new co-domain, defined by

\[ (f \cdot g)(x) = f(x)g(x), \]

for all x ∈ A. Of course, we have seen how this works the ‘other way’ in Chapter 1 of 173 Algebra as the process of factorisation involves writing a polynomial of degree n as the product of two polynomials, one of degree m and another of degree p, with n = m + p. However, the quotient of these two functions is slightly more tricky to deal with as the function f/g defined by

\[ (f/g)(x) = \frac{f(x)}{g(x)}, \]

only makes sense for those x ∈ A where g(x) ≠ 0 as, of course, we can never divide by zero. As such, when finding the quotient of two functions, we get a function f/g : A′ → B where A′ is a new domain given by A′ = {x ∈ A | g(x) ≠ 0}.
The points at which a quotient is undefined may have interesting consequences for its graph since they can give rise to vertical asymptotes. But, this needn’t be the case as the next example shows.

Example 2.2 Discuss the behaviour of the functions given by

\[ f(x) = \frac{x+1}{x-1} \quad\text{and}\quad g(x) = \frac{x^2 + x - 2}{x-1}, \]

at the point x = 1.

For f(x), the polynomials in the numerator and denominator of the quotient are defined for all x ∈ R, but f itself is not defined at x = 1 because that would entail division by zero. As such, f must be a function from {x ∈ R | x ≠ 1} to R. Indeed, if we are considering values of x close to one, i.e. x ≃ 1, we could say that

\[ f(x) \simeq \frac{1+1}{x-1} = \frac{2}{x-1}, \]

and so we see that:

If we let x go to one from values of x that are larger than one (here we say “x goes to 1 from above” and write “$x \to 1^+$”), we see that x − 1 is positive and getting very small, which means that f(x) itself is positive and getting very large. That is, f(x) is getting arbitrarily large as x goes to 1 from above and we write this as f(x) → ∞ as $x \to 1^+$.

If we let x go to one from values of x that are smaller than one (here we say “x goes to 1 from below” and write “$x \to 1^-$”), we see that x − 1 is negative and getting very small in magnitude, which means that f(x) itself is negative and getting very large in magnitude. That is, f(x) is negative but getting arbitrarily large in magnitude as x goes to 1 from below and we write this as f(x) → −∞ as $x \to 1^-$.

As such, we see that f(x) has a vertical asymptote at the point x = 1 where it is undefined. The graph of this function is illustrated in Figure 2.9(a) so that you can see this asymptote and you will understand why its graph looks like this away from the asymptote after you have covered the material in Section 2.2.4.
For g(x), the polynomials in the numerator and the denominator of the quotient are defined for all x ∈ R, but g itself is not defined at x = 1 because, again, that would entail division by zero. As such, g must also be a function from {x ∈ R | x ≠ 1} to R. However, in this case, we notice that x = 1 makes both the numerator and the denominator equal to zero and so, in particular, x = 1 must be a root of the numerator. This means that, if we factorise the numerator, we find that

\[ g(x) = \frac{x^2 + x - 2}{x-1} = \frac{(x+2)(x-1)}{x-1}, \]

and so, as long as x ≠ 1, we have g(x) = x + 2. As such, where it is defined (i.e. for x ≠ 1) the graph of g is a straight line like the one sketched in Figure 2.9(b) although, of course, we must exclude the point (1, 3) from this line as g(x) is not defined there. In particular, note that in this case the function does not have a vertical asymptote at x = 1 even though it is undefined there. We will look at asymptotes in more detail when we see them again in Section 2.2.4 and Section 4.4.
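The different behaviour of the two quotients in Example 2.2 near x = 1 can be seen by evaluating them at points just above and just below 1. This is only a numerical illustration in Python, with step sizes h chosen arbitrarily by us; in Python, evaluating either function at exactly x = 1 raises a `ZeroDivisionError`, which mirrors the fact that neither function is defined there.

```python
def f(x):
    """f(x) = (x + 1)/(x - 1), undefined at x = 1."""
    return (x + 1) / (x - 1)


def g(x):
    """g(x) = (x^2 + x - 2)/(x - 1), undefined at x = 1."""
    return (x**2 + x - 2) / (x - 1)


for h in (0.1, 0.01, 0.001):
    # f blows up with opposite signs on either side of 1,
    # whereas g stays close to 3, the value of x + 2 at x = 1.
    print(h, f(1 + h), f(1 - h), g(1 + h), g(1 - h))
```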
Figure 2.9: The graphs of the functions f(x) and g(x) from Example 2.2. In (a), the vertical asymptote at x = 1 is indicated by a dashed line. In (b), the point where the function is undefined is indicated by “◦”.

We can also form quotients using trigonometric functions and, in particular, we can use the triangle in Figure 2.4 to see that

\[ \tan\theta = \frac{\text{opposite}}{\text{adjacent}} = \frac{\text{opposite}/\text{hypotenuse}}{\text{adjacent}/\text{hypotenuse}} = \frac{\sin\theta}{\cos\theta}, \]

that is, we can think of the tangent function as the quotient

\[ \tan\theta = \frac{\sin\theta}{\cos\theta}, \tag{2.1} \]

which will be defined for θ ∈ R as long as cos θ ≠ 0, i.e. as long as θ ≠ (2n + 1)π/2 for n ∈ Z. At the points where it is undefined this function has vertical asymptotes and its graph is sketched in Figure 2.10.
Figure 2.10: The graph of the tangent function, tan θ, for −π ≤ θ ≤ 4π. Note the vertical asymptotes when θ = (2n + 1)π/2 for n ∈ Z.

We can also find the reciprocals of our three trigonometric functions and these are defined as follows.

The secant function, $\sec\theta = \frac{1}{\cos\theta}$, which is defined when θ ≠ (2n + 1)π/2 for n ∈ Z.
The cosecant function, $\operatorname{cosec}\theta = \frac{1}{\sin\theta}$, which is defined when θ ≠ nπ for n ∈ Z.
The cotangent function, $\cot\theta = \frac{1}{\tan\theta}$, which is defined when θ ≠ nπ for n ∈ Z.

These functions will be especially useful in Section 2.1.4.

Activity 2.4 Show that we also have $\cot\theta = \frac{\cos\theta}{\sin\theta}$ as long as θ ≠ nπ for n ∈ Z.
Compositions of functions

If we have two functions, say f : A → B and g : B → C, then we can define the composition g ◦ f : A → C to be the function

\[ (g \circ f)(x) = g(f(x)), \]

and here we say that we are applying “g after f”. That is, thinking of this in terms of ‘black boxes’ we have

\[ x \in A \longrightarrow f \longrightarrow f(x) \in B \longrightarrow g \longrightarrow g(f(x)) \in C, \]

i.e. we take an x ∈ A and apply f to get the output f(x) ∈ B which we then use as the input for g yielding the final output g(f(x)) ∈ C which is the value of (g ◦ f)(x).

Example 2.3 Let f : R → R and g : R → R be the functions $f(x) = x^2$ and g(x) = 2x − 1. What are the functions g ◦ f and f ◦ g?

Here, as the functions both go from R to R, we can find both of these compositions. In particular,

g ◦ f is the function where $(g \circ f)(x) = g(f(x)) = g(x^2) = 2x^2 - 1$, where (g ◦ f) : R → R.
f ◦ g is the function where $(f \circ g)(x) = f(g(x)) = f(2x - 1) = (2x - 1)^2$, and (f ◦ g) : R → R.

Indeed, observe that as $(2x - 1)^2 = 4x^2 - 4x + 1$, these are certainly not the same function.

Activity 2.5 Let f : R → R and g : R → R be the functions $f(x) = x^2 + 1$ and $g(x) = 2^x$. What are the functions g ◦ f and f ◦ g?

In particular, we will also need to be able to identify compositions the ‘other way’ when we cover the chain rule in Section 3.2.2. For instance, it should be clear that the function $(x^2 + 5)^3$ is the composition of the function $x^3$ after the function $x^2 + 5$.

Activity 2.6 Explain why the function $(x^2 + 5)^3$ is the composition of the function $x^3$ after the function $x^2 + 5$.
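Composition translates directly into code, since “g after f” just means feeding the output of f into g. The Python sketch below uses the two functions from Example 2.3; the helper name `compose` is our own choice and not part of any library.

```python
def compose(g, f):
    """Return the function 'g after f', i.e. x -> g(f(x))."""
    return lambda x: g(f(x))


def f(x):
    return x**2


def g(x):
    return 2 * x - 1


g_after_f = compose(g, f)  # x -> 2x^2 - 1
f_after_g = compose(f, g)  # x -> (2x - 1)^2

print(g_after_f(3))  # 2 * 9 - 1 = 17
print(f_after_g(3))  # (2 * 3 - 1)^2 = 25
```

The two printed values differ, which is a concrete instance of the observation that g ◦ f and f ◦ g are not, in general, the same function.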
2.1.3 Inverse functions
If A and B are sets and we have a function f : A → B, we know that this means that for every x ∈ A there is a unique y ∈ B such that y = f(x). Now, if we can define another function g : B → A, i.e. for every y ∈ B there is a unique x ∈ A such that y = f(x) if and only if x = g(y), then we call the function, g, the inverse of f and denote it by $f^{-1}$. In terms of ‘black boxes’, this means that we have

\[ x \in A \longrightarrow f \longrightarrow f(x) \in B, \]

for f and, if it exists, we have

\[ y \in B \longrightarrow f^{-1} \longrightarrow f^{-1}(y) \in A, \]

for $f^{-1}$, or more usefully,

\[ f(x) \in B \longrightarrow f^{-1} \longrightarrow x \in A. \]

In particular, this means that if the inverse, $f^{-1}$, of f exists, we see that the composition $f^{-1}$ after f gives us

\[ x \in A \longrightarrow f \longrightarrow f(x) \in B \longrightarrow f^{-1} \longrightarrow x \in A, \]

and so $(f^{-1} \circ f)(x) = f^{-1}(f(x)) = x$ whereas the composition f after $f^{-1}$ gives us

\[ y \in B \longrightarrow f^{-1} \longrightarrow f^{-1}(y) \in A \longrightarrow f \longrightarrow y \in B, \]

and so $(f \circ f^{-1})(y) = f(f^{-1}(y)) = y$. That is, the inverse of a function (if it exists) ‘undoes’ what the function does and vice versa.

The question, then, is how can we tell whether an inverse function exists? And, if it does exist, how can we find it? Well, given the function f : A → B, the inverse will exist if we are able to take y = f(x) and solve it to obtain a unique solution, x, in terms of y for every y ∈ B. And, if we can do this, these solutions will tell us what the inverse function is, i.e. they will allow us to identify the function, $f^{-1}(y)$, by comparison with $x = f^{-1}(y)$. To make this clear, let’s look at an example.

Example 2.4 Consider the function f : R → R given by f(x) = x + 2. Explain why this function has an inverse and find it.

Using the graph or common sense, we see that the function f(x) = x + 2 has an inverse, since every y ∈ R where y = f(x) gives rise to a unique x ∈ R given by x = y − 2. As such, we can conclude that the inverse of this function exists and we have $x = f^{-1}(y) = y - 2$. Of course, we can now write this inverse as $f^{-1}(x) = x - 2$ if we want it in terms of x.

Indeed, notice that, if we have the function f(x) and its inverse function $f^{-1}(x)$, the graph of $f^{-1}$ is the reflection of the graph of f about the line y = x. This happens because any point (x, y) on the curve y = f(x) becomes, under a reflection about the line y = x, a point (y, x) on the curve x = f(y) which is the same as saying that $y = f^{-1}(x)$!

Activity 2.7 Verify that the curve $y = f^{-1}(x)$ is the reflection about the line y = x of the curve y = f(x) using the function we saw in Example 2.4.

Of course, not every function has an inverse as the next example shows.

Example 2.5 Consider the function f : R → R given by $f(x) = x^2$. Explain why this function does not have an inverse.

If we take any y ∈ R where y = f(x) this gives us the equation $y = x^2$ and, if we are considering x ∈ R, this gives rise to a problem as far as the inverse of f is concerned because:

If y < 0, we get no solution for x as we know that x ∈ R means that $y = x^2 \geq 0$.
If y > 0, we get two solutions for x as we know that we can get $x = \pm\sqrt{y}$ ∈ R.

That is, we can find no inverse in this case since we cannot guarantee a unique solution for x ∈ R from the equation $y = x^2$ for all y ∈ R.
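The contrast between Examples 2.4 and 2.5 can be made concrete by asking, for a given y, how many real solutions x the equation y = f(x) has. The following Python sketch is illustrative only; the names `f_inv` and `real_solutions_of_square` are hypothetical helpers we introduce for this purpose.

```python
import math


def f(x):
    """The function of Example 2.4."""
    return x + 2


def f_inv(y):
    """Its inverse: the unique x with x + 2 = y."""
    return y - 2


def real_solutions_of_square(y):
    """Return all real x with x**2 == y (the situation in Example 2.5)."""
    if y < 0:
        return []            # no real solution
    if y == 0:
        return [0.0]         # exactly one solution
    root = math.sqrt(y)
    return [-root, root]     # two solutions


print(all(f_inv(f(x)) == x for x in range(-3, 4)))  # f_inv undoes f
print(real_solutions_of_square(-1))  # []
print(real_solutions_of_square(4))   # [-2.0, 2.0]
```

An inverse requires exactly one solution for every y in the co-domain, which holds for x + 2 but fails for the squaring function on R.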
Of course, we can usually get around such problems if we are prepared to restrict the domain and the co-domain of the function. But, in that case, we would be finding the appropriate local inverses of the function as opposed to its inverse (which, remember, doesn’t exist!).

Activity 2.8 By considering the domains (−∞, 0] and [0, ∞) and suitably restricting the co-domain of the function in Example 2.5, find its local inverses.

Let’s now look at the inverses of the elementary functions we considered in Section 2.1.1.

Power functions: root functions

If we have the power function $f(x) = x^n$ where n ∈ N and f : [0, ∞) → [0, ∞) we can see that the inverse is given by $x = f^{-1}(y) = y^{1/n}$, and this is called a root function. Thus, we have

\[ x = y^{1/n} \quad\text{if and only if}\quad y = x^n, \]

provided that x, y ≥ 0. In particular, if n = 2, this is the square root function, i.e. we have $y^{1/2} = \sqrt{y}$.

Activity 2.9 Draw the graph of the power function f : [0, ∞) → [0, ∞) where $f(x) = x^2$ and its inverse.

This also works for $f(x) = x^n$ where f : R → R if n is odd. But, if n is even, the function $f(x) = x^n$ where f : R → R does not have an inverse as we saw, for n = 2, in Example 2.5.

Activity 2.10 Explain why we can find an inverse of the function f : R → R where $f(x) = x^n$ if n is odd. Why doesn’t this work if n is even?

Exponential functions: logarithmic functions

If we have the exponential function $f(x) = a^x$ where f : R → (0, ∞) and a ≠ 1 is a positive real number, the inverse is the function $f^{-1}$ : (0, ∞) → R given by $x = f^{-1}(y) = \log_a y$, which is the logarithm to base a. Thus, we have

\[ x = \log_a y \quad\text{if and only if}\quad y = a^x, \]

provided that y > 0.

Activity 2.11 Draw the graph of the exponential function f : R → (0, ∞) where $f(x) = 2^x$ and its inverse, $f^{-1}(x) = \log_2 x$ where $f^{-1}$ : (0, ∞) → R.

In particular, we see from this that as $(f \circ f^{-1})(x) = f(f^{-1}(x)) = x$ we have

\[ a^{\log_a x} = x, \]

and as $(f^{-1} \circ f)(x) = f^{-1}(f(x)) = x$ we have

\[ \log_a a^x = x. \]
These results will be useful in Section 2.1.4 when we consider the laws of logarithms.

Trigonometric functions: inverse trigonometric functions

If we want to discuss the inverses of the trigonometric functions sine and cosine, it is first necessary to restrict their domain due to their oscillatory nature. To do this, we consider a certain interval of values of θ, called the principal range, so that each value of the function corresponds to a unique value of θ. Indeed, for the:

sine function, we take the principal range to be the interval [−π/2, π/2] so that the function sin : [−π/2, π/2] → [−1, 1] where y = sin θ has an inverse. This inverse is denoted by $\sin^{-1}$ (or arcsin) where $\sin^{-1}$ : [−1, 1] → [−π/2, π/2]. Thus, we have

\[ y = \sin\theta \quad\text{if and only if}\quad \theta = \sin^{-1} y, \]

provided that −π/2 ≤ θ ≤ π/2 and −1 ≤ y ≤ 1.

cosine function, we take the principal range to be the interval [0, π] so that the function cos : [0, π] → [−1, 1] where y = cos θ has an inverse. This inverse is denoted by $\cos^{-1}$ (or arccos) where $\cos^{-1}$ : [−1, 1] → [0, π]. Thus, we have

\[ y = \cos\theta \quad\text{if and only if}\quad \theta = \cos^{-1} y, \]

provided that 0 ≤ θ ≤ π and −1 ≤ y ≤ 1.

It will also be convenient for us to consider the inverse of the tangent function where, as well as the oscillations, we need to take care to avoid the asymptotes that occur when this function is undefined. As such, for the tangent function, we take the principal range to be the interval (−π/2, π/2) so that the function tan : (−π/2, π/2) → R where y = tan θ has an inverse. This inverse is denoted by $\tan^{-1}$ (or arctan) where $\tan^{-1}$ : R → (−π/2, π/2). Thus, we have

\[ y = \tan\theta \quad\text{if and only if}\quad \theta = \tan^{-1} y, \]

provided that −π/2 < θ < π/2.

In particular, observe that $\sin^{-1}$, $\cos^{-1}$ and $\tan^{-1}$ are the inverses of the functions sin, cos and tan respectively and not their reciprocals which we denoted by cosec, sec and cot respectively in Section 2.1.2!
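The role of the principal range, and the difference between an inverse and a reciprocal, can be seen numerically. The sketch below uses Python's standard `math.asin` and `math.acos`, which return angles in [−π/2, π/2] and [0, π] respectively; the sample values 2π/3 and 0.8 are arbitrary choices of ours.

```python
import math

theta = 2 * math.pi / 3  # outside the principal range of sine, inside that of cosine

# asin returns the angle in [-pi/2, pi/2] with the same sine, which is not theta.
recovered_from_sine = math.asin(math.sin(theta))
# acos returns the angle in [0, pi] with the same cosine, which is theta itself.
recovered_from_cosine = math.acos(math.cos(theta))
print(recovered_from_sine, recovered_from_cosine, theta)

# The inverse sin^{-1} y and the reciprocal 1/sin(y) are different things.
y = 0.8
inverse_value = math.asin(y)
reciprocal_value = 1 / math.sin(y)
print(inverse_value, reciprocal_value)
```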
Activity 2.12 Find the acute angles $\theta_1$, $\theta_2$ and $\theta_3$ where $\theta_1 = \sin^{-1}\frac{1}{2}$, $\theta_2 = \cos^{-1}\frac{1}{2}$ and $\theta_3 = \tan^{-1} 1$. Also find $\operatorname{cosec}\theta_1$, $\sec\theta_2$ and $\cot\theta_3$.
2.1.4 Identities
An expression such as $(x + 1)^2 = x^2 + 2x + 1$, which is true for all x, is called an identity and, as you know, these are useful when we need to simplify expressions. In particular, in Chapter 1 of 173 Algebra, you saw that the power laws dictate that

\[ a^m a^n = a^{m+n}, \quad \frac{a^m}{a^n} = a^{m-n} \quad\text{and}\quad (a^m)^n = a^{mn}, \]

and these are identities that work for any values of a, m and n for which both sides are defined. Indeed, these ‘laws’ allow us to simplify expressions that may result from appropriate products, quotients and compositions of power functions or exponential functions.

Activity 2.13 If $f(x) = x^3$, $g(x) = x^4$ and $h(x) = 2^x$, find the functions (f · g)(x), (f/g)(x) and (g ◦ h)(x) simplifying your answers as far as possible.

We now look at some other identities that will be useful in this course.

The laws of logarithms

For any positive real number a ≠ 1, the laws of logarithms state that

\[ \log_a x + \log_a y = \log_a(xy), \quad \log_a x - \log_a y = \log_a\left(\frac{x}{y}\right) \quad\text{and}\quad y\log_a x = \log_a(x^y), \]

provided that all of the terms involved are defined. As you may know, these ‘laws’ are easily derived from the power laws we saw above and the fact that $a^{\log_a x} = x$, which we saw earlier in Section 2.1.3.

Activity 2.14 Derive the laws of logarithms from the power laws.

It is also useful to note that if a, b ≠ 1 are positive real numbers, then we have the ‘change of base formula’ which states that

\[ \log_a x = \frac{\log_b x}{\log_b a}, \]

and this allows us to write logarithms to base a in terms of logarithms to base b.
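The change of base formula is exactly how one computes a logarithm to an arbitrary base when only natural logarithms are available. In the Python sketch below, `log_base` is a name of our own; it takes b = e via `math.log`, and the sample numbers are arbitrary. Numerical agreement at a few points is a sanity check, not a derivation.

```python
import math


def log_base(a, x):
    """Logarithm of x to base a, using the change of base formula with b = e."""
    return math.log(x) / math.log(a)


# Changing to base 10 instead of base e gives the same answer.
print(math.isclose(log_base(2, 8), math.log10(8) / math.log10(2)))

# Spot checks of the three laws of logarithms with a = 2, x = 3, y = 5.
print(math.isclose(log_base(2, 3) + log_base(2, 5), log_base(2, 3 * 5)))
print(math.isclose(log_base(2, 3) - log_base(2, 5), log_base(2, 3 / 5)))
print(math.isclose(5 * log_base(2, 3), log_base(2, 3**5)))

# The logarithm undoes the exponential and vice versa.
print(math.isclose(2 ** log_base(2, 7), 7), math.isclose(log_base(2, 2**7), 7))
```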
Activity 2.15 Derive the change of base formula for logarithms.

Trigonometric identities

There are also identities that allow us to simplify various expressions involving the trigonometric functions. For instance, using the triangle in Figure 2.4, Pythagoras’ theorem allows us to see that

\[ \sin^2\theta + \cos^2\theta = \left(\frac{\text{opposite}}{\text{hypotenuse}}\right)^2 + \left(\frac{\text{adjacent}}{\text{hypotenuse}}\right)^2 = \frac{\text{opposite}^2 + \text{adjacent}^2}{\text{hypotenuse}^2} = \frac{\text{hypotenuse}^2}{\text{hypotenuse}^2} = 1, \]

and so, for acute angles,² we have shown that

\[ \sin^2\theta + \cos^2\theta = 1. \tag{2.2} \]

In particular, for natural numbers n ≥ 2, note that we commonly abbreviate things like $(\sin\theta)^n$ by writing them as $\sin^n\theta$. Further, dividing both sides of (2.2) by $\sin^2\theta$ we get

\[ 1 + \cot^2\theta = \operatorname{cosec}^2\theta, \tag{2.3} \]

and this works as long as θ ≠ nπ for n ∈ Z whereas dividing both sides of (2.2) by $\cos^2\theta$ we get

\[ \tan^2\theta + 1 = \sec^2\theta, \tag{2.4} \]

and this works as long as θ ≠ (2n + 1)π/2 for n ∈ Z. We call these three identities the Pythagorean identities as they are simple consequences of Pythagoras’ theorem.

² Of course, if we consider how we extend the definitions of the sine and cosine functions to all θ ∈ R, it should be clear that this identity is actually true for all θ ∈ R.

Activity 2.16 Use (2.2) to derive the Pythagorean identities (2.3) and (2.4).

Another useful pair of trigonometric identities are the compound-angle formulae given by

\[ \sin(\theta + \varphi) = \sin\theta\cos\varphi + \cos\theta\sin\varphi \quad\text{and}\quad \cos(\theta + \varphi) = \cos\theta\cos\varphi - \sin\theta\sin\varphi, \]

which work for all θ, ϕ ∈ R.

Activity 2.17 Observe from the graphs of the sine and cosine functions in Figure 2.7 that sine is an odd function, i.e. sin(−θ) = − sin θ, and cosine is an even function, i.e. cos(−θ) = cos θ. Use these facts and the compound-angle formulae to show that we also have

\[ \sin(\theta - \varphi) = \sin\theta\cos\varphi - \cos\theta\sin\varphi \quad\text{and}\quad \cos(\theta - \varphi) = \cos\theta\cos\varphi + \sin\theta\sin\varphi, \]

for θ, ϕ ∈ R.

Usually, we summarise these four compound-angle formulae by writing them as

\[ \sin(\theta \pm \varphi) = \sin\theta\cos\varphi \pm \cos\theta\sin\varphi \quad\text{and}\quad \cos(\theta \pm \varphi) = \cos\theta\cos\varphi \mp \sin\theta\sin\varphi, \tag{2.5} \]

for θ, ϕ ∈ R. Indeed, they are especially useful since, setting ϕ = θ, we can use them to obtain the double-angle formulae

\[ \sin(2\theta) = 2\sin\theta\cos\theta \quad\text{and}\quad \cos(2\theta) = \cos^2\theta - \sin^2\theta, \tag{2.6} \]

which work for all θ ∈ R. These will be especially useful in Chapter 5.

Activity 2.18 Use the compound-angle formulae to derive the double-angle formulae given above. Use the Pythagorean identity $\sin^2\theta + \cos^2\theta = 1$ to show that we also have

\[ \cos(2\theta) = 1 - 2\sin^2\theta \quad\text{and}\quad \cos(2\theta) = 2\cos^2\theta - 1, \]

for all θ ∈ R.
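Identities of this kind hold for every value of the angles involved, so a simple way to catch a sign error when recalling them is to test them at a few arbitrary angles. The Python sketch below does this for the Pythagorean identity, the compound-angle formulae for θ + ϕ and the double-angle formulae; the function name `identities_hold` and the tolerance are our own choices, and passing such a test is not a proof.

```python
import math


def identities_hold(theta, phi, tol=1e-12):
    """Check (2.2), the '+' compound-angle formulae and (2.6) at theta, phi."""
    s, c = math.sin(theta), math.cos(theta)
    checks = [
        (s**2 + c**2, 1.0),
        (math.sin(theta + phi), s * math.cos(phi) + c * math.sin(phi)),
        (math.cos(theta + phi), c * math.cos(phi) - s * math.sin(phi)),
        (math.sin(2 * theta), 2 * s * c),
        (math.cos(2 * theta), c**2 - s**2),
    ]
    return all(math.isclose(lhs, rhs, abs_tol=tol) for lhs, rhs in checks)


print(identities_hold(0.7, 2.1))
print(all(identities_hold(t, p)
          for t in (-3.0, 0.0, 1.2, 5.0)
          for p in (-1.0, 0.5, 4.0)))
```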
2.1.5 Applications of functions
In economics and related subjects, functions can be used to represent how one quantity depends on another. For instance, as the profit that a company makes, π, would depend on the quantity of goods sold, q, it makes sense to suppose that there is some function of q, say f, that tells us the corresponding profit, π. In this case, we would use an equation of the form π = f(q) to express this dependency and we would have found a profit function. Moreover, if f is invertible, we could find its inverse function, $f^{-1}$, and we would use this to find the value of q that corresponds to a given value of π. In which case, the dependency would now be given by an equation of the form $q = f^{-1}(\pi)$. We will look at profit functions properly in Section 4.5.3, but for now, we consider another application of functions in economics, namely how they can be used to represent information about supply and demand in a market.

Supply and demand functions

In any given market, there is a good which is supplied by the producers (and demanded by the consumers) and the general idea is that, for both supply (and demand), if producers are charging (or consumers are buying) at a price of p per-unit, then the level of supply (or demand) for that good, q, will depend on p. Indeed, since each value of p will lead the producers to supply (and the consumers to demand) exactly one quantity q, it makes sense to think of the quantity, q, supplied (or demanded) as a function of the price, p. This leads us to a description of the market in terms of two kinds of function, namely:

If the quantity supplied, q, can be written in terms of p then we can identify the supply function, $q^S$, from the fact that we have $q = q^S(p)$. This tells us the quantity, q, that the producers will supply if the prevailing market price is p.

If the quantity demanded, q, can be written in terms of p then we can identify the demand function, $q^D$, from the fact that we have $q = q^D(p)$. This tells us the quantity, q, that the consumers will demand if the prevailing market price is p.

In particular, note that, although we have q as a function of p in both of these cases, we follow the practice common in economics and use the vertical axis for p and the horizontal axis for q when drawing the graphs of these functions. As such, any point on the graph of these functions is of the form (q, p) where $q = q^S(p)$ for supply and $q = q^D(p)$ for demand. Also, these functions and their graphs only make economic sense when p ≥ 0 and the quantities they yield, q, are also non-negative.³

Once we have these functions, we are often interested in the equilibrium point for the market as this is the point where the supply and demand functions are equal. In theory, this is the point, $(q^*, p^*)$, where the market stabilises since, at this point, the per-unit price, $p^*$, is such that the levels of supply and demand are equal, i.e. we have

\[ q^S(p^*) = q^D(p^*). \]

As such, we can find the equilibrium price, $p^*$, by solving the resulting equation and the corresponding equilibrium quantity, $q^*$, can then be found by, say, using the demand function as $q^* = q^D(p^*)$. Let’s look at a simple example.

Example 2.6
The supply and demand functions for a good are q S (p) = p + 1
and
q D (p) = 3 − p,
respectively. Sketch the graphs of these functions and find the equilibrium point. Here the supply and demand functions are straight lines which can easily be sketched using the method outlined in Example 2.1 and the results of doing this are illustrated in Figure 2.11. To find the equilibrium price, p∗ , we have q S (p∗ ) = q D (p∗ )
=⇒
p∗ + 1 = 3 − p∗
=⇒
2p∗ = 4,
and so, p∗ = 2. Then using the demand function, say, we have q ∗ = q D (p∗ ) = 3 − p∗ , and so the equilibrium quantity is q ∗ = 3 − 2 = 1. Consequently, the equilibrium point is (q ∗ , p∗ ) = (2, 1) which, as indicated in Figure 2.11, is the point at which the two straight lines intersect. Usually, the supply and demand functions are invertible and so we can also find the inverses of these functions. In particular, if they are invertible, we note that: If the price, p, can be written in terms of q then we can identify the inverse supply function, pS , from the fact that we have p = pS (q). This tells us the price, p, that the producers will charge if the quantity being supplied is q. 3
Although, when drawing their graphs, it is often useful to consider all possible values of p and q before restricting your attention to the economically meaningful ones where p, q ≥ 0!
Figure 2.11: A sketch of the graphs of the supply and demand functions in Example 2.6 indicating the equilibrium point for this market. (Note that this sketch only makes economic sense when $p \geq 0$.)

If the price, $p$, can be written in terms of $q$ then we can identify the inverse demand function, $p^D$, from the fact that we have $p = p^D(q)$. This tells us the price, $p$, that the consumers will pay if the quantity being demanded is $q$.

Activity 2.19 Decide whether the supply and demand functions in the example above are invertible. If they are, find the inverse supply and demand functions.

The effects of taxation

Sometimes, in order to control a market, a government will impose an excise tax of $T$ per unit sold. We model such situations by assuming that the tax is paid to the government by the supplier and so, if the price paid by the consumers in the presence of this tax is $p$ per unit, the suppliers effectively receive $p - T$ for each unit sold as they must pay $T$ of each $p$ received to the government. As such, the supply and demand functions in the presence of the tax, let’s call them $q_T^S(p)$ and $q_T^D(p)$ respectively, will be given by
$$q_T^S(p) = q^S(p - T) \quad\text{and}\quad q_T^D(p) = q^D(p).$$
That is, the consumers still pay a price of $p$ per unit and so the demand function is unchanged, but the suppliers now only receive an amount $p - T$ per unit and so the supply function is modified by the introduction of an excise tax. Of course, the introduction of an excise tax will affect the equilibrium price and quantity for a market, i.e. in the presence of such a tax, the new equilibrium point, let’s call it $(q_T^*, p_T^*)$, will be the point where
$$q_T^S(p_T^*) = q_T^D(p_T^*) \quad\text{or, equivalently,}\quad q^S(p_T^* - T) = q^D(p_T^*),$$
and, using the unchanged demand function, $q_T^* = q_T^D(p_T^*)$ or, equivalently, $q_T^* = q^D(p_T^*)$. Let’s look at how such a tax would affect the market we considered in Example 2.6.
Example 2.7 An excise tax of $T$ per unit is imposed on the market in Example 2.6. Find the new equilibrium point and, by sketching the graph of the new supply function on your earlier sketch, comment on how the equilibrium point for the market has changed. How much of the tax has been passed on to the consumers? What is the maximum tax, $T_m$, that can be imposed if this market is to continue functioning?

If an excise tax of $T$ per unit is imposed, the demand function is still $q_T^D(p) = q^D(p) = 3 - p$, but the supply function becomes
$$q_T^S(p) = q^S(p - T) = p - T + 1,$$
as the suppliers now see an effective price of $p - T$. This means that the equilibrium price in the presence of the tax, $p_T^*$, is given by
$$q_T^S(p_T^*) = q_T^D(p_T^*) \implies p_T^* - T + 1 = 3 - p_T^* \implies 2p_T^* = 2 + T \implies p_T^* = 1 + \frac{T}{2},$$
and so the equilibrium quantity in the presence of tax, $q_T^*$, is
$$q_T^* = q_T^D(p_T^*) = 3 - \left(1 + \frac{T}{2}\right) = 2 - \frac{T}{2},$$
if we use the demand function, $q_T^D(p)$.⁴ Thus, the new equilibrium point is $(2 - T/2, 1 + T/2)$. Sketching the graph of the new supply function, as in Figure 2.12, we see that it is parallel to the old one and the $p$-intercept has increased by $T$. Indeed, as the equilibrium price has increased from $1$ to $1 + T/2$ due to the presence of the tax, half the tax has been passed on to the consumer. Of course, the equilibrium quantity in the presence of the tax must be positive and so, for the market to function, we require that
$$q_T^* > 0 \implies 2 - \frac{T}{2} > 0 \implies T < 4,$$
i.e. the maximum tax, $T_m$, that can be imposed is given by $T_m = 4$.

⁴ Alternatively, we could use the supply function $q_T^S(p) = q^S(p - T) = p - T + 1$ to find $q_T^*$. However, we cannot use $q^S(p) = p + 1$ as this no longer holds in the presence of the tax!

Figure 2.12: Following on from the sketch in Figure 2.11, if an excise tax of $T$ per unit is imposed, the supply set changes as shown and the demand set stays the same. Observe how the introduction of this tax affects the equilibrium point for this market. (Note that this sketch only makes economic sense when $p \geq 0$.)

Alternatively, the government may decide to impose a percentage of the price tax of $100r\%$ (so, for instance, a tax of 5% of the price would correspond to $r = 0.05$) instead of the per unit tax that we have considered so far. So, again assuming that the tax is paid to the government by the supplier, if the price paid by the consumers in the presence of this tax is $p$ per unit, the suppliers effectively receive $p - rp$ for each unit sold as they must pay $rp$ of each $p$ received to the government. As such, the supply and demand functions in the presence of such a tax, let’s call them $q_r^S(p)$ and $q_r^D(p)$ respectively, will be given by
$$q_r^S(p) = q^S(p - rp) \quad\text{and}\quad q_r^D(p) = q^D(p).$$
That is, once again, the consumers still pay a price of $p$ per unit and so the demand function is unchanged, but the suppliers now only receive an amount $p - rp$ per unit and so the supply function is modified by the introduction of a percentage of the price tax. Of course, the introduction of this tax will also affect the equilibrium price and quantity for the market, i.e. in the presence of such a tax, the new equilibrium point, let’s call it $(q_r^*, p_r^*)$, will be the point where
$$q_r^S(p_r^*) = q_r^D(p_r^*) \quad\text{or, equivalently,}\quad q^S(p_r^* - rp_r^*) = q^D(p_r^*),$$
and, using the unchanged demand function, $q_r^* = q_r^D(p_r^*)$ or, equivalently, $q_r^* = q^D(p_r^*)$. See, for example, Exercise 2.3 at the end of this chapter.
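The two tax models described above can be sketched numerically. The following is only an illustration, not part of the guide: it assumes linear supply and demand functions written as $q^S(p) = a + bp$ and $q^D(p) = c - dp$, and the function names and the parameters `a`, `b`, `c`, `d` are our own labels. Exact fractions are used so that the results can be compared directly with the formulae obtained by hand.

```python
from fractions import Fraction


def excise_equilibrium(a, b, c, d, T):
    """Equilibrium (q, p) when suppliers see the price p - T.

    Assumes supply q = a + b*p and demand q = c - d*p (hypothetical labels).
    """
    a, b, c, d, T = (Fraction(v) for v in (a, b, c, d, T))
    # a + b*(p - T) = c - d*p  =>  p = (c - a + b*T) / (b + d)
    p = (c - a + b * T) / (b + d)
    q = c - d * p  # the demand function is unchanged by the tax
    return q, p


def percentage_equilibrium(a, b, c, d, r):
    """Equilibrium (q, p) when suppliers see the price p - r*p."""
    a, b, c, d, r = (Fraction(v) for v in (a, b, c, d, r))
    # a + b*(1 - r)*p = c - d*p  =>  p = (c - a) / (b*(1 - r) + d)
    p = (c - a) / (b * (1 - r) + d)
    q = c - d * p
    return q, p


# Example 2.6 market (supply p + 1, demand 3 - p) with an excise tax T = 1.
print(excise_equilibrium(1, 1, 3, 1, 1))
# The same market with a percentage tax of 25% of the price.
print(percentage_equilibrium(1, 1, 3, 1, Fraction(1, 4)))
```

With $T = 1$ the first call agrees with the point $(2 - T/2, 1 + T/2)$ found in Example 2.7, and setting the tax to zero in either function recovers the untaxed equilibrium $(2, 1)$.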
2.2 Conic sections

So far, we have been dealing with functions that are explicitly defined in terms of an independent variable but, sometimes, we may have an equation relating two variables, say $x$ and $y$, which implicitly defines $y$ as one or more functions of $x$. As it will be useful in various places, we now investigate some important instances of functions defined in this way and their graphs, the so-called conic sections.⁵

⁵ See, for example, Binmore and Davies (2002) Section 2.14 for a full discussion of the geometric aspects of conic sections and where they come from. Although this is interesting, we will not be delving into these overly geometric aspects of conic sections in this course.

2.2.1 Parabolae

A parabola is a curve whose equation has the form
$$y = ax^2 + bx + c,$$
where $a \neq 0$, $b$ and $c$ are constants. Indeed, if we complete the square, we can write this in the form $y = a(x - p)^2 + q$, for some constants $p$ and $q$. This curve will have a $y$-intercept which we can find by setting $x = 0$ and it may have $x$-intercepts which, if they exist, we can find by setting $y = 0$. It will also have a turning point with coordinates $(p, q)$ which will be a minimum if $a > 0$ and a maximum if $a < 0$. Once we have this information, the parabola should be easy to draw as the next example shows.

Example 2.8
Sketch the parabolae whose equations are (a) $y = x^2 - 4x + 3$, and (b) $y = -x^2 + 2x + 3$.

For (a), we are told that $y = x^2 - 4x + 3$ and so we find that:

For the $y$-intercept: Setting $x = 0$ we get $y = 3$.

For the $x$-intercepts: Setting $y = 0$ we get
$$x^2 - 4x + 3 = 0 \implies (x - 1)(x - 3) = 0,$$
i.e. the $x$-intercepts are $x = 1$ and $x = 3$.

The turning point of the parabola can be found by writing the equation of the parabola in completed square form and, doing this, we get $y = (x - 2)^2 - 1$. Here, $a = 1 > 0$ and so we get a minimum at the point $(2, -1)$.

Putting this information together, we then get the sketch in Figure 2.13(a).

For (b), we are told that $y = -x^2 + 2x + 3$ and so we find that:

For the $y$-intercept: Setting $x = 0$ we get $y = 3$.

For the $x$-intercepts: Setting $y = 0$ we get
$$-x^2 + 2x + 3 = 0 \implies x^2 - 2x - 3 = 0 \implies (x + 1)(x - 3) = 0,$$
i.e. the $x$-intercepts are $x = -1$ and $x = 3$.

The turning point of the parabola can be found by writing the equation of the parabola in completed square form and, doing this, we get
$$y = -x^2 + 2x + 3 = -\left[x^2 - 2x\right] + 3 = -\left[(x - 1)^2 - 1\right] + 3 = -(x - 1)^2 + 1 + 3 = -(x - 1)^2 + 4.$$
Here, $a = -1 < 0$ and so we get a maximum at the point $(1, 4)$.

Putting this information together, we then get the sketch in Figure 2.13(b).
Figure 2.13: In (a) we have a sketch of the parabola from Example 2.8(a). In (b) we have a sketch of the parabola from Example 2.8(b).

Activity 2.20 Given the equation of a parabola in completed square form, i.e. $y = a(x - p)^2 + q$, use the fact that $(x - p)^2 \geq 0$ for all $x \in \mathbb{R}$ to explain why the turning point of this parabola will be a minimum if $a > 0$ and a maximum if $a < 0$.
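The information gathered in Example 2.8 (the $y$-intercept, any $x$-intercepts and the turning point) can also be computed directly from the coefficients. The sketch below is only an illustration under our own naming: the function `parabola_features` and the keys of the dictionary it returns are hypothetical, and it uses the standard completed square values $p = -b/(2a)$ and $q = c - b^2/(4a)$.

```python
import math


def parabola_features(a, b, c):
    """Features of y = a*x**2 + b*x + c needed to sketch it."""
    if a == 0:
        raise ValueError("a must be non-zero for a parabola")
    # Completing the square gives y = a*(x - p)**2 + q.
    p = -b / (2 * a)
    q = c - b * b / (4 * a)
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        x_intercepts = []  # the curve never meets the x-axis
    elif discriminant == 0:
        x_intercepts = [p]  # the curve touches the x-axis at its turning point
    else:
        s = math.sqrt(discriminant)
        x_intercepts = sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])
    return {
        "y_intercept": c,
        "x_intercepts": x_intercepts,
        "turning_point": (p, q),
        "kind": "minimum" if a > 0 else "maximum",
    }


print(parabola_features(1, -4, 3))   # Example 2.8(a)
print(parabola_features(-1, 2, 3))   # Example 2.8(b)
```

Both calls should reproduce the intercepts and turning points found by hand in Example 2.8.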
2.2.2 Circles

A circle of radius, $r$, centred on the point $(a, b)$ has an equation given by
$$(x - a)^2 + (y - b)^2 = r^2.$$
Its $x$ and $y$-intercepts, if they exist, can be found by seeing where $y = 0$ and where $x = 0$ respectively. Once we have this information, the circle should be easy to draw as the next example shows.

Example 2.9 Find the radius and centre of the circle whose equation is given by
$$x^2 - 6x + y^2 - 8y = 0.$$
Sketch the circle.

We are told that $x^2 - 6x + y^2 - 8y = 0$ is the equation of a circle and so, completing the square in $x$ and $y$, we find that
$$\left[(x - 3)^2 - 9\right] + \left[(y - 4)^2 - 16\right] = 0 \implies (x - 3)^2 + (y - 4)^2 = 25,$$
and so, comparing this with $(x - a)^2 + (y - b)^2 = r^2$ we see that we have a circle of radius $5$ centred on the point $(3, 4)$. We also find that:

For the $x$-intercepts: Setting $y = 0$ we get
$$(x - 3)^2 + 16 = 25 \implies (x - 3)^2 = 9 \implies x - 3 = \pm 3,$$
i.e. the $x$-intercepts are $x = 6$ and $x = 0$.

For the $y$-intercepts: Setting $x = 0$ we get
$$9 + (y - 4)^2 = 25 \implies (y - 4)^2 = 16 \implies y - 4 = \pm 4,$$
i.e. the $y$-intercepts are $y = 8$ and $y = 0$.

Putting this information together, we then get the sketch in Figure 2.14(a).

Figure 2.14: In (a) we have a sketch of the circle from Example 2.9. In (b) we have a sketch of the ellipse from Example 2.10.
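The completing-the-square step in Example 2.9 can be sketched for any equation of the form $x^2 + Dx + y^2 + Ey + F = 0$. This is only an illustration: the coefficient names $D$, $E$, $F$ and the function names are our own, not notation used in the guide.

```python
import math


def circle_from_general(D, E, F):
    """Centre and radius of x**2 + D*x + y**2 + E*y + F = 0."""
    # Completing the square: (x + D/2)**2 + (y + E/2)**2 = D**2/4 + E**2/4 - F.
    cx, cy = -D / 2, -E / 2
    r_squared = cx * cx + cy * cy - F
    if r_squared <= 0:
        raise ValueError("the equation does not describe a circle")
    return (cx, cy), math.sqrt(r_squared)


def axis_intercepts(centre, r):
    """x-intercepts (where y = 0) and y-intercepts (where x = 0)."""
    a, b = centre

    def solve(offset, other):
        # Solve (t - offset)**2 + other**2 = r**2 for t.
        gap = r * r - other * other
        if gap < 0:
            return []
        root = math.sqrt(gap)
        return sorted({offset - root, offset + root})

    return {"x": solve(a, b), "y": solve(b, a)}


centre, radius = circle_from_general(-6, -8, 0)  # Example 2.9
print(centre, radius)
print(axis_intercepts(centre, radius))
```

For the circle in Example 2.9 this should give the centre $(3, 4)$, the radius $5$ and the intercepts found above.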
2.2.3 Ellipses

An ellipse has an equation of the form
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.$$
In particular, an ellipse of this form is effectively a circle centred on the origin that has been ‘squashed’ and it is easy to draw once we have found its $x$ and $y$-intercepts by seeing where $y = 0$ and where $x = 0$ respectively.

Example 2.10 Sketch the ellipse whose equation is given by
$$\frac{x^2}{4} + \frac{y^2}{9} = 1.$$

Given that the equation of the ellipse is
$$\frac{x^2}{4} + \frac{y^2}{9} = 1,$$
we see that the $x$-intercepts, which occur when $y = 0$, are given by
$$\frac{x^2}{4} = 1 \implies x^2 = 4 \implies x = \pm 2,$$
whereas the $y$-intercepts, which occur when $x = 0$, are given by
$$\frac{y^2}{9} = 1 \implies y^2 = 9 \implies y = \pm 3.$$
Putting this information together, and bearing in mind that this should look like a circle centred on the origin that has been ‘squashed’, we then get the sketch in Figure 2.14(b).
2.2.4 Hyperbolae

A hyperbola can have an equation of the form
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1.$$
This curve will have $x$-intercepts which can be found by setting $y = 0$, but no $y$-intercepts. It will also have slant (or oblique) asymptotes which can be found by writing the equation as
$$\frac{y^2}{x^2} = b^2\left(\frac{1}{a^2} - \frac{1}{x^2}\right),$$
so that, as $x \to \infty$, we have $1/x^2 \to 0$ and this leaves us with
$$\frac{y^2}{x^2} = \frac{b^2}{a^2} \implies y = \pm\frac{b}{a}x,$$
as the equations of the asymptotes. Once we have this information, the hyperbola should be easy to draw as the next example shows.

Example 2.11 Sketch the hyperbola whose equation is given by
$$\frac{x^2}{4} - \frac{y^2}{9} = 1.$$

Given that the equation of the hyperbola is
$$\frac{x^2}{4} - \frac{y^2}{9} = 1,$$
we see that the $x$-intercepts, which occur when $y = 0$, are given by
$$\frac{x^2}{4} = 1 \implies x^2 = 4 \implies x = \pm 2,$$
whereas there are no $y$-intercepts since, setting $x = 0$, we get
$$-\frac{y^2}{9} = 1 \implies y^2 = -9,$$
which has no real solutions. To find the asymptotes, we write the equation as
$$\frac{y^2}{x^2} = 9\left(\frac{1}{4} - \frac{1}{x^2}\right),$$
so that, as $x \to \infty$, we have $1/x^2 \to 0$ and this leaves us with
$$\frac{y^2}{x^2} = \frac{9}{4} \implies y = \pm\frac{3}{2}x,$$
as the equations of the asymptotes. Putting this information together, we then get the sketch in Figure 2.15(a).
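The limiting argument used to find the asymptotes in Example 2.11 can be illustrated numerically. The sketch below is not part of the guide: the helper `upper_branch` is a hypothetical name, and it simply evaluates the positive value of $y$ on the hyperbola for increasingly large $x$ so that the ratio $y/x$ can be compared with the slope $b/a = 3/2$.

```python
import math


def upper_branch(x, a, b):
    """Positive y satisfying x**2/a**2 - y**2/b**2 = 1 (requires |x| >= a)."""
    return b * math.sqrt(x * x / (a * a) - 1)


a, b = 2, 3  # the hyperbola from Example 2.11
for x in (2, 10, 100, 1000):
    y = upper_branch(x, a, b)
    # As x grows, y / x should settle towards b / a.
    print(x, y / x)
```

At $x = 2$ the ratio is zero, which is the $x$-intercept, and for larger $x$ it increases towards $3/2$ without reaching it, as we would expect of a curve approaching its asymptote from below.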
Figure 2.15: In (a) we have a sketch of the hyperbola from Example 2.11. In (b) we have a sketch of the rectangular hyperbola from Example 2.12.

Of course, similar remarks apply to a hyperbola which has an equation of the form
$$\frac{y^2}{b^2} - \frac{x^2}{a^2} = 1,$$
and, in particular, this curve will have $y$-intercepts but no $x$-intercepts.

Activity 2.21 Sketch the hyperbola whose equation is given by
$$\frac{y^2}{9} - \frac{x^2}{4} = 1.$$
Lastly, we note that a rectangular hyperbola has an equation of the form
$$(x - a)(y - b) = c,$$
and this arises when the asymptotes turn out to be the horizontal line $y = b$ and the vertical line $x = a$ as the next example illustrates.

Example 2.12 Sketch the rectangular hyperbola whose equation is given by $(x - 1)(y - 1) = 2$.

Given that $(x - 1)(y - 1) = 2$, we can see that:

For the $x$-intercept: Setting $y = 0$ we get $(x - 1)(-1) = 2$ or $x - 1 = -2$, i.e. the $x$-intercept is given by $x = -1$.

For the $y$-intercept: Setting $x = 0$ we get $(-1)(y - 1) = 2$ or $y - 1 = -2$, i.e. the $y$-intercept is given by $y = -1$.

Then, by writing the equation as
$$y = 1 + \frac{2}{x - 1},$$
we can find the asymptotes by noting that:

For the vertical asymptote: As $x \to 1^+$ we have $y \to \infty$ and as $x \to 1^-$ we have $y \to -\infty$.

For the horizontal asymptote: As $x \to \infty$ we have $y \to 1$ from above and as $x \to -\infty$ we have $y \to 1$ from below.

Putting this information together, we then get the sketch in Figure 2.15(b). In particular, observe that here we have
$$y = 1 + \frac{2}{x - 1} = \frac{(x - 1) + 2}{x - 1} = \frac{x + 1}{x - 1},$$
and so this gives us $y = f(x)$ where $f(x)$ is the first function in Example 2.2 which was illustrated in Figure 2.9(a).
Learning outcomes

At the end of this chapter and having completed the relevant reading and activities, you should be able to:

identify elementary functions and sketch their graphs;
find combinations of elementary functions and inverses (if they exist);
use identities to rewrite expressions involving powers, logarithms and trigonometric functions;
solve problems from economics-based subjects that involve functions;
identify and sketch conic sections.
Solutions to activities

Solution to activity 2.1

Using the triangles in Figure 2.5 and the definition of the tangent function, it should be clear that
$$\tan\frac{\pi}{6} = \frac{1}{\sqrt{3}}, \quad \tan\frac{\pi}{4} = 1 \quad\text{and}\quad \tan\frac{\pi}{3} = \sqrt{3}.$$
Indeed, using the fact that
$$\text{angle in radians} = \frac{2\pi}{360} \times \text{angle in degrees},$$
we can see that an angle of $\pi/6$, $\pi/4$ or $\pi/3$ radians corresponds to an angle of 30, 45 or 60 degrees respectively.

Solution to activity 2.2

In this case, the unit circle method gives us the situation illustrated in Figure 2.16 and so the angle in the triangle would be $\pi/3$ (i.e. $\pi - \frac{2\pi}{3} = \frac{\pi}{3}$ as the angle subtended by a straight line, in this case the $x$-axis, is $\pi$) giving $x$ a magnitude of $1/2$ and $y$ a magnitude of $\sqrt{3}/2$ whereas their signs would be negative for $x$ (as $x < 0$) and positive for $y$ (as $y > 0$). Thus we see that
$$\sin\frac{2\pi}{3} = \frac{\sqrt{3}}{2} \quad\text{and}\quad \cos\frac{2\pi}{3} = -\frac{1}{2},$$
using the unit circle method.

Figure 2.16: For Activity 2.2, we find $\sin\theta$ and $\cos\theta$ when $\theta = 2\pi/3$ by considering a unit circle.

Solution to activity 2.3

Using the unit circle in Figure 2.17(a), it should be clear that
$$\sin 0 = 0 \quad\text{and}\quad \cos 0 = 1,$$
whereas using the unit circle in Figure 2.17(b), it should be clear that
$$\sin\frac{\pi}{2} = 1 \quad\text{and}\quad \cos\frac{\pi}{2} = 0.$$
Then, using similar reasoning, we should be able to deduce that

θ        π      3π/2    2π
sin θ    0      −1      0
cos θ    −1     0       1

are the other values of the functions $\sin\theta$ and $\cos\theta$ that we seek.

Solution to activity 2.4

From the definition of $\cot\theta$, we have
$$\cot\theta = \frac{1}{\tan\theta} = \frac{1}{\sin\theta/\cos\theta} = \frac{\cos\theta}{\sin\theta},$$
as we know that $\tan\theta = \sin\theta/\cos\theta$. This function is defined as long as $\theta \neq n\pi$ for $n \in \mathbb{Z}$ since, at these values of $\theta$, we have $\tan\theta = 0$ or, equivalently, $\sin\theta = 0$.
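The exact values quoted in the solutions to Activities 2.1 to 2.3 can be compared with floating-point evaluations. This is only a numerical sanity check under our own naming (`CHECKS` and `agrees` are hypothetical), and because floating-point arithmetic is approximate the comparison uses a small tolerance rather than exact equality.

```python
import math

# (label, computed value, value stated in the solutions)
CHECKS = [
    ("tan(pi/6)", math.tan(math.pi / 6), 1 / math.sqrt(3)),
    ("tan(pi/4)", math.tan(math.pi / 4), 1.0),
    ("tan(pi/3)", math.tan(math.pi / 3), math.sqrt(3)),
    ("sin(2pi/3)", math.sin(2 * math.pi / 3), math.sqrt(3) / 2),
    ("cos(2pi/3)", math.cos(2 * math.pi / 3), -0.5),
    ("sin(pi)", math.sin(math.pi), 0.0),
    ("cos(pi)", math.cos(math.pi), -1.0),
    ("sin(3pi/2)", math.sin(3 * math.pi / 2), -1.0),
    ("cos(3pi/2)", math.cos(3 * math.pi / 2), 0.0),
    ("sin(2pi)", math.sin(2 * math.pi), 0.0),
    ("cos(2pi)", math.cos(2 * math.pi), 1.0),
]


def agrees(computed, stated, tol=1e-12):
    # An absolute tolerance is needed because some exact values are zero.
    return math.isclose(computed, stated, abs_tol=tol)


for label, computed, stated in CHECKS:
    print(f"{label:>10}: {computed:+.6f}  agrees: {agrees(computed, stated)}")
```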
Figure 2.17: For Activity 2.3, we find $\sin\theta$ and $\cos\theta$ by considering a unit circle when (a) $\theta = 0$ and (b) $\theta = \pi/2$.

Solution to activity 2.5

Given the functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ where $f(x) = x^2 + 1$ and $g(x) = 2^x$, we see that

$g \circ f$ is the function where
$$(g \circ f)(x) = g(f(x)) = g(x^2 + 1) = 2^{x^2 + 1},$$
where $(g \circ f) : \mathbb{R} \to \mathbb{R}$.

$f \circ g$ is the function where
$$(f \circ g)(x) = f(g(x)) = f(2^x) = (2^x)^2 + 1 = 2^{2x} + 1,$$
and $(f \circ g) : \mathbb{R} \to \mathbb{R}$.

Indeed, observe that as $2^{x^2 + 1} \neq 2^{2x} + 1$, these are certainly not the same function.
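The two compositions in this solution can be illustrated with a short sketch. The helper `compose` is our own hypothetical name, and the point of the example is that the two compositions are different functions even though they may happen to agree at individual values of $x$.

```python
def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def f(x):
    return x ** 2 + 1


def g(x):
    return 2 ** x


g_after_f = compose(g, f)  # x -> 2**(x**2 + 1)
f_after_g = compose(f, g)  # x -> 2**(2*x) + 1

for x in (0, 1, 2):
    print(x, g_after_f(x), f_after_g(x))
```

Here the two compositions agree at $x = 0$ but differ at $x = 1$ and $x = 2$, and a single value at which they differ is enough to show that they are not the same function.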
Solution to activity 2.6

If we have $f(x) = x^3$ and $g(x) = x^2 + 5$, then the function $(x^2 + 5)^3$ can be written as
$$(x^2 + 5)^3 = f(x^2 + 5) = f(g(x)) = (f \circ g)(x),$$
i.e. it is the composition we get from applying $f$ after $g$ or, in terms of $x$, it is the composition of the function $x^3$ after the function $x^2 + 5$.

Solution to activity 2.7

By considering the graphs of the functions $f(x) = x + 2$ and $f^{-1}(x) = x - 2$ as illustrated in Figure 2.18, we see that the latter is indeed the reflection in the line $y = x$ of the former. Alternatively, we can see that if $y = x + 2$, a reflection in the line $y = x$ just means replacing all points $(x, y)$ that satisfy this equation with points given by $(y, x)$ to get the new equation $x = y + 2$. But, of course, this gives $y = x - 2$ which is what we wanted.

Figure 2.18: For Activity 2.7, we see that the graph of $f^{-1}(x) = x - 2$ is the reflection of the function $f(x) = x + 2$ about the line $y = x$.

Solution to activity 2.8

When we considered the function $f : \mathbb{R} \to \mathbb{R}$ in Example 2.5, there were two problems that prevented us from finding an inverse. To counteract these so that we can find the local inverses of this function, we note that:

If we take the co-domain to be the interval $[0, \infty)$ so that we have $y \geq 0$, then we remove the problem that occurs because $y = x^2$ has no solution for $y < 0$.

If we take the two domains given by the intervals $(-\infty, 0]$ and $[0, \infty)$ so that we have $x \leq 0$ and $x \geq 0$ respectively, then we remove the problem that occurs because $y = x^2$ has two solutions for $x \in \mathbb{R}$.

Indeed, this means that if we consider the function

$f : [0, \infty) \to [0, \infty)$ given by $f(x) = x^2$, then we have
$$y = f(x) \implies y = x^2 \implies x = \sqrt{y},$$
as $x \geq 0$ because $x \in [0, \infty)$. Thus, using $x = f^{-1}(y)$, the inverse of this function is $f^{-1}(y) = \sqrt{y}$ or $f^{-1}(x) = \sqrt{x}$ if we want it in terms of $x$.

$f : (-\infty, 0] \to [0, \infty)$ given by $f(x) = x^2$, then we have
$$y = f(x) \implies y = x^2 \implies x = -\sqrt{y},$$
as $x \leq 0$ because $x \in (-\infty, 0]$. Thus, using $x = f^{-1}(y)$, the inverse of this function is $f^{-1}(y) = -\sqrt{y}$ or $f^{-1}(x) = -\sqrt{x}$ if we want it in terms of $x$.

In particular, this means that the local inverses of $f : \mathbb{R} \to [0, \infty)$ where $f(x) = x^2$ are $f^{-1}(x) = \sqrt{x}$ when $x \in [0, \infty)$ and $f^{-1}(x) = -\sqrt{x}$ when $x \in (-\infty, 0]$.

Solution to activity 2.9
We saw in Activity 2.8 that the function $f : [0, \infty) \to [0, \infty)$ where $f(x) = x^2$ has an inverse given by $f^{-1}(x) = \sqrt{x}$. The graphs of these two functions are illustrated in Figure 2.19. In particular, observe that the curve $y = \sqrt{x}$ is the reflection about the line $y = x$ of the curve $y = x^2$ and that all three of these curves intersect at the points $(0, 0)$ and $(1, 1)$.

Figure 2.19: For Activity 2.9, we see that the graph of $f^{-1}(x) = \sqrt{x}$ is the reflection of the function $f(x) = x^2$ about the line $y = x$.

Solution to activity 2.10

From the graphs of the function $f(x) = x^n$ where $f : \mathbb{R} \to \mathbb{R}$ when $n$ is odd, which we saw in Figure 2.2(a) and (c), it should be clear that the equation $y = f(x)$ has a unique solution, $x$, for all $y \in \mathbb{R}$ and so the inverse of this function exists. In particular, we see that
$$y = x^n \implies x = y^{1/n} = \sqrt[n]{y},$$
gives us this unique solution for any $y \in \mathbb{R}$ provided that $n$ is odd and so we have $f^{-1}(y) = \sqrt[n]{y}$ as the inverse function or, indeed, $f^{-1}(x) = \sqrt[n]{x}$ if we want it in terms of $x$.

However, from the graph of the function $f(x) = x^n$ where $f : \mathbb{R} \to \mathbb{R}$ when $n$ is even, which we saw in Figure 2.2(b), it should be clear that when

$y < 0$, the equation $y = f(x)$ has no solution for $x$ as we know that $x \in \mathbb{R}$ means that $y = x^n \geq 0$ when $n$ is even.

$y > 0$, the equation $y = f(x)$ has two solutions for $x$ as we know that we can get $x = \pm\sqrt[n]{y} \in \mathbb{R}$ when $n$ is even.

As such, we cannot find a unique solution, $x$, for all $y \in \mathbb{R}$ and so the inverse of this function cannot exist.

Solution to activity 2.11

We saw the graph of a function like $f : \mathbb{R} \to (0, \infty)$ where $f(x) = 2^x$ in Figure 2.3(b) since we have $a = 2 > 1$ here. As such, we find that the graphs of the function $f : \mathbb{R} \to (0, \infty)$ where $f(x) = 2^x$ and its inverse, $f^{-1}(x) = \log_2 x$ where $f^{-1} : (0, \infty) \to \mathbb{R}$, are as illustrated in Figure 2.20. In particular, observe that the curve $y = \log_2 x$ is the reflection about the line $y = x$ of the curve $y = 2^x$.

Figure 2.20: For Activity 2.11, we see that the graph of $f^{-1}(x) = \log_2 x$ is the reflection of the function $f(x) = 2^x$ about the line $y = x$.
Solution to activity 2.12

To find the acute angles $\theta_1$ and $\theta_2$ where $\theta_1 = \sin^{-1}\frac{1}{2}$ and $\theta_2 = \cos^{-1}\frac{1}{2}$, we use the table of values in Section 2.1.1 to see that
$$\sin\theta_1 = \frac{1}{2} = \sin\frac{\pi}{6} \quad\text{gives us}\quad \theta_1 = \sin^{-1}\frac{1}{2} = \frac{\pi}{6},$$
and
$$\cos\theta_2 = \frac{1}{2} = \cos\frac{\pi}{3} \quad\text{gives us}\quad \theta_2 = \cos^{-1}\frac{1}{2} = \frac{\pi}{3},$$
whereas to find the acute angle $\theta_3$ where $\theta_3 = \tan^{-1} 1$, we use the table we found in Activity 2.1 to see that
$$\tan\theta_3 = 1 = \tan\frac{\pi}{4} \quad\text{gives us}\quad \theta_3 = \tan^{-1} 1 = \frac{\pi}{4}.$$
We also have
$$\operatorname{cosec}\theta_1 = \frac{1}{\sin\theta_1} = \frac{1}{1/2} = 2, \quad \sec\theta_2 = \frac{1}{\cos\theta_2} = \frac{1}{1/2} = 2 \quad\text{and}\quad \cot\theta_3 = \frac{1}{\tan\theta_3} = \frac{1}{1} = 1,$$
using the definitions of the reciprocals of our three trigonometric functions, which we saw in Section 2.1.2.

Solution to activity 2.13

Given that $f(x) = x^3$, $g(x) = x^4$ and $h(x) = 2^x$, we use the definitions of the combinations of functions we need from Section 2.1.2, to get
$$(f \cdot g)(x) = f(x)g(x) = (x^3)(x^4) = x^7,$$
$$(f/g)(x) = \frac{f(x)}{g(x)} = \frac{x^3}{x^4} = \frac{1}{x}, \quad\text{and}$$
$$(g \circ h)(x) = g(h(x)) = g(2^x) = (2^x)^4 = 2^{4x},$$
where we have used the power laws to simplify our answers. Indeed, observe that for the last function, we can also write $2^{4x} = (2^4)^x = 16^x$.

Solution to activity 2.14

To derive the laws of logarithms, we note that for the first one, we use the power laws and the given fact to get
$$a^{\log_a x + \log_a y} = a^{\log_a x}\,a^{\log_a y} = xy = a^{\log_a(xy)},$$
which means that $\log_a x + \log_a y = \log_a(xy)$, for the second one, we similarly get
$$a^{\log_a x - \log_a y} = \frac{a^{\log_a x}}{a^{\log_a y}} = \frac{x}{y} = a^{\log_a(x/y)},$$
which means that $\log_a x - \log_a y = \log_a(x/y)$ and for the third one, we get
$$a^{y\log_a x} = (a^{\log_a x})^y = x^y = a^{\log_a(x^y)},$$
which means that $y\log_a x = \log_a(x^y)$.
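The three laws of logarithms derived above can be checked numerically for particular values. The sketch below is only an illustration: the names `log_a` and `check_log_laws` are our own, and the comparison uses `math.isclose` because floating-point logarithms are approximate.

```python
import math


def log_a(x, a):
    """Logarithm of x to the base a (needs a > 0, a != 1 and x > 0)."""
    return math.log(x, a)


def check_log_laws(a, x, y):
    """Return whether each of the three laws holds (to rounding) for a, x, y."""
    first = math.isclose(log_a(x, a) + log_a(y, a), log_a(x * y, a))
    second = math.isclose(log_a(x, a) - log_a(y, a), log_a(x / y, a))
    third = math.isclose(y * log_a(x, a), log_a(x ** y, a))
    return first, second, third


print(check_log_laws(2, 8, 32))
print(check_log_laws(10, 3.5, 1.2))
```

Of course, a check like this for a few values is not a proof; the derivation above is what establishes the laws for all valid $a$, $x$ and $y$.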
Solution to activity 2.15

We take logarithms to the base $b$ on both sides of the given fact to see that
$$a^{\log_a x} = x \implies \log_b a^{\log_a x} = \log_b x \implies (\log_a x)(\log_b a) = \log_b x,$$
where we have used the third law of logarithms in the last step. Then, dividing through on both sides by $\log_b a$ (which is non-zero as $a \neq 1$), we get
$$\log_a x = \frac{\log_b x}{\log_b a},$$
as required.

Solution to activity 2.16

Starting with $\sin^2\theta + \cos^2\theta = 1$, we divide both sides by $\sin^2\theta$ to get
$$\frac{\sin^2\theta + \cos^2\theta}{\sin^2\theta} = \frac{1}{\sin^2\theta} \implies \frac{\sin^2\theta}{\sin^2\theta} + \frac{\cos^2\theta}{\sin^2\theta} = \frac{1}{\sin^2\theta} \implies 1 + \left(\frac{\cos\theta}{\sin\theta}\right)^2 = \left(\frac{1}{\sin\theta}\right)^2,$$
so that $1 + \cot^2\theta = \operatorname{cosec}^2\theta$ if we use the definition of $\operatorname{cosec}\theta$ from Section 2.1.2 and the result from Activity 2.4. Then, again starting with $\sin^2\theta + \cos^2\theta = 1$, we divide both sides by $\cos^2\theta$ to get
$$\frac{\sin^2\theta + \cos^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta} \implies \frac{\sin^2\theta}{\cos^2\theta} + \frac{\cos^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta} \implies \left(\frac{\sin\theta}{\cos\theta}\right)^2 + 1 = \left(\frac{1}{\cos\theta}\right)^2,$$
so that $\tan^2\theta + 1 = \sec^2\theta$ if we use the definition of $\sec\theta$ and (2.1) from Section 2.1.2.

Solution to activity 2.17

With the given facts, we can use the compound-angle formula for $\sin(\theta + \phi)$ to see that
$$\sin(\theta - \phi) = \sin(\theta + (-\phi)) = \sin\theta\cos(-\phi) + \cos\theta\sin(-\phi) = \sin\theta\cos\phi - \cos\theta\sin\phi,$$
and the compound-angle formula for $\cos(\theta + \phi)$ to see that
$$\cos(\theta - \phi) = \cos(\theta + (-\phi)) = \cos\theta\cos(-\phi) - \sin\theta\sin(-\phi) = \cos\theta\cos\phi + \sin\theta\sin\phi,$$
as required.

Solution to activity 2.18

Using the compound-angle formula $\sin(\theta + \phi) = \sin\theta\cos\phi + \cos\theta\sin\phi$, with $\phi = \theta$ we get
$$\sin(\theta + \theta) = \sin\theta\cos\theta + \cos\theta\sin\theta \implies \sin(2\theta) = 2\sin\theta\cos\theta,$$
whereas using the compound-angle formula $\cos(\theta + \phi) = \cos\theta\cos\phi - \sin\theta\sin\phi$, with $\phi = \theta$ we get
$$\cos(\theta + \theta) = \cos\theta\cos\theta - \sin\theta\sin\theta \implies \cos(2\theta) = \cos^2\theta - \sin^2\theta,$$
as required. Indeed, since we also have the Pythagorean identity $\sin^2\theta + \cos^2\theta = 1$, we can write this last double-angle formula as
$$\cos(2\theta) = (1 - \sin^2\theta) - \sin^2\theta = 1 - 2\sin^2\theta,$$
in terms of $\sin^2\theta$, or as
$$\cos(2\theta) = \cos^2\theta - (1 - \cos^2\theta) = 2\cos^2\theta - 1,$$
in terms of $\cos^2\theta$, as required.

Solution to activity 2.19

From the graph in Figure 2.11, we can see that the economically meaningful part of the supply function is $q^S : [0, \infty) \to [1, \infty)$ where $q^S(p) = p + 1$ and the economically meaningful part of the demand function is $q^D : [0, 3] \to [0, 3]$ where $q^D(p) = 3 - p$. Clearly, both of these functions are invertible as each $q$ in the co-domain gives rise to a unique $p$ in the domain and we find that
$$q = p + 1 \implies p = p^S(q) = q - 1,$$
is the inverse supply function, whereas
$$q = 3 - p \implies p = p^D(q) = 3 - q,$$
is the inverse demand function.

Solution to activity 2.20

Given that $y = a(x - p)^2 + q$, we see that

If $a > 0$, then for any $x \in \mathbb{R}$,
$$(x - p)^2 \geq 0 \implies a(x - p)^2 \geq 0 \implies a(x - p)^2 + q \geq q,$$
i.e. for all $x \in \mathbb{R}$, $y \geq q$ and so the smallest value of $y$ occurs when $y = q$ which, in turn, means that we must have $x = p$. Thus, the turning point of the parabola is a minimum and this occurs at the point $(p, q)$.

If $a < 0$, then for any $x \in \mathbb{R}$,
$$(x - p)^2 \geq 0 \implies a(x - p)^2 \leq 0 \implies a(x - p)^2 + q \leq q,$$
i.e. for all $x \in \mathbb{R}$, $y \leq q$ and so the largest value of $y$ occurs when $y = q$ which, in turn, means that we must have $x = p$. Thus, the turning point of the parabola is a maximum and this occurs at the point $(p, q)$.
Solution to activity 2.21

Given that the equation of the hyperbola is
$$\frac{y^2}{9} - \frac{x^2}{4} = 1,$$
we see that there are no $x$-intercepts since, setting $y = 0$, we get
$$-\frac{x^2}{4} = 1 \implies x^2 = -4,$$
which has no real solutions, whereas we see that the $y$-intercepts, which occur when $x = 0$, are given by
$$\frac{y^2}{9} = 1 \implies y^2 = 9 \implies y = \pm 3.$$
To find the asymptotes, we write the equation as
$$\frac{y^2}{x^2} = 9\left(\frac{1}{4} + \frac{1}{x^2}\right),$$
so that, as $x \to \infty$, we have $1/x^2 \to 0$ and this leaves us with
$$\frac{y^2}{x^2} = \frac{9}{4} \implies y = \pm\frac{3}{2}x,$$
as the equations of the asymptotes. Putting this information together, we then get the sketch in Figure 2.21.

Figure 2.21: For Activity 2.21, a sketch of the hyperbola $\frac{y^2}{9} - \frac{x^2}{4} = 1$.
Exercises

Exercise 2.1

Sketch the graph of the function $f : \{x \in \mathbb{R} \mid x \neq -1, 1\} \to \mathbb{R}$ given by
$$f(x) = \frac{x^4 - 1}{x^2 - 1}.$$

Exercise 2.2

Use the compound-angle formulae to show that
$$\tan(\theta \pm \phi) = \frac{\tan\theta \pm \tan\phi}{1 \mp \tan\theta\tan\phi},$$
and hence deduce an expression for $\tan(2\theta)$.

Exercise 2.3

The supply and demand functions for a good are
$$q^S(p) = p - 4 \quad\text{and}\quad q^D(p) = 8 - p,$$
respectively. Sketch the graphs of these functions and find the equilibrium point. A percentage [of the price] tax of $100r\%$ is imposed. Find the new equilibrium point and, by sketching the graph of the new supply function on your earlier sketch, comment on how the equilibrium point for the market has changed. How much of the tax has been passed on to the consumers? What is the maximum tax, $r_m$, that can be imposed if this market is to continue functioning?

Exercise 2.4

When selling a quantity, $q$, a firm makes a profit given by
$$\pi(q) = q^2 + 2q + 2,$$
and the largest quantity it can produce is 10. Sketch the graph of this profit function and deduce the value of $q$ that will yield the greatest profit for this firm. Explain why the inverse profit function exists and find it.

Exercise 2.5

Sketch the circle and the rectangular hyperbola with equations
$$x^2 + y^2 = 1 \quad\text{and}\quad 2xy = 1,$$
respectively. At what points do these two curves intersect?
Solutions to exercises

Solution to exercise 2.1

The function $f : \{x \in \mathbb{R} \mid x \neq -1, 1\} \to \mathbb{R}$ given by
$$f(x) = \frac{x^4 - 1}{x^2 - 1},$$
is clearly undefined at $x = 1$ and $x = -1$ as these values of $x$ would entail division by zero. However, we notice that factorising the numerator we get
$$f(x) = \frac{(x^2 - 1)(x^2 + 1)}{x^2 - 1},$$
and so, as long as $x \neq \pm 1$, we have $f(x) = x^2 + 1$. This means that, to sketch the graph of $f(x)$, we start by sketching the graph of the parabola $y = x^2 + 1$, which has a $y$-intercept when $x = 0$, i.e. when $y = 1$, no $x$-intercepts as $y = 0$ gives $x^2 + 1 = 0$ which has no real solutions, and a turning point which is a minimum at the point $(0, 1)$. We then exclude the points $(1, 2)$ and $(-1, 2)$ on the parabola at which $f(x)$ itself is undefined to get the sketch in Figure 2.22.

Figure 2.22: For Exercise 2.1, a sketch of the graph of $f(x)$. (Note that the points at which $f(x)$ is undefined are marked by a “◦”.)

Solution to exercise 2.2

Using (2.1), we have
$$\tan(\theta \pm \phi) = \frac{\sin(\theta \pm \phi)}{\cos(\theta \pm \phi)},$$
and so, using the compound-angle formulae in (2.5), we get
$$\tan(\theta \pm \phi) = \frac{\sin\theta\cos\phi \pm \cos\theta\sin\phi}{\cos\theta\cos\phi \mp \sin\theta\sin\phi} = \frac{\dfrac{\sin\theta}{\cos\theta} \pm \dfrac{\sin\phi}{\cos\phi}}{1 \mp \dfrac{\sin\theta\sin\phi}{\cos\theta\cos\phi}},$$
if we divide the numerator and denominator by $\cos\theta\cos\phi$ and cancel where appropriate. Thus, using (2.1) again, we have
$$\tan(\theta \pm \phi) = \frac{\tan\theta \pm \tan\phi}{1 \mp \tan\theta\tan\phi},$$
as required. Indeed, observe that this only makes sense if $\theta, \phi \neq (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$ as, if this isn’t true, we can’t divide through by $\cos\theta\cos\phi$ or, equivalently, one of $\tan\theta$ or $\tan\phi$ won’t exist. To deduce a formula for $\tan(2\theta)$, we set $\phi = \theta$ in the formula for $\tan(\theta + \phi)$ to get
$$\tan(2\theta) = \tan(\theta + \theta) = \frac{\tan\theta + \tan\theta}{1 - \tan\theta\tan\theta} = \frac{2\tan\theta}{1 - \tan^2\theta}.$$
Again, we observe that this only makes sense if $\theta \neq (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$ as, if this isn’t true, $\tan\theta$ won’t exist.

Solution to exercise 2.3

Here the supply and demand functions are straight lines which can be easily sketched using the method outlined in Example 2.1 and the results of doing this are illustrated in Figure 2.23(a). To find the equilibrium price, $p^*$, we have
$$q^S(p^*) = q^D(p^*) \implies p^* - 4 = 8 - p^* \implies 2p^* = 12,$$
and so, $p^* = 6$. Then, using the demand function, say, we have $q^* = q^D(p^*) = 8 - p^*$, and so the equilibrium quantity is $q^* = 8 - 6 = 2$. Consequently, the equilibrium point is $(q^*, p^*) = (2, 6)$ which, as indicated in Figure 2.23(a), is the point at which the two straight lines intersect.

If a percentage [of the price] tax of $100r\%$ is imposed,⁶ the demand function is still $q_r^D(p) = q^D(p) = 8 - p$, but the supply function becomes
$$q_r^S(p) = q^S(p - rp) = p - rp - 4,$$
as the suppliers now see an effective price of $p - rp$. This means that the equilibrium price in the presence of tax, $p_r^*$, is given by
$$q_r^S(p_r^*) = q_r^D(p_r^*) \implies p_r^* - rp_r^* - 4 = 8 - p_r^* \implies p_r^*(2 - r) = 12 \implies p_r^* = \frac{12}{2 - r},$$
and so the equilibrium quantity in the presence of tax, $q_r^*$, is
$$q_r^* = q_r^D(p_r^*) = 8 - \frac{12}{2 - r} = \frac{16 - 8r - 12}{2 - r} = \frac{4 - 8r}{2 - r},$$
if we use the demand function, $q_r^D(p)$.⁷

⁶ Here we will start by restricting our attention to the case where $0 \leq r \leq 1$ as, prima facie, these are the values that would appear to be economically sensible. Although, as we will soon see, the economically meaningful values of $r$ will turn out to be $0 \leq r < 1/2$!

Sketching the graph of the new supply function, as in Figure 2.23(b), we see that by writing its equation as
$$p = \frac{q}{1 - r} + \frac{4}{1 - r},$$
and noting that
$$\frac{1}{1 - r} \geq 1 \quad\text{and}\quad \frac{4}{1 - r} \geq 4,$$
when considering $0 \leq r \leq 1$, this means that it is steeper than the old one and that the $p$-intercept, which is now
$$\frac{4}{1 - r} = \frac{4(1 - r) + 4r}{1 - r} = 4 + \frac{4r}{1 - r},$$
has increased by $4r/(1 - r)$. In this case, as the equilibrium price has increased from 6 to
$$\frac{12}{2 - r} = \frac{6(2 - r) + 6r}{2 - r} = 6 + \frac{6r}{2 - r},$$
we see that the consumer pays $6r/(2 - r)$ more. But, as the total tax to be paid by the supplier is given by
$$rp_r^* = r \times \frac{12}{2 - r} = \frac{12r}{2 - r},$$
this means that only half of the tax has been passed on to the consumer in this case. Of course, the equilibrium quantity in the presence of the tax must be positive and so, for the market to function, we require that
$$q_r^* > 0 \implies \frac{4 - 8r}{2 - r} > 0 \implies 4 > 8r \implies r < \frac{1}{2},$$
(bearing in mind that $2 - r > 0$ if $0 \leq r \leq 1$), i.e. the maximum tax, $r_m$, that can be imposed is given by $r_m = 1/2$.

⁷ Alternatively, we could use the supply function $q_r^S(p) = q^S(p - rp) = p - rp - 4$ to find $q_r^*$. However, we cannot use $q^S(p) = p - 4$ as this no longer holds in the presence of the tax!
Figure 2.23: For Exercise 2.3, a sketch of the graphs of the supply and demand functions indicating the equilibrium point of the market when (a) there is no tax and (b) a percentage of the price tax of $100r\%$ is imposed. In (b), the new supply line has $p$-intercept $\frac{4}{1 - r}$ and the new equilibrium point is $\left(\frac{4 - 8r}{2 - r}, \frac{12}{2 - r}\right)$. (Note that these sketches only make economic sense when $q \geq 0$.)

Solution to exercise 2.4

The firm’s profit function is
$$\pi(q) = q^2 + 2q + 2,$$
and its domain is the interval $[0, 10]$ as $q \geq 0$ since it is a quantity and $q \leq 10$ since the largest quantity it can produce is 10. So, to sketch the graph of this profit function, we start by sketching the parabola
$$y = q^2 + 2q + 2 = (q + 1)^2 + 1,$$
in completed square form. This has a $y$-intercept when $q = 0$, i.e. $y = 2$, no $q$-intercepts as $y = 0$ gives $(q + 1)^2 + 1 = 0$ which has no real solutions, and a turning point which is a minimum at the point $(-1, 1)$. We then restrict our attention to the relevant values of $q$, i.e. those that satisfy $0 \leq q \leq 10$, to get a sketch of the graph of the profit function itself as illustrated in Figure 2.24.

Looking at the graph of the profit function, we see that as it is a function $\pi : [0, 10] \to [2, 122]$, its inverse exists since there is a unique $q \in [0, 10]$ such that $y = \pi(q)$ for all $y \in [2, 122]$. Indeed, solving this equation we find that, using the completed square form above, we have
$$y = (q + 1)^2 + 1 \implies (q + 1)^2 = y - 1 \implies q + 1 = \pm\sqrt{y - 1} \implies q = -1 \pm \sqrt{y - 1},$$
which gives us two values of $q$ for each value of $y > 1$. However, as we must be getting values of $q \in [0, 10]$ from our inverse function, we take the ‘+’ sign here (i.e. we discard
49
2. Functions
2
the ‘−’ sign) so that we can get the solutions where q ≥ −1 (instead of getting the solutions where q ≤ −1 which we don’t want). That is, we have found that p q = π −1 (y) = −1 + y − 1, is the sought after inverse function in terms of y. y 122 y = π(q) 2 1 −1 O
10 q
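As a quick numerical sanity check of the inverse found in the solution to Exercise 2.4, the following Python sketch applies $\pi$ and then $\pi^{-1}$ to a few quantities. The function names `profit` and `profit_inverse` are illustrative choices and do not come from the guide.

```python
import math

def profit(q):
    """Profit function pi(q) = q^2 + 2q + 2 on the domain 0 <= q <= 10."""
    if not 0 <= q <= 10:
        raise ValueError("q must lie in [0, 10]")
    return q**2 + 2*q + 2

def profit_inverse(y):
    """Inverse pi^{-1}(y) = -1 + sqrt(y - 1) on the range 2 <= y <= 122."""
    if not 2 <= y <= 122:
        raise ValueError("y must lie in [2, 122]")
    return -1 + math.sqrt(y - 1)

# Round trip: applying the inverse to pi(q) should give back q.
for q in [0, 2.5, 10]:
    y = profit(q)
    print(q, y, profit_inverse(y))
```

Taking the '−' sign instead would return values of $q$ that are at most $-1$, which lie outside the domain.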
Figure 2.24: For Exercise 2.4, a sketch of the graph of the profit function, $\pi(q)$. (Note that dashed parts of the curve are on the parabola but are not part of the graph of the profit function.)

Solution to exercise 2.5

To sketch the circle and the rectangular hyperbola, we note that:

The circle with equation $x^2 + y^2 = 1$ is centred on the origin and has a radius of 1. Indeed, setting $x = 0$, we find that its $y$-intercepts are $y = \pm 1$ and, setting $y = 0$, we find that its $x$-intercepts are $x = \pm 1$.

The rectangular hyperbola with equation
$$2xy = 1 \implies y = \frac{1}{2x},$$
has the $x$ and $y$-axes, i.e. the lines $y = 0$ and $x = 0$, respectively, as its asymptotes since
• For the vertical asymptote: As $x \to 0^+$ we have $y \to \infty$ and as $x \to 0^-$ we have $y \to -\infty$.
• For the horizontal asymptote: As $x \to \infty$ we have $y \to 0$ from above and as $x \to -\infty$ we have $y \to 0$ from below.
We also note that it has no $x$-intercepts (as no value of $x$ makes $y = 0$) and no $y$-intercepts (as no value of $y$ makes $x = 0$).

These curves are illustrated in Figure 2.25.

To find the points of intersection of these two curves we have to solve the equations
$$x^2 + y^2 = 1 \quad\text{and}\quad 2xy = 1,$$
simultaneously. This can easily be done in two different ways which we give here for completeness.
Method I: The equation $2xy = 1$ tells us that, say, $y = 1/(2x)$ and substituting this into the other equation we get
$$x^2 + y^2 = 1 \implies x^2 + \frac{1}{4x^2} = 1 \implies 4x^4 - 4x^2 + 1 = 0,$$
which is a quadratic equation in $x^2$. By factorising this, say, we find that
$$(2x^2 - 1)^2 = 0 \implies x^2 = \frac{1}{2} \implies x = \pm\frac{1}{\sqrt{2}}.$$
Then, substituting these back into the equation $y = 1/(2x)$ we get
$$y = \frac{1}{2x} = \pm\frac{\sqrt{2}}{2} = \pm\frac{1}{\sqrt{2}},$$
as the corresponding values of $y$.

Method II: We note that, using our equations, we have
$$(x - y)^2 = x^2 - 2xy + y^2 = (x^2 + y^2) - (2xy) = 1 - 1 = 0,$$
and so any solutions we seek must satisfy $(x - y)^2 = 0$ or, equivalently, $x = y$. If we substitute this into one of our equations, say $2xy = 1$, we get
$$2xy = 1 \implies 2y^2 = 1 \implies y^2 = \frac{1}{2} \implies y = \pm\frac{1}{\sqrt{2}}.$$
Then, using the equation $x = y$ again, we get $x = \pm 1/\sqrt{2}$ as the corresponding values of $x$.

So, whichever method you choose, we find that the points of intersection of these two curves are $(1/\sqrt{2}, 1/\sqrt{2})$ and $(-1/\sqrt{2}, -1/\sqrt{2})$, both of which are indicated in Figure 2.25.
Figure 2.25: For Exercise 2.5, a sketch of the circle $x^2 + y^2 = 1$ and the rectangular hyperbola $2xy = 1$. The indicated points of intersection are $(1/\sqrt{2}, 1/\sqrt{2})$ and $(-1/\sqrt{2}, -1/\sqrt{2})$.
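The two intersection points found in the solution to Exercise 2.5 can be checked numerically. The short Python sketch below substitutes each point into both equations; the helper name `on_both_curves` is an illustrative choice.

```python
import math

def on_both_curves(x, y):
    """True if (x, y) satisfies x^2 + y^2 = 1 and 2xy = 1, up to rounding."""
    on_circle = math.isclose(x**2 + y**2, 1.0)
    on_hyperbola = math.isclose(2 * x * y, 1.0)
    return on_circle and on_hyperbola

r = 1 / math.sqrt(2)
for point in [(r, r), (-r, -r)]:
    print(point, on_both_curves(*point))
```

A point such as $(1, 0)$ lies on the circle but not on the hyperbola, so the same check rejects it.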
Chapter 3
Differentiation

Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 2.7–2.13.
Anthony and Biggs (1996) Chapter 6 and parts of Chapter 7.

Further reading
Simon and Blume (1994) Sections 2.3–2.7 and 3.6, Chapter 4 and Section 5.5.
Adams and Essex (2010) Sections 2.1–2.7, parts of Sections 3.1 and 3.3, parts of Sections 4.9 and 4.10.

Aims and objectives
The objectives of this chapter are as follows.
To introduce the idea of a derivative and see how it can be found using various techniques.
To use derivatives to find tangent lines and approximate functions using various techniques.
To see how derivatives can be used in economics-based subjects.
Specific learning outcomes can be found near the end of this chapter.
3.1 Introduction: What is differentiation?

Having revised the idea of a function in the previous chapter, we now turn to differentiation, the process by which we find the derivative of a function. Given a function, $f$, its derivative at the point $a$, which we denote by $f'(a)$, is given by the formula
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h},$$
provided that the limit exists. Indeed, when the limit exists, i.e. when we can find a value for $f'(a)$, we say that the function is differentiable at $a$. Observe that here, we have introduced the notation
$$\lim_{h \to 0} g(h),$$
to denote the value¹ of the function $g(h)$ as $h \to 0$ (provided, of course, that there is such a value) and we call this value "the limit of $g(h)$ as $h \to 0$" whereas if there is no such value, we say that this limit does not exist.² To see how this works in practice, we can consider a simple example.

Example 3.1 Use the definition to find the derivative of the function $f(x) = x^2$ at the point $x = 3$.

We need to find $f'(3)$ and, using the formula above with $a = 3$, we start by looking at
$$\frac{f(3 + h) - f(3)}{h} = \frac{(3 + h)^2 - 3^2}{h},$$
which, looking at the numerator, is easily simplified to give
$$\frac{f(3 + h) - f(3)}{h} = \frac{(9 + 6h + h^2) - 9}{h} = \frac{6h + h^2}{h} = 6 + h.$$
This in turn means that
$$f'(3) = \lim_{h \to 0} \frac{f(3 + h) - f(3)}{h} = \lim_{h \to 0} (6 + h) = 6,$$
because, in the limit as $h \to 0$, we see that $6 + h$ goes to 6. Consequently, we can see that the derivative of $f(x)$ at the point $x = 3$, i.e. $f'(3)$, is 6. Indeed, we can say that the function $f(x) = x^2$ is differentiable at $x = 3$.

Activity 3.1 Use the definition to find the derivative of the function $f(x) = x^2$ at the point $x = -1$.

More generally, instead of finding the derivative of $f$ at individual points, we can find its derivative at a general point, $x$, by finding $f'(x)$. Of course, according to our formula, this would involve finding
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},$$
and, provided the limit exists, this will give us another function of $x$. This can then be used to find the derivative, say $f'(a)$, at an individual point, $a$, by setting $x = a$ in our result. Let's see how this works.
¹ That is, a finite real number.
² In 176 Further Calculus, you will do limits properly, but this simple explanation of what is going on should suffice for our purposes here. In particular, we briefly introduced the '→' notation in Example 2.2 and, with the examples below, you should be able to see what is happening for now. Also, in Section 3.3.4, we will see some examples where a limit fails to exist and we will explain what that means there.

Example 3.2 Use the definition to find the derivative of the function $f(x) = x^2$ at the general point $x$ and use this to verify that $f'(3) = 6$ as we found in Example 3.1.

We need to find $f'(x)$ and, using the formula above, we start by looking at
$$\frac{f(x + h) - f(x)}{h} = \frac{(x + h)^2 - x^2}{h},$$
which, looking at the numerator, is easily simplified to give
$$\frac{f(x + h) - f(x)}{h} = \frac{(x^2 + 2xh + h^2) - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h.$$
This in turn means that
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} (2x + h) = 2x,$$
because, in the limit as $h \to 0$, we see that $2x + h$ goes to $2x$. Consequently, we can see that the derivative of $f(x)$ at the general point $x$, i.e. $f'(x)$, is $2x$ which is also a function of $x$ as we should expect.³ Having found this, we can substitute $x = 3$ into our result to see that $f'(3) = 2(3) = 6$ as we found in Example 3.1.

Activity 3.2
Use the result in Example 3.2 to verify your answer to Activity 3.1.
At what point is the derivative of $f(x) = x^2$ equal to (a) 16 and (b) −4?

We have seen that a function, $f(x)$, has a derivative, $f'(x)$, which is also a function of $x$. The process by which we go from a function to its derivative is called differentiation. That is, when we have a function, $f(x)$, we differentiate it with respect to $x$ and we sometimes denote this operation by
$$\frac{d}{dx} f(x), \quad\text{which is read as "differentiate $f(x)$ with respect to $x$"},$$
and the outcome of this operation will be the sought-after derivative which we can write as
$$\frac{df}{dx} \quad\text{or}\quad f'(x).$$
If we then want to evaluate the derivative of $f$ at the point $a$, we write
$$\left.\frac{df}{dx}\right|_{x=a} \quad\text{or}\quad f'(a),$$
depending on which notation we are using.
We will see what derivatives tell us about functions in Section 3.3 and, in particular, we will see that some functions do not have derivatives at certain points as the limit in the definition may not exist. But, before we do that, we turn our attention to how we can find the derivative of a function when we don’t want to explicitly use the definition.
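To see the limit in the definition at work numerically, the following Python sketch evaluates the difference quotient of $f(x) = x^2$ at $a = 3$ for shrinking values of $h$. The function names are illustrative, and the chosen values of $h$ are arbitrary.

```python
def difference_quotient(f, a, h):
    """Return (f(a + h) - f(a)) / h for a non-zero step h."""
    if h == 0:
        raise ValueError("h must be non-zero")
    return (f(a + h) - f(a)) / h

def square(x):
    return x**2

# As h shrinks, the quotient 6 + h approaches the derivative f'(3) = 6.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, difference_quotient(square, 3, h))
```

The printed values agree with $6 + h$ up to floating-point rounding, which is exactly the simplified expression obtained in Example 3.1.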
³ Indeed, as this limit exists for all $x \in \mathbb{R}$, we can say that the function $f(x) = x^2$ is differentiable for all $x \in \mathbb{R}$.

3.2 How to find derivatives

The previous section told us how to find derivatives from first principles, but now we want to explore a more convenient way of finding them. The key idea is that we introduce standard derivatives which tell us how to differentiate the basic functions that we saw in the previous chapter. Once we know how to differentiate these, the rules of differentiation will allow us to differentiate combinations of these functions.
3.2.1 Standard derivatives

In Example 3.2, we used the definition of the derivative to show that the function $f(x) = x^2$ has a derivative given by $f'(x) = 2x$. We now state some results that will allow us to differentiate other elementary functions.

Power and root functions

If $n \in \mathbb{Z}$, we can use the definition of the derivative to show that
$$f(x) = x^n \implies f'(x) = nx^{n-1}.$$
For instance, we note that:

If $n = 0$, we have
$$f(x) = x^0 = 1 \implies f'(x) = 0x^{-1} = 0,$$
which tells us the derivative of 1.

If $n = 1$, we have
$$f(x) = x^1 = x \implies f'(x) = 1x^0 = 1,$$
which tells us the derivative of $x$.

If $n = 2$, we have
$$f(x) = x^2 \implies f'(x) = 2x^1 = 2x,$$
in agreement with what we saw in Example 3.2.

Indeed, we will also use this rule when $n \in \mathbb{Q}$, and so we also have things like
$$f(x) = x^{\frac{1}{2}} = \sqrt{x} \implies f'(x) = \frac{1}{2}x^{-\frac{1}{2}} = \frac{1}{2\sqrt{x}},$$
which tells us the derivative of $\sqrt{x}$.
Exponential and logarithmic functions

If we are using e and ln, the derivatives are very simple, i.e.
$$f(x) = e^x \implies f'(x) = e^x,$$
as $e^x$ is the special function which is equal to its derivative. We also have
$$f(x) = \ln x \implies f'(x) = \frac{1}{x},$$
which, as we will see in Activity 3.12, follows from the fact that the function $\ln x$ is the inverse of $e^x$.

If we have another base, $a$, the derivatives are not so simple. We shall see in Activity 3.9 that
$$f(x) = a^x \implies f'(x) = a^x \ln a,$$
and, using the change of base formula for logarithms, we will see that
$$f(x) = \log_a x \implies f'(x) = \frac{1}{x \ln a},$$
in Section 3.2.2.

Sine and cosine functions

For the sine function we find that
$$f(x) = \sin x \implies f'(x) = \cos x,$$
and for the cosine function we have
$$f(x) = \cos x \implies f'(x) = -\sin x.$$
That said, we could have used the fact that the sine and cosine functions are interdefinable, i.e.
$$\cos x = \sin\left(x + \frac{\pi}{2}\right) \quad\text{and}\quad \sin x = -\cos\left(x + \frac{\pi}{2}\right),$$
to derive the latter from the former once we have the chain rule (see Exercise 3.2). Indeed, using these standard derivatives, we can then derive the derivatives of the other trigonometric functions using their definitions in terms of sine and cosine together with the rules of differentiation in Section 3.2.2 (see, for example, Activity 3.6(c)).
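The standard derivatives listed in this section can be checked numerically. The sketch below compares each formula with a central-difference estimate at a single point. The test point $x = 1.3$, the base $a = 2$, the step size and the helper name `central_difference` are all assumptions made for illustration; the power $x^3$ is used as a representative of $x^n$.

```python
import math

def central_difference(f, x, h=1e-6):
    """Estimate f'(x) by the symmetric quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x, a = 1.3, 2.0  # arbitrary test point and base

# Each entry pairs a function with the value its standard derivative gives at x.
checks = {
    "x^3":     (lambda t: t**3,           3 * x**2),
    "sqrt(x)": (math.sqrt,                1 / (2 * math.sqrt(x))),
    "e^x":     (math.exp,                 math.exp(x)),
    "ln x":    (math.log,                 1 / x),
    "a^x":     (lambda t: a**t,           a**x * math.log(a)),
    "log_a x": (lambda t: math.log(t, a), 1 / (x * math.log(a))),
    "sin x":   (math.sin,                 math.cos(x)),
    "cos x":   (math.cos,                 -math.sin(x)),
}

for name, (f, formula_value) in checks.items():
    estimate = central_difference(f, x)
    print(f"{name:8s} estimate={estimate:.6f} formula={formula_value:.6f}")
```

Agreement to several decimal places is evidence that a formula has been remembered correctly, although it is of course not a proof.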
3.2.2 The rules of differentiation

In Section 2.1.2, we saw that there are several standard ways of making new functions from old ones. Here, we see how we can use the standard derivatives, i.e. the derivatives of our basic functions, and rules of differentiation to differentiate new functions that are created from these basic ones in these standard ways. We start with the most straightforward of these which allows us to differentiate linear combinations of functions.

The linear combination rule

If $k$ and $l$ are constants, this allows us to differentiate the linear combination, $kf(x) + lg(x)$, of two functions $f(x)$ and $g(x)$. It states that
$$\frac{d}{dx}\left[kf(x) + lg(x)\right] = k\frac{df}{dx} + l\frac{dg}{dx},$$
or, using our shorthand, $(kf + lg)'(x) = kf'(x) + lg'(x)$. Indeed, this gives us three more basic rules straightaway, i.e. the

constant multiple rule: If $k$ is a constant and $f(x)$ is a function, then
$$\frac{d}{dx}\left[kf(x)\right] = k\frac{df}{dx},$$
or, using our shorthand, $(kf)'(x) = kf'(x)$.

sum rule: If $f(x)$ and $g(x)$ are functions, then
$$\frac{d}{dx}\left[f(x) + g(x)\right] = \frac{df}{dx} + \frac{dg}{dx},$$
or, using our shorthand, $(f + g)'(x) = f'(x) + g'(x)$.

difference rule: If $f(x)$ and $g(x)$ are functions, then
$$\frac{d}{dx}\left[f(x) - g(x)\right] = \frac{df}{dx} - \frac{dg}{dx},$$
or, using our shorthand, $(f - g)'(x) = f'(x) - g'(x)$.

Activity 3.3 Derive the constant multiple, sum and difference rules from the linear combination rule.

Example 3.3 Using these rules we see that:

if $f(x) = -3x^{-\frac{1}{2}}$, then $f'(x) = -3\left(-\frac{1}{2}\right)x^{-\frac{3}{2}} = \frac{3}{2}x^{-\frac{3}{2}}$ by the constant multiple rule;
if $f(x) = x^2 + x^{\frac{1}{2}}$, then $f'(x) = 2x + \frac{1}{2}x^{-\frac{1}{2}}$ by the sum rule;
if $f(x) = \cos x - \sin x$, then $f'(x) = -\sin x - \cos x$ by the difference rule;
if $f(x) = 3\ln x - 4e^x$, then $f'(x) = \frac{3}{x} - 4e^x$ by the linear combination rule.

So, in the case of simple combinations of functions such as these, we see that the derivative of the linear combination is given by the linear combination of the derivatives.

Activity 3.4 Use the rules above to differentiate the following functions with respect to $x$.
(a) $-3\cos x$, (b) $e^x + \cos x$, (c) $3\sin x - 3\ln x$.
Indeed, we can see that using the change of base formula for logarithms from Section 2.1.4, we have
$$\log_a x = \frac{\ln x}{\ln a},$$
and so, using the constant multiple rule, we get
$$\frac{d}{dx}\log_a x = \frac{d}{dx}\left[\frac{\ln x}{\ln a}\right] = \frac{1}{\ln a}\frac{d}{dx}\ln x = \frac{1}{\ln a}\cdot\frac{1}{x} = \frac{1}{x\ln a},$$
as mentioned in Section 3.2.1.

We now look at the other rules of differentiation, i.e. the ones that will allow us to differentiate products, quotients and compositions of functions.

The product rule

This allows us to differentiate the product of two functions $f(x)$ and $g(x)$. It states that
$$\frac{d}{dx}\left[f(x)g(x)\right] = \frac{df}{dx}g(x) + f(x)\frac{dg}{dx},$$
or, using our shorthand, $[f(x)g(x)]' = f'(x)g(x) + f(x)g'(x)$. Let's have a look at some examples of how it works.

Example 3.4 Differentiate the function $h(x) = xe^x$ with respect to $x$.

This is the product of the two functions
$$f(x) = x \quad\text{and}\quad g(x) = e^x,$$
and these give us
$$f'(x) = 1 \quad\text{and}\quad g'(x) = e^x.$$
As such, the product rule tells us that
$$h'(x) = (1)(e^x) + (x)(e^x) = (1 + x)e^x,$$
is the derivative of the function $h(x) = xe^x$ with respect to $x$.

Example 3.5 Differentiate the function $h(x) = x\ln x$ with respect to $x$.

This is the product of the two functions
$$f(x) = x \quad\text{and}\quad g(x) = \ln x,$$
and these give us
$$f'(x) = 1 \quad\text{and}\quad g'(x) = \frac{1}{x}.$$
As such, the product rule tells us that
$$h'(x) = (1)(\ln x) + (x)\left(\frac{1}{x}\right) = \ln x + 1,$$
is the derivative of the function $h(x) = x\ln x$ with respect to $x$.
Example 3.6 Differentiate the function $h(x) = e^x\ln x$ with respect to $x$.

This is the product of the two functions
$$f(x) = e^x \quad\text{and}\quad g(x) = \ln x,$$
and these give us
$$f'(x) = e^x \quad\text{and}\quad g'(x) = \frac{1}{x}.$$
As such, the product rule tells us that
$$h'(x) = (e^x)(\ln x) + (e^x)\left(\frac{1}{x}\right) = e^x\left(\ln x + \frac{1}{x}\right),$$
is the derivative of the function $h(x) = e^x\ln x$ with respect to $x$.

Activity 3.5 Use the product rule to differentiate the following functions with respect to $x$.
(a) $x\sin x$, (b) $e^x\cos x$, (c) $\sin x\cos x$.
What can you deduce about the derivative of $\sin(2x)$ from your answer to (c)?

The quotient rule

This allows us to differentiate the quotient of two functions $f(x)$ and $g(x)$. It states that
$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{\dfrac{df}{dx}g(x) - f(x)\dfrac{dg}{dx}}{[g(x)]^2},$$
or, using our shorthand,
$$\left[\frac{f(x)}{g(x)}\right]' = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}.$$
Of course, as we saw in Section 2.1.2, this all assumes that the quotient of the two functions is defined for the values of $x$ that we are working with, i.e. it only works for values of $x$ in the domain where $g(x) \ne 0$. Let's have a look at some examples of how it works.
Example 3.7 For $x \ne 0$, differentiate the function $h(x) = \dfrac{e^x}{x}$ with respect to $x$.

This is the quotient of the two functions
$$f(x) = e^x \quad\text{and}\quad g(x) = x,$$
and these give us
$$f'(x) = e^x \quad\text{and}\quad g'(x) = 1.$$
As such, for $x \ne 0$, the quotient rule tells us that
$$h'(x) = \frac{(e^x)(x) - (e^x)(1)}{x^2} = \frac{x - 1}{x^2}e^x,$$
is the derivative of the function $h(x)$ with respect to $x$.
Example 3.8 For $x \ne 1$, differentiate the function $h(x) = \dfrac{x^3}{\ln x}$ with respect to $x$.⁴

This is the quotient of the two functions
$$f(x) = x^3 \quad\text{and}\quad g(x) = \ln x,$$
and these give us
$$f'(x) = 3x^2 \quad\text{and}\quad g'(x) = \frac{1}{x}.$$
As such, for $x \ne 1$, the quotient rule tells us that
$$h'(x) = \frac{(3x^2)(\ln x) - (x^3)\left(\frac{1}{x}\right)}{[\ln x]^2} = \frac{x^2(3\ln x - 1)}{[\ln x]^2},$$
is the derivative of $h(x)$ with respect to $x$.
Example 3.9 Differentiate the function $h(x) = \dfrac{\ln x}{e^x}$ with respect to $x$.⁵

This is the quotient of the two functions
$$f(x) = \ln x \quad\text{and}\quad g(x) = e^x,$$
and these give us
$$f'(x) = \frac{1}{x} \quad\text{and}\quad g'(x) = e^x.$$
As such, the quotient rule tells us that
$$h'(x) = \frac{\left(\frac{1}{x}\right)(e^x) - (\ln x)(e^x)}{[e^x]^2} = \frac{(1 - x\ln x)e^x}{xe^{2x}} = \frac{1 - x\ln x}{xe^x},$$
is the derivative of $h(x)$ with respect to $x$.

Activity 3.6 Use the quotient rule to differentiate the following functions with respect to $x$ and find the values of $x$ for which the derivatives exist.
(a) $\dfrac{\sin x}{x}$, (b) $\dfrac{e^x}{\cos x}$, (c) $\dfrac{\sin x}{\cos x}$.
What can you deduce about the derivative of $\tan x$ from your answer to (c)?

⁴ Here, $h(x)$ is only defined for $x \ne 1$ since we have $\ln x = 0$ if $x = 1$.
⁵ Observe that as $e^x > 0$ for all $x \in \mathbb{R}$, we don't have to worry about dividing by zero here.
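The product and quotient rules can also be written as small Python helpers that build a derivative from the pieces $f$, $f'$, $g$ and $g'$. This is only a sketch: the helper names and the test point $x_0 = 2$ are illustrative assumptions. It compares the rule-built derivatives with the closed forms found in Examples 3.5 and 3.9.

```python
import math

def product_rule(f, df, g, dg):
    """Return the derivative of x -> f(x) g(x), i.e. f'(x) g(x) + f(x) g'(x)."""
    return lambda x: df(x) * g(x) + f(x) * dg(x)

def quotient_rule(f, df, g, dg):
    """Return the derivative of x -> f(x) / g(x), valid where g(x) != 0."""
    return lambda x: (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2

def identity(x):
    return x

def one(x):
    return 1.0

def reciprocal(x):
    return 1.0 / x

# Example 3.5: h(x) = x ln x, built from f(x) = x and g(x) = ln x.
d_example_3_5 = product_rule(identity, one, math.log, reciprocal)

# Example 3.9: h(x) = ln x / e^x, built from f(x) = ln x and g(x) = e^x.
d_example_3_9 = quotient_rule(math.log, reciprocal, math.exp, math.exp)

x0 = 2.0  # arbitrary test point with x0 > 0
print(d_example_3_5(x0), math.log(x0) + 1)
print(d_example_3_9(x0), (1 - x0 * math.log(x0)) / (x0 * math.exp(x0)))
```

Each printed pair should agree up to rounding, since the closed forms in the examples are just simplified versions of what the rules produce.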
The chain rule

This allows us to differentiate the composition of two functions $f(x)$ and $g(x)$. It states that
$$\frac{d}{dx}\left[f(g(x))\right] = \frac{df}{dg}\frac{dg}{dx},$$
or, using our shorthand, $[f(g(x))]' = f'(g)g'(x)$. Let's have a look at some examples of how it works.

Example 3.10 Differentiate the function $h(x) = (2x + 1)^3$ with respect to $x$.

The function $h(x) = (2x + 1)^3$ is the composition of the functions
$$f(g) = g^3 \quad\text{and}\quad g(x) = 2x + 1.$$
As such we have
$$f'(g) = 3g^2 \quad\text{and}\quad g'(x) = 2,$$
and so the chain rule tells us that
$$h'(x) = (3g^2)(2) = 6g^2 = 6(2x + 1)^2,$$
is the derivative of $h(x)$ with respect to $x$.

Activity 3.7 Verify that this is correct by multiplying out the brackets and differentiating your new expression for $h(x)$ with respect to $x$.

Example 3.11 Differentiate the function $h(x) = \sqrt{2x + 1}$ with respect to $x$.

The function $h(x) = \sqrt{2x + 1}$ is the composition of the functions
$$f(g) = \sqrt{g} = g^{\frac{1}{2}} \quad\text{and}\quad g(x) = 2x + 1.$$
As such we have
$$f'(g) = \frac{1}{2}g^{-\frac{1}{2}} \quad\text{and}\quad g'(x) = 2,$$
and so the chain rule tells us that
$$h'(x) = \left(\frac{1}{2}g^{-\frac{1}{2}}\right)(2) = g^{-\frac{1}{2}} = \frac{1}{\sqrt{2x + 1}},$$
is the derivative of $h(x)$ with respect to $x$.⁶
⁶ In particular, observe that here the original function is only defined if $x \ge -1/2$ whereas the derivative is only defined if $x > -1/2$ (as, in the derivative, $x = -1/2$ would entail division by zero).

Example 3.12 Differentiate the function $h(x) = e^{x^3 + 2}$ with respect to $x$.

The function $h(x) = e^{x^3 + 2}$ is the composition of the functions
$$f(g) = e^g \quad\text{and}\quad g(x) = x^3 + 2.$$
As such we have
$$f'(g) = e^g \quad\text{and}\quad g'(x) = 3x^2,$$
and so the chain rule tells us that
$$h'(x) = (e^g)(3x^2) = 3x^2e^{x^3 + 2},$$
is the derivative of $h(x)$ with respect to $x$.

Activity 3.8 Use the chain rule to differentiate the following functions with respect to $x$.
(a) $\sin(2x)$, (b) $\ln(\cos x)$, (c) $\ln(e^x)$.
Why should your answer to (c) be obvious?

The chain rule can also be used to derive some useful results.

Activity 3.9 (A useful result) Using the fact that $a^x = e^{x\ln a}$, which we saw in Section 2.1.3, show that
$$\frac{da^x}{dx} = a^x\ln a.$$
This was mentioned in Section 3.2.1, but there is no need to remember it as you should be able to derive this result if it is needed.

Activity 3.10 (Deriving the quotient rule) Derive the quotient rule by writing the quotient
$$\frac{f(x)}{g(x)} \quad\text{as the product}\quad f(x)[g(x)]^{-1},$$
and using the product and chain rules to differentiate it with respect to $x$.

Activity 3.11 (Derivatives of inverse functions) If the function, $f$, has an inverse, $f^{-1}$, then we can let $y = f(x)$ so that $x = f^{-1}(y)$. Use the chain rule to show that
$$\frac{d}{dy}f^{-1}(y) = 1 \bigg/ \frac{d}{dx}f(x).$$
Activity 3.12 We know that if $y = e^x$, then $x = \ln y$. Use the result in Activity 3.11 and the fact that $(e^x)' = e^x$ to show that the derivative of $\ln y$ with respect to $y$ is $1/y$.

Using the rules together

Sometimes it will be necessary to apply several of the above rules of differentiation in order to find a derivative. This is easily done as long as care is taken to recognise what you are differentiating at each step. Here are two examples that should make the procedure clear.

Example 3.13 Differentiate the function $l(x) = (x^3 + 1)\ln(x^2 + 4)$ with respect to $x$.
This is the product of the two functions
$$f(x) = x^3 + 1 \quad\text{and}\quad g(x) = \ln(x^2 + 4),$$
and clearly, $f'(x) = 3x^2$. But to differentiate $g(x)$ we need to use the chain rule because it is a composition. In this case, we have
$$g(h) = \ln h \quad\text{and}\quad h(x) = x^2 + 4,$$
which gives us
$$g'(h) = \frac{1}{h} \quad\text{and}\quad h'(x) = 2x,$$
so that
$$g'(x) = \left(\frac{1}{h}\right)(2x) = \frac{2x}{h} = \frac{2x}{x^2 + 4},$$
by the chain rule. Now, putting all of this into the product rule gives us
$$l'(x) = (3x^2)\ln(x^2 + 4) + (x^3 + 1)\left(\frac{2x}{x^2 + 4}\right) = 3x^2\ln(x^2 + 4) + \frac{2x(x^3 + 1)}{x^2 + 4},$$
as the derivative of $l(x)$ with respect to $x$.

Example 3.14 Differentiate the function $l(x) = e^{x^2 + x}\ln(x^3 + 1)$ with respect to $x$.
This is the product of the two functions
$$f(x) = e^{x^2 + x} \quad\text{and}\quad g(x) = \ln(x^3 + 1),$$
and to differentiate $f(x)$ we need to use the chain rule because it is a composition. In this case, we have
$$f(h) = e^h \quad\text{and}\quad h(x) = x^2 + x,$$
which gives us
$$f'(h) = e^h \quad\text{and}\quad h'(x) = 2x + 1,$$
so that
$$f'(x) = (e^h)(2x + 1) = (2x + 1)e^h = (2x + 1)e^{x^2 + x},$$
by the chain rule. Then, to differentiate $g(x)$, we need to use the chain rule again because it is also a composition. In this case, we have
$$g(h) = \ln h \quad\text{and}\quad h(x) = x^3 + 1,$$
which gives us
$$g'(h) = \frac{1}{h} \quad\text{and}\quad h'(x) = 3x^2,$$
so that
$$g'(x) = \left(\frac{1}{h}\right)(3x^2) = \frac{3x^2}{h} = \frac{3x^2}{x^3 + 1},$$
by the chain rule. Now, putting all of this into the product rule gives us
$$l'(x) = (2x + 1)e^{x^2 + x}\ln(x^3 + 1) + e^{x^2 + x}\left(\frac{3x^2}{x^3 + 1}\right) = \left[(2x + 1)\ln(x^3 + 1) + \frac{3x^2}{x^3 + 1}\right]e^{x^2 + x},$$
as the derivative of $l(x)$ with respect to $x$.

Of course, once you can reliably apply the rules, there is no need to show all of the intermediate working.

Activity 3.13 Use the rules of differentiation to differentiate the following functions with respect to $x$.
(a) $e^{x^2}\ln(\sin x)$, (b) $\dfrac{\sin(cos x)}{e^{\sin x}}$, (c) $\sin^2(3x) + \cos^2(3x)$.
Why should your answer to (c) be obvious?
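The chain rule lends itself to the same kind of sketch: given $f'$, $g$ and $g'$, a helper can return the derivative of the composition. The helper name `chain_rule` and the test point $x_0 = 0.5$ are illustrative assumptions; the closed forms being compared are those of Examples 3.10, 3.12 and 3.13.

```python
import math

def chain_rule(df, g, dg):
    """Return the derivative of x -> f(g(x)), i.e. f'(g(x)) g'(x)."""
    return lambda x: df(g(x)) * dg(x)

# Example 3.10: f(g) = g^3 with g(x) = 2x + 1.
d_example_3_10 = chain_rule(lambda g: 3 * g**2, lambda x: 2 * x + 1, lambda x: 2.0)

# Example 3.12: f(g) = e^g with g(x) = x^3 + 2.
d_example_3_12 = chain_rule(math.exp, lambda x: x**3 + 2, lambda x: 3 * x**2)

# Example 3.13: product of x^3 + 1 and ln(x^2 + 4); the second factor needs the chain rule.
d_log_factor = chain_rule(lambda h: 1 / h, lambda x: x**2 + 4, lambda x: 2 * x)

def d_example_3_13(x):
    return 3 * x**2 * math.log(x**2 + 4) + (x**3 + 1) * d_log_factor(x)

x0 = 0.5  # arbitrary test point
print(d_example_3_10(x0), 6 * (2 * x0 + 1) ** 2)
print(d_example_3_12(x0), 3 * x0**2 * math.exp(x0**3 + 2))
print(d_example_3_13(x0), 3 * x0**2 * math.log(x0**2 + 4) + 2 * x0 * (x0**3 + 1) / (x0**2 + 4))
```

Example 3.13 shows the point made in the text: the product rule is applied on the outside, and the chain rule is used only for the factor that is itself a composition.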
3.2.3 Higher-order derivatives

As we have seen above, when we differentiate a function, $f(x)$, we find that its derivative, $f'(x)$, is also a function of $x$. In this context, we call $f'(x)$ the first-order derivative of $f(x)$ and we can differentiate it again to get the second-order derivative, i.e. we find
$$\frac{d}{dx}\left[\frac{df}{dx}\right] \quad\text{and we denote this by}\quad \frac{d^2f}{dx^2} \quad\text{or}\quad f''(x).$$
Of course, the second-order derivative will also be a function of $x$ and so we can differentiate it again to get the third-order derivative, i.e. we find
$$\frac{d}{dx}\left[\frac{d^2f}{dx^2}\right] \quad\text{and we denote this by}\quad \frac{d^3f}{dx^3} \quad\text{or}\quad f'''(x).$$
We can, of course, do this again and again but the shorthand notation we use can become a bit unwieldy once we pass the third-order derivative. As such, for $n \ge 4$, we often write the $n$th-order derivative, i.e.
$$\frac{d^nf}{dx^n} \quad\text{as}\quad f^{(n)}(x),$$
when we use our shorthand.

Example 3.15 Find the first four derivatives of $f(x) = \sin x$. What is the relationship between these derivatives of $\sin x$?

We have $f(x) = \sin x$, and so the first-order derivative of $f$ is given by
$$f'(x) = \frac{d}{dx}\sin x = \cos x.$$
The second-order derivative of $f$ is then
$$f''(x) = \frac{d}{dx}\left[\frac{df}{dx}\right] = \frac{d}{dx}\cos x = -\sin x.$$
The third-order derivative of $f$ is then
$$f'''(x) = \frac{d}{dx}\left[\frac{d^2f}{dx^2}\right] = \frac{d}{dx}(-\sin x) = -\cos x.$$
And, finally, the fourth-order derivative of $f$ is
$$f^{(4)}(x) = \frac{d}{dx}\left[\frac{d^3f}{dx^3}\right] = \frac{d}{dx}(-\cos x) = \sin x.$$
So, in particular, we see that $f''(x) = -f(x)$, $f'''(x) = -f'(x)$ and $f^{(4)}(x) = f(x)$.

Activity 3.14 Using the pattern inherent in Example 3.15, what is $f^{(n)}(x)$ for $n \ge 1$?

Activity 3.15 Find the first four derivatives of $f(x) = xe^x$. Hence deduce an expression for $f^{(n)}(x)$ for $n \ge 1$.
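Repeated differentiation is easy to mechanise for polynomials, where only the power rule and the linear combination rule are needed. In the sketch below a polynomial is stored as a list of coefficients; this representation, the function names and the sample polynomial $x^4 + 2x^2$ are illustrative assumptions and not part of the guide.

```python
def differentiate(coeffs):
    """Differentiate c0 + c1 x + c2 x^2 + ... given as the list [c0, c1, c2, ...]."""
    # The term c_k x^k becomes k c_k x^(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]

def nth_derivative(coeffs, n):
    """Apply differentiate n times to obtain the nth-order derivative."""
    for _ in range(n):
        coeffs = differentiate(coeffs)
    return coeffs

# f(x) = x^4 + 2x^2, stored lowest power first.
f = [0, 0, 2, 0, 1]
for n in range(1, 6):
    print(n, nth_derivative(f, n))
```

Here the fourth-order derivative is the constant 24, and the fifth-order derivative is the empty list, which this representation uses for the zero polynomial.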
3.3 Using derivatives

Derivatives can be very useful in mathematics and economics, but before we see how, we need to understand what derivatives represent.

3.3.1 The meaning of the derivative
If we draw the graph of a function, $f$, we get the curve with equation $y = f(x)$. At any point on this curve, say the point $(a, f(a))$, we can draw a chord (or secant line) that connects the given point to another point on the curve. For instance, in Figure 3.1, the line segment $C$ is the chord joining the points $(a, f(a))$ and $(b, f(b))$ on the curve $y = f(x)$. In particular, we see that the gradient of this chord, let's call it $m_C$, can be found using the formula
$$m_C = \frac{f(b) - f(a)}{b - a},$$
which you should know.

Figure 3.1: The line segment $C$ is the chord joining the points $(a, f(a))$ and $(b, f(b))$ on the curve $y = f(x)$. This is extended using the 'dotted lines' at both ends so that we can see what line the chord is a line segment of.

To relate this to the derivative, we take some number, $h \ne 0$, and let $b = a + h$ so that we now have a chord, $C$, which is joining the points $(a, f(a))$ and $(a + h, f(a + h))$. The gradient of this chord is then given by
$$m_C(h) = \frac{f(a + h) - f(a)}{(a + h) - a} = \frac{f(a + h) - f(a)}{h},$$
and, for $h \ne 0$, this is a function of $h$ since the value of $m_C$ will depend on the value of $h$ that we choose. In particular, recalling what we saw in Section 3.1, we can see that
$$f'(a) = \lim_{h \to 0} m_C(h),$$
and we want to see what this is telling us about the function, $f$.

We now consider the chords that join the point $(a, f(a))$ to the points $(a + h_1, f(a + h_1))$, $(a + h_2, f(a + h_2))$ and $(a + h_3, f(a + h_3))$ where $h_3 > h_2 > h_1 > 0$. These are the line segments on the lines $C_1$, $C_2$ and $C_3$ which can be seen in Figure 3.2. That is, we have three points that are getting successively closer to the point $(a, f(a))$ so that we can see the effect of letting $h \to 0$. In particular, as we let $h$ get smaller, we see that the gradients of these particular chords are decreasing. The question is, do the gradients of these chords tend to some finite limit as $h \to 0$? That is, does the limit in our expression for $f'(a)$ above exist?

Figure 3.2: $C_1$, $C_2$ and $C_3$ are three chords of the curve $y = f(x)$ originating from the point $(a, f(a))$. Observe that as the other end of a chord approaches this point, the chords 'pivot' about it and their gradients get closer to the gradient of the line, $T$.

Hopefully, in Figure 3.2, you can see that as $h$ gets smaller (i.e. as we consider $C_3$, then $C_2$ and then $C_1$), the lines are 'pivoting' through the point $(a, f(a))$ and their gradients are getting closer to the gradient of the line $T$. Indeed, in the limit as $h \to 0$, the lines we get from extending an arbitrary chord joining the points $(a, f(a))$ and $(a + h, f(a + h))$ should become the line $T$. In particular, this means that the limit of $m_C(h)$ as $h \to 0$ exists because it should be equal to the gradient of $T$. This means that the line $T$, called the tangent to $f$ at the point $(a, f(a))$, goes through the point $(a, f(a))$ and has a gradient equal to the limit, as $h \to 0$, of $m_C(h)$, i.e. $f'(a)$.

For this reason, we define the gradient of a curve $y = f(x)$ at the point $(a, f(a))$ to be the gradient of its tangent line at that point and this, as we have seen, is simply the value of $f'(a)$.
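The 'pivoting' of the chords towards the tangent can be imitated numerically. The sketch below uses $f(x) = e^x$ at $a = 0$, a function chosen here purely for illustration because, like the curve sketched in Figure 3.2, its chord gradients decrease as $h$ shrinks; the tangent there has gradient $f'(0) = e^0 = 1$.

```python
import math

def chord_gradient(f, a, h):
    """Gradient m_C(h) of the chord joining (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

a = 0.0
steps = [1.0, 0.5, 0.1, 0.01]  # plays the role of h3 > h2 > h1 > 0, with one extra step

# The chord gradients shrink towards the tangent gradient f'(0) = 1.
for h in steps:
    print(h, chord_gradient(math.exp, a, h))
```

Each gradient is larger than 1 and smaller than the one before it, which matches the picture of chords pivoting down onto the tangent line $T$.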
3.3.2 Tangent lines and linear approximations

Now that we know how the tangent lines to a curve are related to derivatives, we can use derivatives to find the equation of the tangent line to a curve at a given point. This, in turn, will introduce us to a useful way of performing approximations.
Finding tangent lines

Given that $f'(a)$ is the gradient of the curve $y = f(x)$ at the point $x = a$, we can use this to find the equation of the tangent line at this point. In particular, the formula for the gradient of a straight line, i.e.
$$f'(a) = \frac{y - f(a)}{x - a}, \tag{3.1}$$
gives us the equation of the tangent line as it goes through the point $(a, f(a))$ and its gradient is given by $f'(a)$. Let's look at a quick example.

Example 3.16 Find the equation of the tangent line to the function $f(x) = x^2$ when $x = 3$.

When $x = 3$, the point on the curve $y = x^2$ is $(3, 9)$ and we know that $f'(3) = 6$ as $f'(x) = 2x$. Consequently, using (3.1), the equation of the tangent line is given by
$$6 = \frac{y - 9}{x - 3} \implies y - 9 = 6x - 18 \implies y = 6x - 9.$$
In particular, when written in this form, we see that the gradient of the line is indeed 6 and the point $(3, 9)$ does indeed lie on it as $6(3) - 9 = 9$.

Activity 3.16 Find the equation of the tangent line to the function $f(x) = e^x$ when $x = 1$.

Linear approximations

One use of tangent lines is that they provide us with a simple way of approximating the value of a function. For instance, if we have the tangent line to the function $f(x)$ at the point $x = a$, the equation of its tangent line, i.e.
$$f'(a) = \frac{y - f(a)}{x - a},$$
can be rearranged to give us
$$y = f(a) + (x - a)f'(a).$$
Now, if we pick a value of $x$ that is close to $a$, say $x^*$, the value of $y$ when $x = x^*$, will be
$$y^* = f(a) + (x^* - a)f'(a),$$
and this will be 'close' to the value of $f(x^*)$ as illustrated in Figure 3.3. Of course, if we pick a value of $x^*$ which is closer to $a$, the value of $y^*$ will be closer to the value of $f(x^*)$ and we will have a better approximation to the value of $f(x)$ at this point. As we are approximating the function $f$ by a straight line, we call this a linear approximation to $f$ around $a$. In particular, we have
$$f(x) \simeq f(a) + (x - a)f'(a),$$
if $x$ is close to $a$. In Section 3.4, we will discuss Taylor series and these will allow us to find better approximations to $f$ around $a$, but before we do that, let's consider a useful application of linear approximations.

Figure 3.3: When $x^*$ is close to $a$ we can use the tangent line at $a$ to find $y^*$ which gives us an approximate value for $f(x^*)$. There will, of course, be an 'error' involved in this approximation but this can be made smaller if we use values of $x^*$ which are closer to $a$.

Example 3.17 Without using a calculator, find an approximate value of $3e^{-0.1}$.

Given that $f(x) = 3e^{-x}$, we have
$$f'(x) = -3e^{-x} \implies f'(0) = -3e^0 = -3,$$
and so, using our linear approximation, we get
$$f(0.1) \simeq f(0) + (0.1 - 0)f'(0) = 3e^0 + (0.1)(-3) = 3 - 0.3 = 2.7,$$
i.e. an approximate value of $3e^{-0.1}$ is 2.7. We note in passing that, using a calculator, we can see that the value of $3e^{-0.1}$ is 2.71 to 2dp and so this is a pretty good approximation.

Using linear approximations to find changes

As well as allowing us to find approximations to $f$ around $a$, linear approximations can give us useful information about how the value of $f$ is changing as we move from $a$ to, say, $a + h$. Using our linear approximation, we see that
$$f(a + h) \simeq f(a) + hf'(a) \implies f'(a) \simeq \frac{f(a + h) - f(a)}{h},$$
3.3. Using derivatives
and so, if we denote the change in f by ∆f and the change in x by ∆x = h, we see that ∆f ' f 0 (a) or ∆f ' f 0 (a)∆x. ∆x That is, we can find the approximate value of the change in f if we change x from a to a + h. Of course, the smaller ∆x = h is, the better our approximation. This is illustrated in Figure 3.4. y y = f (x)
T
approx ∆f
y∗
f (a) O
a
exact ∆f
error
f (a + h)
x
a+h ∆x = h
Figure 3.4: Finding an approximate value for ∆f using a linear approximation to f .
Obviously, the smaller the value of the change ∆x = h, the better the approximation for ∆f will be. Example 3.18 Without using a calculator, find the approximate change in 3 e−x if x is increased from zero to 0.1. Hence deduce the approximate value of 3 e−0.1 . Given that f (x) = 3 e−x , we have f 0 (x) = −3 e−x
=⇒
f 0 (0) = −3 e0 = −3,
so, if ∆x = 0.1 as x increases from 0 to 0.1, we find that ∆f ' f 0 (0)∆x = (−3)(0.1) = −0.3, i.e. the change in f is approximately −0.3. Observe that the minus sign is telling us that when x increases from 0 to 0.1, f (x) is decreasing by approximately 0.3. This means that using f (0.1) ' f (0) + ∆f = 3 e0 +(−0.3) = 3 − 0.3 = 2.7, we see that the approximate value of 3 e−0.1 is 2.7 as we would expect from the linear approximation in Example 3.17.
Further, as the derivative of a function gives us information about how $f(x)$ is changing due to changes in $x$, we often refer to $f'(a)$ as the rate of change of $f(x)$ with respect to $x$ when $x = a$.
3.3.3 Applications of derivatives
Derivatives are useful in economics and we now introduce two ways in which they can arise in that subject. The first is their use when discussing ‘marginal’ functions and the second is when they are used in the context of ‘elasticities’. At this point, we will just introduce these ideas and see how they might be useful, but they will also be used when we consider some applications of the material contained in other chapters of this subject guide.

Marginal functions

In economics, the term marginal denotes the rate of change of a quantity with respect to a variable on which it depends. For instance, if a firm has a cost function, $C(q)$, this tells us the cost of producing $q$ units of their product. The marginal cost of the firm, which we denote by $\mathrm{MC}(q)$, would then be given by
$$\mathrm{MC}(q) = \frac{dC}{dq}.$$
This is useful since, using what we saw above, we can see that the marginal cost is telling us (approximately) how changes in the level of production, $q$, will incur changes in the costs, $C$. That is, if the level of production is increased by $\Delta q$, i.e. our production increases from $q$ to $q + \Delta q$, we find that
$$\mathrm{MC}(q) = \frac{dC}{dq} \implies \mathrm{MC}(q) \simeq \frac{\Delta C}{\Delta q} \implies \Delta C \simeq \mathrm{MC}(q)\Delta q,$$
where $\Delta C = C(q + \Delta q) - C(q)$ is, of course, the resulting increase in costs. In particular, if $q$ is so large that a change in production of one unit (i.e. $\Delta q = 1$) is small compared to $q$, we see that
$$\Delta C = C(q + 1) - C(q) \simeq \mathrm{MC}(q).$$
That is, in these circumstances, the marginal cost tells us (approximately) the extra cost incurred if the firm wishes to produce one more unit of their good given that they are already producing $q$ units.

Example 3.19 A firm has a cost function given by
$$C(q) = 1000 + 5q + q^2,$$
in dollars. Find the marginal cost function for this firm and use it to determine the approximate cost of producing one more unit if the original level of production is 100 units.

The marginal cost function, $\mathrm{MC}(q)$, is given by
$$\mathrm{MC}(q) = C'(q) = 5 + 2q,$$
and so, using the fact that the change in cost, $\Delta C$, is related to the change in production, $\Delta q$, by $\Delta C \simeq C'(q)\Delta q$, we see that an increase in production of one unit, i.e. $\Delta q = 1$, gives rise to an increase in costs given by
$$\Delta C \simeq C'(100)(1) = (5 + 2(100))(1) = 205.$$
That is, if the firm is producing 100 units and they increase their production by one unit, they will incur additional costs of approximately 205 dollars.

Activity 3.17 By using $C(q + 1) - C(q)$ directly when $q = 100$, determine how good the approximation found in Example 3.19 is.

Generally then, if $f$ is some economically meaningful function, its derivative is referred to as the marginal of $f$ and we denote this by $\mathrm{M}f$. For instance, if $R(q)$ is the revenue function for a firm, the marginal revenue, $\mathrm{MR}(q)$, is just $R'(q)$.

Elasticities

Suppose that, as in Section 2.1.5, we have a market where consumers purchase a good according to the demand function, $q^D(p)$. If the price of this good were to increase from $p$ to $p + \Delta p$, then there would be a change in the quantity demanded by the consumers from $q$ to $q + \Delta q$. Indeed, since a rise in price will usually lead to a fall in demand, we would expect $\Delta q$ to be negative here. In these circumstances, we can see how these changes are related by noting that
$$\Delta q = q^D(p + \Delta p) - q^D(p) \simeq q'(p)\Delta p \implies \frac{\Delta q}{\Delta p} \simeq q'(p),$$
where we have used $q$ to denote the quantity demanded, i.e. $q(p) = q^D(p)$. Now, suppose that we are interested in the relative change in quantity, $\Delta q/q$, and the relative change in price, $\Delta p/p$. We can see that the ratio of these two terms is then given by
$$\frac{\Delta q/q}{\Delta p/p} = \frac{p}{q}\,\frac{\Delta q}{\Delta p} \simeq \frac{p}{q}\,q'(p).$$
Indeed, as $\Delta q$ is usually negative (whereas the other terms on the left-hand-side, i.e. $p$, $q$ and $\Delta p$, are all positive) we would usually expect the right-hand-side to be negative as well. With this in mind, we define the [price] elasticity of demand, $\varepsilon(p)$, to be
$$\varepsilon(p) = -\frac{p}{q}\,q'(p),$$
where $q = q^D(p)$ and the minus sign is introduced so that, in the usual case where $\Delta q$ is negative, we can be sure that $\varepsilon(p)$ itself will be positive.⁷ Then, using
$$\frac{\Delta q}{q} \simeq -\varepsilon(p)\,\frac{\Delta p}{p},$$
we can see how the relative change in quantity is simply related to the relative change in price via the elasticity of demand.

⁷ Some books omit the minus sign in their definition of the elasticity of demand, but it will be useful for us to include it as it is easier to deal with positive quantities.

Example 3.20 Suppose that the demand function for some good is given by $q^D(p) = 10p^{-r}$ where $r$ is a constant. Find the elasticity of demand. What does this tell us about the effect of relative changes in price on relative changes in quantity?

Here we have $q = q^D(p) = 10p^{-r}$ where $r$ is a constant and so, $q'(p) = -10rp^{-r-1}$, which means that the elasticity of demand is given by
$$\varepsilon(p) = -\frac{p}{q}\,q'(p) = -\frac{p}{10p^{-r}}\left(-10rp^{-r-1}\right) = r,$$
i.e. $\varepsilon(p)$ is a constant too. This means that, using
$$\frac{\Delta q}{q} \simeq -\varepsilon(p)\,\frac{\Delta p}{p},$$
we see that a relative increase in price of, say, $x\%$ will lead to a relative decrease in quantity purchased of (approximately) $rx\%$. Indeed, we will see, in Section 4.2.3, that elasticities can also give us useful information about how the revenue, $R = pq$, generated from selling a quantity, $q$, at a price of $p$ per unit will be affected by increases in the price.
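Both ideas in this subsection can be explored numerically. The sketch below is illustrative only: it uses the cost function of Example 3.19 and the demand function of Example 3.20, the value $r = 2$ and the step size `dp` are arbitrary assumptions, and the derivative $q'(p)$ is estimated with a central difference rather than found by differentiation.

```python
def cost(q):
    """Cost function from Example 3.19, in dollars."""
    return 1000 + 5 * q + q ** 2

def marginal_cost(q):
    """MC(q) = C'(q) = 5 + 2q, as found in Example 3.19."""
    return 5 + 2 * q

def demand(p, r=2.0):
    """Demand function from Example 3.20; r = 2 is an arbitrary illustrative choice."""
    return 10 * p ** (-r)

def elasticity(p, r=2.0, dp=1e-6):
    """-(p/q) q'(p), with q'(p) estimated by a central difference of width 2*dp."""
    dq_dp = (demand(p + dp, r) - demand(p - dp, r)) / (2 * dp)
    return -(p / demand(p, r)) * dq_dp

# Marginal cost as an estimate of the cost of one extra unit at q = 100.
exact_extra_cost = cost(101) - cost(100)
print("MC(100) =", marginal_cost(100), " C(101) - C(100) =", exact_extra_cost)

# The elasticity should come out (approximately) equal to r at any price.
for p in (1.0, 3.0, 7.5):
    print(f"p = {p}: elasticity is approximately {elasticity(p):.6f}")
```

The first line of output compares the marginal-cost estimate with the directly computed extra cost, in the spirit of Activity 3.17, and the remaining lines suggest that the elasticity does not depend on the price for this demand function.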
3.3.4 Existence of derivatives
Although we will usually be dealing with situations where a function has a derivative at every point where it is defined, we will occasionally encounter situations where there is at least one point at which the derivative of a function does not exist. Just so that we are aware of what this means and the kinds of situation in which it can arise, we consider some of the most common ways in which a derivative can fail to exist at a certain point.⁸

⁸ See, for example, Section 2.8 of Binmore and Davies (2002) for a discussion of some similar cases.

Discontinuous functions

If a function is discontinuous at a point, i.e. there is a point at which the function is not continuous, then the derivative will not exist at that point as the next example illustrates.

Example 3.21 Show that the derivative of the function defined by
$$f(x) = \begin{cases} 1 & x \geq 0 \\ -1 & x < 0 \end{cases}$$
does not exist when $x = 0$.

This function is illustrated in Figure 3.5(a) and, clearly, as its graph is a horizontal line on either side of $x = 0$, its derivative is defined and equal to zero as long as $x \neq 0$. However, when $x = 0$, the function is discontinuous and its derivative, if it exists, would be given by
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h}.$$
However, here we cannot just find
$$\frac{f(h) - f(0)}{h},$$
and let $h \to 0$ as we did in Section 3.1 since the value of $f(h)$ is different depending on whether $h$ is positive or negative. In such cases, we say that the limit we seek, i.e.
$$\lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
exists if, firstly, both of the limits
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} \quad\text{and}\quad \lim_{h \to 0^+} \frac{f(h) - f(0)}{h},$$
exist⁹ and, secondly, if they exist, they must be equal. But, using the given function and noting that $f(0) = 1$, we see that
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^-} \frac{(-1) - 1}{h} = \lim_{h \to 0^-} \frac{-2}{h} = \infty,$$
and
$$\lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{(1) - 1}{h} = \lim_{h \to 0^+} 0 = 0,$$
i.e. the first of these limits does not exist as ‘$\infty$’ is not a value¹⁰ but more of a notational convenience which tells us that a function is getting arbitrarily large in the limit. Consequently, we see that
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
fails to exist too and so the derivative of this function does not exist at $x = 0$.

Of course, the graph of a function can also have a discontinuity due to the presence of a vertical asymptote. In such cases, the function is not actually defined at the value of $x$ where the asymptote occurs and so, because of this, the derivative cannot exist at this point either.¹¹ In both of these cases, as we can’t ascribe a gradient to the function at these points, the function can’t have a tangent line at these points.

⁹ Notice that the former limit allows us to deal with negative $h$ and the latter allows us to deal with positive $h$. Also recall that the notation $h \to 0^-$ and $h \to 0^+$ was explained in Example 2.2.
¹⁰ That is, it is not a real number.
¹¹ We’ll come across this again in Section 4.4.3.
Figure 3.5: The graphs of three functions that have no derivative at $x = 0$ as explained in (a) Example 3.21, (b) Example 3.22 and (c) Example 3.23. The panels show (a) the function of Example 3.21, which is 1 for $x \geq 0$ and $-1$ for $x < 0$, (b) $y = |x|$ and (c) $y = x^{1/3}$. We note however that, unlike the functions in (a) and (b), the function in (c) does have a tangent line at $x = 0$ given by the vertical line with equation $x = 0$.

Continuous functions with ‘corners’

But, even if a function is continuous at every point, the derivative will not exist at points where the curve changes too sharply, i.e. when the curve has a ‘corner’, as the next example illustrates.

Example 3.22 Show that the derivative of the function $f(x) = |x|$ does not exist when $x = 0$.
This function is illustrated in Figure 3.5(b) and, clearly, as the function is the continuous straight line $f(x) = -x$ when $x < 0$ and $f(x) = x$ when $x > 0$, its derivative is defined and equal to $-1$ when $x < 0$ and 1 when $x > 0$. However, when $x = 0$, the function has a ‘corner’ and its derivative, if it exists, would be given by
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h}.$$
However here, as in Example 3.21, we cannot just find
$$\frac{f(h) - f(0)}{h},$$
and let $h \to 0$ as we did in Section 3.1 since the value of $f(h)$ is different depending on whether $h$ is positive or negative. In such cases, we again say that the limit we seek, i.e.
$$\lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
exists if, firstly, both of the limits
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} \quad\text{and}\quad \lim_{h \to 0^+} \frac{f(h) - f(0)}{h},$$
exist¹² and, secondly, if they exist, they must be equal. But, using the given function, we see that
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^-} \frac{(-h) - 0}{h} = \lim_{h \to 0^-} (-1) = -1,$$
and
$$\lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{h - 0}{h} = \lim_{h \to 0^+} 1 = 1,$$
i.e. both of these limits exist, but they are clearly not equal. Consequently, we see that
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
fails to exist and so the derivative of this function does not exist at $x = 0$.
Observe that, in this case, the limits as $h \to 0^+$ and as $h \to 0^-$ both exist, but the problem occurs because they are not equal and so we cannot ascribe a value to the derivative (i.e. the limit as $h \to 0$) in such situations. In particular, as this means that we can’t ascribe a gradient to $f$ at this point, the function can’t have a tangent line here either.

Continuous functions with ‘vertical tangent lines’

Also, even if a function is continuous at every point, the derivative will not exist at points where the gradient of the curve becomes ‘infinite’, i.e. when the curve has a ‘vertical tangent line’, as the next example illustrates.

Example 3.23 Show that the derivative of the function $f(x) = x^{1/3}$ does not exist when $x = 0$.

This function is illustrated in Figure 3.5(c) and, clearly, we can see that its derivative is given by
$$f'(x) = \tfrac{1}{3}x^{-2/3} = \frac{1}{3x^{2/3}},$$
which exists as long as $x \neq 0$. Of course, when $x = 0$, the derivative cannot exist since, if we were to use this formula, we would have to ‘divide by zero’ and this is never allowed. Indeed, the difference quotient at $x = 0$ is $(h^{1/3} - 0)/h = h^{-2/3}$, which grows without bound as $h \to 0$. However, we can see from Figure 3.5(c) that the graph of the function has a vertical tangent line at $x = 0$ which is given by the vertical line with equation $x = 0$.¹³ Thus, we have a situation where the derivative of the function does not exist at $x = 0$, but it does have a tangent line at that point.

Observe that, in cases where the tangent line to $f$ at a point is a vertical line, we cannot use (3.1) to find its equation as its derivative is not defined.¹⁴
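The one-sided limits used in Examples 3.21 to 3.23 can be explored numerically by evaluating the difference quotient at $x = 0$ for small negative and positive values of $h$. The sketch below is an illustration rather than a proof: the helper names and the chosen values of $h$ are assumptions made for this example.

```python
import math

def step(x):
    """Example 3.21: the function equal to 1 for x >= 0 and -1 for x < 0."""
    return 1.0 if x >= 0 else -1.0

def cube_root(x):
    """Example 3.23: the real cube root, defined for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def quotient(f, h):
    """Difference quotient (f(h) - f(0)) / h at the point x = 0."""
    return (f(h) - f(0.0)) / h

for name, f in [("step", step), ("|x|", abs), ("x^(1/3)", cube_root)]:
    for h in (1e-2, 1e-4, 1e-6):
        left, right = quotient(f, -h), quotient(f, h)
        print(f"{name:>8}  h = {h:.0e}  from the left: {left:12.4g}  from the right: {right:12.4g}")
```

Printing the quotients for shrinking $h$ shows which one-sided limits settle down to a value and which grow without bound. For $|x|$ the two columns stay at $-1$ and $1$, while for $x^{1/3}$ both columns keep growing.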
¹² Again, as in Example 3.21, the former limit allows us to deal with negative $h$ and the latter allows us to deal with positive $h$.
¹³ Notice that the tangent lines of the function are getting steeper as we move towards $x = 0$ on the left and shallower as we move away from $x = 0$ on the right.
¹⁴ We’ll come across this again in Section 4.4.3.
3.4 Using higher-order derivatives
We have seen that the first derivative of a function, $f$, can allow us to find a linear approximation to $f$ around $a$ by using the formula
$$f(x) \simeq f(a) + (x - a)f'(a).$$
However, if we use higher-order derivatives, we can get better approximations to $f$ around $a$ by using the formula
$$f(x) = f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a) + \cdots + \frac{(x - a)^n}{n!}f^{(n)}(a) + \cdots, \qquad (3.2)$$
which is called the Taylor series for $f(x)$ about $x = a$.¹⁵ You will notice that the right-hand-side of this formula is an infinite series and, for reasons beyond the scope of this course, there will generally be conditions that depend on $f$ and $a$ that determine whether this infinite series does indeed give us the value of $f(x)$ that we expect to get on the left-hand-side. For now, we just note that these conditions can be used to find a set of values of $x$, that includes the point $x = a$, for which the formula works. Of course, if the value of $x$ in question does not lie in this set, the formula does not work!

In this course, we will often just use the first few terms from the Taylor series to get an approximate value of $f(x)$.¹⁶ And, as long as we are considering what this formula tells us about $f(x)$ when $x$ is close to $a$, these approximations will generally be more than adequate. For instance, if we take $n = 1$ in this formula, i.e. if we take the first two terms of the Taylor series, we recover our linear approximation to $f$ around $a$ and, if we take $n = 2$, we get
$$f(x) \simeq f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a),$$
which is now a quadratic approximation to $f$ around $a$. Indeed, we have seen how useful the linear approximation is in Section 3.3.2 and the quadratic approximation will be useful in the next chapter.
3.4.1 Maclaurin series
Let’s start with the relatively simple case of a Maclaurin series, which is what we call a Taylor series about $x = 0$. That is, the Maclaurin series of the function $f(x)$ is found by setting $a = 0$ in (3.2) to get
$$f(x) = f(0) + xf'(0) + \frac{x^2}{2!}f''(0) + \cdots + \frac{x^n}{n!}f^{(n)}(0) + \cdots. \qquad (3.3)$$
To see how this works, let’s start by finding a simple Maclaurin series.

¹⁵ See, for instance, Section 2.13 of Binmore and Davies (2002) for an explanation of where this formula comes from.
¹⁶ It will be an approximation since, if we only keep the first few terms from the beginning of the series, we lose all the information about the value of $f(x)$ that is contained in the terms we are neglecting.
Example 3.24 Find the Maclaurin series for $e^x$.

Here we have $f(x) = e^x$ so that $f(0) = 1$. We also note that the first three derivatives of this function are
$$f'(x) = e^x, \quad f''(x) = e^x \quad\text{and}\quad f'''(x) = e^x.$$
Indeed, it should be clear that $f^{(n)}(x) = e^x$ for all $n \geq 1$. Then, to use these in (3.3), we need to evaluate these derivatives at $x = 0$, i.e. we find that
$$f'(0) = e^0 = 1, \quad f''(0) = e^0 = 1 \quad\text{and}\quad f'''(0) = e^0 = 1.$$
Indeed, it should be clear that $f^{(n)}(0) = e^0 = 1$ for all $n \geq 1$. Consequently, putting this into (3.3), we get
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots,$$
and this formula works for all $x \in \mathbb{R}$. In particular, observe that $e^x$ is only equal to the series on the right-hand-side if we keep all of the terms in this infinite series.

Of course, it is not always so easy to find a Maclaurin series and so let’s look at another example.

Example 3.25 Find the Maclaurin series for $(1 + x)^r$ when $r \in \mathbb{Q}$.

Here we have $f(x) = (1 + x)^r$ so that $f(0) = 1$. We also note that the first three derivatives of this function are
$$f'(x) = r(1 + x)^{r-1}, \quad f''(x) = r(r - 1)(1 + x)^{r-2} \quad\text{and}\quad f'''(x) = r(r - 1)(r - 2)(1 + x)^{r-3}.$$
Indeed, it should be clear that
$$f^{(n)}(x) = r(r - 1) \cdots (r - [n - 1])(1 + x)^{r-n},$$
for all $n \geq 1$. Then, to use these in (3.3), we need to evaluate these derivatives at $x = 0$, i.e. we find that
$$f'(0) = r, \quad f''(0) = r(r - 1) \quad\text{and}\quad f'''(0) = r(r - 1)(r - 2).$$
Indeed, spotting the pattern, it should be clear that
$$f^{(n)}(0) = r(r - 1) \cdots (r - [n - 1]),$$
for all $n \geq 1$. Consequently, putting this into (3.3), we get
$$(1 + x)^r = 1 + rx + \frac{r(r - 1)}{2!}x^2 + \frac{r(r - 1)(r - 2)}{3!}x^3 + \cdots + \frac{r(r - 1) \cdots (r - [n - 1])}{n!}x^n + \cdots,$$
and this formula works when $|x| < 1$.
In particular, notice that if $r \in \mathbb{Q}$ but $r \notin \mathbb{N}$, this is always an infinite series as, for any $n \in \mathbb{N}$, we will find that $r - [n - 1] \neq 0$. However, if $r \in \mathbb{N}$, we will find a value of $n$, namely $n = r + 1$, that makes $r - [n - 1] = 0$ and this will mean that all of the terms with $n \geq r + 1$ will be zero, i.e. the Maclaurin series will be finite and will terminate at the term where $n = r$. This is a very special Maclaurin series that you may have encountered before as the binomial theorem and we look at some examples of this special case in Activity 3.18.

Activity 3.18 Use the Maclaurin series for $(1 + x)^r$ which we found in Example 3.25 to find $(1 + x)^2$ and $(1 + x)^3$.

As well as the two Maclaurin series derived in Examples 3.24 and 3.25, you should also remember the following:
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^n\frac{x^{2n+1}}{(2n + 1)!} + \cdots \quad\text{for } x \in \mathbb{R}.$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots \quad\text{for } x \in \mathbb{R}.$$
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n+1}\frac{x^n}{n} + \cdots \quad\text{for } |x| < 1.$$
In particular, observe how these series differ in their first term, the presence of terms of odd and even degree and the absence of factorials in the series for $\ln(1 + x)$.
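As a rough check on the standard series above, one can add up the first few terms and compare the result with a library value. The following Python sketch does this at $x = 0.5$ with ten terms; the function names, the choice of $x$ and the number of terms are all illustrative assumptions.

```python
import math

def exp_term(n, x):
    """n-th term (n = 0, 1, ...) of the series for e^x."""
    return x ** n / math.factorial(n)

def sin_term(n, x):
    """n-th term of the series for sin x, with alternating signs."""
    return (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)

def cos_term(n, x):
    """n-th term of the series for cos x, with alternating signs."""
    return (-1) ** n * x ** (2 * n) / math.factorial(2 * n)

def log_term(n, x):
    """n-th term of the series for ln(1 + x); the series starts at x, so the power is n + 1."""
    return (-1) ** n * x ** (n + 1) / (n + 1)

def partial_sum(term, x, count):
    """Add the first `count` terms of the series whose terms are term(n, x)."""
    return sum(term(n, x) for n in range(count))

x = 0.5
checks = [
    ("e^x", exp_term, math.exp(x)),
    ("sin x", sin_term, math.sin(x)),
    ("cos x", cos_term, math.cos(x)),
    ("ln(1+x)", log_term, math.log(1 + x)),
]
for name, term, library_value in checks:
    approx = partial_sum(term, x, 10)
    print(f"{name:>8}: series = {approx:.8f}   library = {library_value:.8f}")
```

The series with factorials in their denominators agree with the library values to many decimal places after ten terms, whereas the series for $\ln(1 + x)$, which has no factorials, converges noticeably more slowly.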
Using Maclaurin series as approximations

As we have seen, a Maclaurin series is an infinite series in powers of $x$ and, by taking a certain number of terms, we can use it to approximate a function. In particular, we say that we have the $n$th-order Maclaurin series of a function if we keep all of the terms up to and including the one in $x^n$ and discard the rest.

Example 3.26 Write down the second and fourth-order Maclaurin series for $\cos x$.

As we saw above, the Maclaurin series for $\cos x$ is given by the infinite series
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots.$$
As such, the second-order Maclaurin series for $\cos x$ is
$$1 - \frac{x^2}{2!},$$
which, since there is no $x^3$ term in the Maclaurin series for $\cos x$, is also the third-order Maclaurin series for $\cos x$. Similarly, the fourth-order Maclaurin series for $\cos x$ is
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!},$$
which, since there is no $x^5$ term in the Maclaurin series for $\cos x$, is also the fifth-order Maclaurin series for $\cos x$.
These $n$th-order Maclaurin series can be used to approximate a function, $f(x)$, for values of $x$ close to $x = 0$. In general, there are two factors that determine how accurate this approximation will be, namely the value of $x$ we are considering (the closer this value of $x$ is to $x = 0$, the better the approximation will be) and the order of the Maclaurin series we use (the more terms we keep, the better the approximation will be). The precise way of determining the accuracy of such approximations in terms of these two factors will be dealt with in 176 Further Calculus where you will encounter Taylor’s theorem. But, we can see how it works and begin to see how these factors affect the accuracy of our approximations by considering some examples.

Example 3.27 Use the fourth-order Maclaurin series for $\cos x$ to find an approximate value for $\cos 1$ and $\cos 2$.

The fourth-order Maclaurin series for $\cos x$ is
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!}.$$
This means that taking $x = 1$, we see that
$$\cos 1 \simeq 1 - \frac{1^2}{2} + \frac{1^4}{24} = \frac{13}{24},$$
which is 0.5417 to 4dp. Using a calculator we see that the true value of $\cos 1$ is 0.5403 to 4dp and so this is a good approximation as, to 2dp, it gives us 0.54 either way. Similarly, taking $x = 2$, we see that
$$\cos 2 \simeq 1 - \frac{2^2}{2} + \frac{2^4}{24} = 1 - 2 + \frac{2}{3} = -\frac{1}{3},$$
which is $-0.3333$ to 4dp. Using a calculator we see that the true value of $\cos 2$ is $-0.4161$ to 4dp and so this is a poor approximation as it isn’t even accurate to 1dp.

But, of course, we should expect our approximations to be poor if we move too far away from $x = 0$ as, by definition, the Maclaurin series represents how the function is behaving around $x = 0$. To see this, consider the curves in Figure 3.6 which illustrate how the fourth-order Maclaurin series for $\cos x$ becomes less accurate at approximating the function as we move away from $x = 0$.

The other way in which the accuracy of our approximation to a function can be affected is the number of terms we take in the Maclaurin series. For instance, the second-order Maclaurin series for $\cos x$ contains less information about the function than the fourth-order one and so we would expect this to give us a worse approximation. This can be seen in Figure 3.7, which illustrates how the second-order Maclaurin series is even less accurate than the fourth-order one as we move away from $x = 0$.
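The effect of the two factors discussed above, the distance of $x$ from 0 and the order of the series, can be seen in a short numerical experiment. This is a minimal sketch; the function name `cos_maclaurin` and the particular orders tried are assumptions made for illustration.

```python
import math

def cos_maclaurin(x, order):
    """Maclaurin polynomial of cos x, keeping terms up to and including x**order."""
    total = 0.0
    for k in range(order // 2 + 1):
        total += (-1) ** k * x ** (2 * k) / math.factorial(2 * k)
    return total

for x in (1.0, 2.0):
    for order in (2, 4, 8):
        approx = cos_maclaurin(x, order)
        error = abs(approx - math.cos(x))
        print(f"x = {x}  order = {order}  approximation = {approx:.4f}  error = {error:.4f}")
```

For a fixed order the error is larger at $x = 2$ than at $x = 1$, and for a fixed $x$ the error falls as the order increases.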
Figure 3.6: The solid curve is the graph of the function $\cos x$ and the dashed curve is the graph of the fourth-order Maclaurin series for this function. Observe how the Maclaurin series moves away from the function as we take values of $x$ further away from $x = 0$.

Using Maclaurin series to approximate other functions

We now look at some ways of finding Maclaurin series for more complicated functions and see how we can use these to find approximations.

Example 3.28 Find the fourth-order Maclaurin series for $xe^x$.
There are two ways to do this. We could use (3.3) to see that, as $f(x) = xe^x$, we have $f(0) = 0$ and then, using what we found in Activity 3.15 above, i.e.
$$f'(x) = (1 + x)e^x, \quad f''(x) = (2 + x)e^x, \quad f'''(x) = (3 + x)e^x \quad\text{and}\quad f^{(4)}(x) = (4 + x)e^x,$$
we see that
$$f'(0) = 1, \quad f''(0) = 2, \quad f'''(0) = 3 \quad\text{and}\quad f^{(4)}(0) = 4.$$
So, putting this into (3.3) we see that, keeping terms up to $x^4$,
$$0 + x(1) + \frac{x^2}{2!}(2) + \frac{x^3}{3!}(3) + \frac{x^4}{4!}(4) = x + x^2 + \frac{x^3}{2} + \frac{x^4}{6},$$
is the fourth-order Maclaurin series for $xe^x$. Alternatively, since we know that the Maclaurin series for $e^x$ is given by
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots,$$
we can see that
$$xe^x = x\left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots\right) = x + x^2 + \frac{x^3}{2} + \frac{x^4}{6} + \cdots,$$
as before, if we keep terms up to $x^4$ for the fourth-order Maclaurin series for $xe^x$.

Figure 3.7: The solid curve is the graph of the function $\cos x$, the dotted curve is the graph of its second-order Maclaurin series and the dashed curve is the graph of its fourth-order Maclaurin series. Observe how the former less accurately tracks the function than the latter as we take values of $x$ further away from $x = 0$.

This example illustrates a general point: when asked to find a Maclaurin series of a certain order, we can always use the definition and differentiation. But, if the derivatives start to become difficult to calculate, it is always easier to use the Maclaurin series for the elementary functions (which we saw above) and a little algebra to find what we are looking for. Let’s consider another example to see how we can do this in a slightly harder situation.

Example 3.29 Find the fourth-order Maclaurin series for $\cos(\ln(1 + x))$.
Here we have $f(x) = \cos(\ln(1 + x))$, which is a composition where $f(x) = \cos y$ with $y = \ln(1 + x)$. So we need to look at the Maclaurin series for $\cos y$, which is given by
$$\cos y = 1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots,$$
and $y$, in turn, will be given by the Maclaurin series for $\ln(1 + x)$, i.e.
$$y = \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$
So, substituting our series for $y$ into our series for $\cos y$, we can see that
$$f(x) = 1 - \frac{1}{2!}\underbrace{\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)^2}_{A} + \frac{1}{4!}\underbrace{\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)^4}_{B} + \cdots,$$
and we start by looking at how the terms $A$ and $B$ contribute to the series if we are only interested in terms up to $x^4$. For $A$, we have
$$A = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)^2 = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right),$$
so we can multiply each term in the second bracket by the appropriate terms in the first bracket (taking care to include cross-terms) to get
$$A = (x)(x) - 2\left(\frac{x^2}{2}\right)(x) + 2\left(\frac{x^3}{3}\right)(x) + \left(\frac{x^2}{2}\right)\left(\frac{x^2}{2}\right) + \cdots = x^2 - x^3 + \frac{11}{12}x^4 + \cdots,$$
where ‘$\cdots$’ indicates terms we can ignore because their degree is greater than four. Similarly, for $B$, we have
$$B = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)^4,$$
which is the bracketed expression
$$x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots,$$
multiplied by itself four times. The terms which arise from this product are obtained by multiplying together four objects, one from each occurrence of the bracketed expression. Since the term with lowest power of $x$ in each bracket is $x$, it is only by taking the $x$ from each bracket that we obtain a term which is at most $x^4$ and so we get
$$B = x^4 + \cdots,$$
where ‘$\cdots$’ indicates terms we can ignore because their degree is greater than four. Of course, using similar reasoning, we can see that there will be no further terms for our series as the next term in the $\cos y$ series (i.e. the first one we omitted above) is $-y^6/6!$ and the smallest term this can yield looks like $x^6$ whose degree is greater than four. Therefore, putting this all together, we have
$$\cos(\ln(1 + x)) = 1 - \frac{A}{2!} + \frac{B}{4!} + \cdots = 1 - \frac{1}{2}\left(x^2 - x^3 + \frac{11}{12}x^4 + \cdots\right) + \frac{1}{24}\left(x^4 + \cdots\right) + \cdots = 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12}x^4 + \cdots,$$
and this gives us the fourth-order Maclaurin series for $\cos(\ln(1 + x))$ as we have kept all of the terms up to $x^4$.
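The algebra in Example 3.29 amounts to multiplying power series and discarding every term above $x^4$, which is easy to mechanise. The sketch below does this with exact fractions; representing a series as a list of coefficients, and the names `ORDER` and `multiply`, are assumptions made for this illustration rather than anything used in the guide.

```python
from fractions import Fraction

ORDER = 4  # keep terms up to and including x**4

def multiply(p, q):
    """Multiply two coefficient lists, discarding terms above x**ORDER."""
    result = [Fraction(0)] * (ORDER + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= ORDER:
                result[i + j] += a * b
    return result

# y = ln(1 + x) = x - x^2/2 + x^3/3 - x^4/4 + ...  (index n holds the x**n coefficient)
y = [Fraction(0)] + [Fraction((-1) ** (n + 1), n) for n in range(1, ORDER + 1)]

y2 = multiply(y, y)    # the term called A in Example 3.29
y4 = multiply(y2, y2)  # the term called B in Example 3.29

# cos y = 1 - y^2/2! + y^4/4! - ..., truncated after the y^4 term
series = [Fraction(int(n == 0)) - a / 2 + b / 24
          for n, (a, b) in enumerate(zip(y2, y4))]

print("A:", [str(c) for c in y2])
print("cos(ln(1 + x)):", [str(c) for c in series])
```

Reading the printed lists from the constant term upwards gives the coefficients of $1, x, x^2, x^3, x^4$, so they can be compared directly with the expressions for $A$ and for $\cos(\ln(1 + x))$ found above.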
Activity 3.19 Find the fourth-order Maclaurin series for $\cos(\ln(1 + x))$ by using the definition and differentiation to verify the answer we found in Example 3.29. (Notice that it is harder to work it out using this method!)

Once we have the Maclaurin series of a function, $f(x)$, we can use it to estimate the value of the function at some value of $x$ close to zero as we did above.

Example 3.30 Use the Maclaurin series we found in Example 3.29 to find an approximate value for $\cos(\ln 1.1)$ and $\cos(\ln 1.9)$.

To find an approximate value for $\cos(\ln 1.1)$, we use the Maclaurin series above to get the approximation
$$\cos(\ln(1 + x)) \simeq 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12}x^4,$$
and then set $x = 0.1$ to get
$$\cos(\ln 1.1) = \cos(\ln(1 + 0.1)) \simeq 1 - \frac{0.1^2}{2} + \frac{0.1^3}{2} - \frac{5}{12}(0.1)^4 = 1 - 0.005 + 0.0005 - 0.000042,$$
which is 0.995458 to 6dp. In passing we note that, using a calculator, the true value is 0.995461 to 6dp and so this is a good approximation as, to 5dp, it gives us 0.99546 either way.

To find an approximate value for $\cos(\ln 1.9)$, we use the approximation above with $x = 0.9$ to get
$$\cos(\ln 1.9) = \cos(\ln(1 + 0.9)) \simeq 1 - \frac{0.9^2}{2} + \frac{0.9^3}{2} - \frac{5}{12}(0.9)^4 = 1 - 0.405 + 0.3645 - 0.273375,$$
which is 0.686125 to 6dp. In passing we note that, using a calculator, the true value is 0.800987 to 6dp and so this is a poor approximation as it isn’t even accurate to 1dp.

Observe that this approximation has deteriorated much more quickly than the one we used when considering approximate values of $\cos x$ in Example 3.27. We won’t pursue the nature of this sensitivity here, but we do reiterate that we should expect our approximations to be poor if we move too far away from $x = 0$ for, as we have seen, the Maclaurin series is there to represent how the function is behaving around $x = 0$.
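The deterioration described in Example 3.30 can be tabulated by evaluating the fourth-order polynomial at several values of $x$ and comparing with a library value. In this sketch the name `approx` and the intermediate point $x = 0.5$ are illustrative additions.

```python
import math

def approx(x):
    """Fourth-order Maclaurin polynomial for cos(ln(1 + x)) from Example 3.29."""
    return 1 - x ** 2 / 2 + x ** 3 / 2 - 5 * x ** 4 / 12

for x in (0.1, 0.5, 0.9):
    exact = math.cos(math.log(1 + x))
    print(f"x = {x}: polynomial = {approx(x):.6f}  library = {exact:.6f}  "
          f"error = {abs(approx(x) - exact):.6f}")
```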
3.4.2 Taylor series
We now briefly consider what happens when we are looking for the Taylor series for $f(x)$ around $x = a$ when $a \neq 0$. In this case, we follow the general method outlined above, but now we have to use (3.2), i.e.
$$f(x) = f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a) + \cdots + \frac{(x - a)^n}{n!}f^{(n)}(a) + \cdots,$$
that we saw earlier.
Example 3.31 Find the Taylor series for $e^x$ around $x = 1$.

Here we have $f(x) = e^x$ so that $f(1) = e$. We also note, as in Example 3.24, that $f^{(n)}(x) = e^x$ for $n \geq 1$. Then, to use these derivatives in (3.2), we need to evaluate them at $x = 1$, i.e. we find that $f^{(n)}(1) = e$ for $n \geq 1$. Consequently, putting this into (3.2), we get
$$e^x = e + (x - 1)e + \frac{(x - 1)^2}{2!}e + \frac{(x - 1)^3}{3!}e + \cdots + \frac{(x - 1)^n}{n!}e + \cdots,$$
as the Taylor series for $e^x$ around $x = 1$.

Activity 3.20 We can write $e^x$ as $e^1 e^{x-1}$ so that values of $x$ around $x = 1$ correspond to values of $x - 1$ around $x = 0$. Use this fact and the Maclaurin series for $e^x$ which we found in Example 3.24 to derive the result we found in Example 3.31.

Activity 3.21 Find the Taylor series for $e^x$ around $x = 2$.
Using Taylor series as approximations

We can use the Taylor series of a function around $x = a$ to get approximations to the value of the function for values of $x$ close to $x = a$ in the same way as we used the Maclaurin series of a function to get approximations to the value of the function for values of $x$ close to $x = 0$ in Section 3.4.1. As the ideas are so similar, we will just take a brief look at how they work.

Example 3.32 Find an approximation to $e^{1.1}$ using (a) the second-order Maclaurin series for $e^x$ and (b) the second-order Taylor series for $e^x$ around $x = 1$. How do these approximations compare?

For (a), we know from Example 3.24 that the second-order Maclaurin series for $e^x$ is given by
$$1 + x + \frac{x^2}{2!},$$
and, using this, we find that
$$e^{1.1} \simeq 1 + 1.1 + \frac{1.1^2}{2!} = 2.705.$$
Incidentally, the exact value of $e^{1.1}$ is 3.0042 (to 4dp) and so this approximation doesn’t even agree with this to 1dp.

For (b), we know from Example 3.31 that the second-order Taylor series for $e^x$ around $x = 1$ is given by
$$e + (x - 1)e + \frac{(x - 1)^2}{2!}e,$$
and, using this, we find that
$$e^{1.1} \simeq e + (1.1 - 1)e + \frac{(1.1 - 1)^2}{2!}e = 1.105e,$$
which, if we know the value of $e$, gives us 3.0037 (to 4dp). This agrees with the exact value of $e^{1.1}$ to 3dp.

As we should expect, the answer to (b) gives us a better approximation to $e^{1.1}$ than the one we found in (a) since $x = 1.1$ is closer to $x = 1$ than it is to $x = 0$. But, on the other hand, the answer to (a) didn’t require us to have any accurate knowledge of the value of $e$ itself!

Following on from this example, as we can see in Figure 3.8, we observe that the Maclaurin series for $e^x$ is most accurate when $x$ is close to $x = 0$ whereas the Taylor series for $e^x$ about $x = 1$ is most accurate when $x$ is close to $x = 1$. This is, of course, exactly what we should expect!
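The comparison in Example 3.32 can be reproduced with a single function that builds the Taylor polynomial of $e^x$ about any point $a$. This is a sketch only: the name `taylor_exp` is an assumption, and the code obtains $e^a$ from the library, which mirrors the remark above that part (b) relies on knowing the value of $e$.

```python
import math

def taylor_exp(x, a, order):
    """Taylor polynomial of e^x about x = a, keeping terms up to (x - a)**order."""
    return sum(math.exp(a) * (x - a) ** n / math.factorial(n)
               for n in range(order + 1))

x = 1.1
maclaurin = taylor_exp(x, 0.0, 2)  # part (a): expansion about x = 0
taylor = taylor_exp(x, 1.0, 2)     # part (b): expansion about x = 1

print(f"(a) about 0: {maclaurin:.4f}")
print(f"(b) about 1: {taylor:.4f}")
print(f"library:     {math.exp(x):.4f}")
```

Changing `x` to a value near 0 reverses the ranking, in line with the behaviour shown in Figure 3.8.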
Figure 3.8: The solid curve is the graph of the function $e^x$, the dashed curve is the graph of its second-order Maclaurin series and the dotted curve is the graph of its second-order Taylor series about $x = 1$. Observe how, as we might expect, the Maclaurin series is more accurate around $x = 0$ and this Taylor series is more accurate around $x = 1$.
Learning outcomes

At the end of this chapter and having completed the relevant reading and activities, you should be able to:

find simple derivatives using the definition of the derivative;
find derivatives using standard derivatives and the rules of differentiation;
use the derivative to find tangent lines and use these to approximate functions;
solve problems from economics-based subjects that involve derivatives;
find Maclaurin and Taylor series and use these to approximate functions.
Solutions to activities

Solution to activity 3.1

We need to find the derivative of the function $f(x) = x^2$ at the point $x = -1$, i.e. $f'(-1)$. So, using the definition of the derivative with $a = -1$, we start by looking at
$$\frac{f(-1 + h) - f(-1)}{h} = \frac{(-1 + h)^2 - (-1)^2}{h},$$
which, looking at the numerator, is easily simplified to give
$$\frac{f(-1 + h) - f(-1)}{h} = \frac{(1 - 2h + h^2) - 1}{h} = \frac{-2h + h^2}{h} = -2 + h.$$
This in turn means that
$$f'(-1) = \lim_{h \to 0} \frac{f(-1 + h) - f(-1)}{h} = \lim_{h \to 0} (-2 + h) = -2,$$
because, in the limit as $h \to 0$, we see that $-2 + h$ goes to $-2$.

Solution to activity 3.2

In Example 3.2, we showed that if $f(x) = x^2$, then $f'(x) = 2x$. This means that $f'(-1) = -2$ in agreement with what we saw in Activity 3.1. To find the point at which the derivative of $f(x) = x^2$, i.e. $f'(x) = 2x$, is equal to (a) 16, we see that $f'(x) = 2x = 16$ when $x = 8$, and (b) $-4$, we see that $f'(x) = 2x = -4$ when $x = -2$.

Solution to activity 3.3

Given the linear combination rule, i.e.
$$\frac{d}{dx}\big[kf(x) + lg(x)\big] = k\frac{df}{dx} + l\frac{dg}{dx},$$
we can derive the constant multiple rule by setting $l = 0$ so that
$$\frac{d}{dx}\big[kf(x)\big] = \frac{d}{dx}\big[kf(x) + 0g(x)\big] = k\frac{df}{dx} + 0\frac{dg}{dx} = k\frac{df}{dx},$$
the sum rule by setting $k = 1$ and $l = 1$ so that
$$\frac{d}{dx}\big[f(x) + g(x)\big] = \frac{d}{dx}\big[1f(x) + 1g(x)\big] = 1\frac{df}{dx} + 1\frac{dg}{dx} = \frac{df}{dx} + \frac{dg}{dx},$$
and the difference rule by setting $k = 1$ and $l = -1$ so that
$$\frac{d}{dx}\big[f(x) - g(x)\big] = \frac{d}{dx}\big[1f(x) + (-1)g(x)\big] = 1\frac{df}{dx} + (-1)\frac{dg}{dx} = \frac{df}{dx} - \frac{dg}{dx}.$$
Solution to activity 3.4
For (a), we use the constant multiple rule to see that
\[ \frac{d}{dx}\bigl[-3\cos x\bigr] = -3\,\frac{d}{dx}\cos x = -3(-\sin x) = 3\sin x. \]
For (b), we use the sum rule to see that
\[ \frac{d}{dx}\bigl[e^x + \cos x\bigr] = \frac{d}{dx}e^x + \frac{d}{dx}\cos x = e^x + (-\sin x) = e^x - \sin x. \]
For (c), we use the linear combination rule to see that
\[ \frac{d}{dx}\bigl[3\sin x - 3\ln x\bigr] = 3\,\frac{d}{dx}\sin x - 3\,\frac{d}{dx}\ln x = 3\cos x - 3\cdot\frac{1}{x} = 3\cos x - \frac{3}{x}. \]

Solution to activity 3.5
For (a), $h(x) = x\sin x$ is the product of the two functions $f(x) = x$ and $g(x) = \sin x$, and these give us $f'(x) = 1$ and $g'(x) = \cos x$. As such, the product rule tells us that
\[ h'(x) = (1)(\sin x) + (x)(\cos x) = \sin x + x\cos x. \]
For (b), $h(x) = e^x\cos x$ is the product of the two functions $f(x) = e^x$ and $g(x) = \cos x$, and these give us $f'(x) = e^x$ and $g'(x) = -\sin x$. As such, the product rule tells us that
\[ h'(x) = (e^x)(\cos x) + (e^x)(-\sin x) = e^x(\cos x - \sin x). \]
For (c), $h(x) = \sin x\cos x$ is the product of the two functions $f(x) = \sin x$ and $g(x) = \cos x$, and these give us $f'(x) = \cos x$ and $g'(x) = -\sin x$. As such, the product rule tells us that
\[ h'(x) = (\cos x)(\cos x) + (\sin x)(-\sin x) = \cos^2 x - \sin^2 x. \]
Then, using the double angle formulae
\[ \sin x\cos x = \tfrac{1}{2}\sin(2x) \quad\text{and}\quad \cos^2 x - \sin^2 x = \cos(2x), \]
from (2.6), this means that we have
\[ \frac{d}{dx}\Bigl[\tfrac{1}{2}\sin(2x)\Bigr] = \cos(2x), \]
so that, using the constant multiple rule, we can deduce that
\[ \frac{d}{dx}\sin(2x) = 2\cos(2x). \]
This result will make sense once we have seen the chain rule and, in particular, Activity 3.8(a).

Solution to activity 3.6
For (a), $h(x) = \dfrac{\sin x}{x}$ is the quotient of the two functions $f(x) = \sin x$ and $g(x) = x$, and these give us $f'(x) = \cos x$ and $g'(x) = 1$. As such, the quotient rule tells us that
\[ h'(x) = \frac{(\cos x)(x) - (\sin x)(1)}{x^2} = \frac{x\cos x - \sin x}{x^2}. \]
In this case, the original function and the derivative are only defined if $x \neq 0$.

For (b), $h(x) = \dfrac{e^x}{\cos x}$ is the quotient of the two functions $f(x) = e^x$ and $g(x) = \cos x$, and these give us $f'(x) = e^x$ and $g'(x) = -\sin x$. As such, the quotient rule tells us that
\[ h'(x) = \frac{(e^x)(\cos x) - (e^x)(-\sin x)}{[\cos x]^2} = \frac{\cos x + \sin x}{\cos^2 x}\,e^x. \]
In this case, the original function and the derivative are only defined if $\cos x \neq 0$, i.e. if $x \neq (2n+1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$.

For (c), $h(x) = \dfrac{\sin x}{\cos x}$ is the quotient of the two functions $f(x) = \sin x$ and $g(x) = \cos x$, and these give us $f'(x) = \cos x$ and $g'(x) = -\sin x$. As such, the quotient rule tells us that
\[ h'(x) = \frac{(\cos x)(\cos x) - (\sin x)(-\sin x)}{[\cos x]^2} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x}. \]
In this case, the original function and the derivative are only defined if $\cos x \neq 0$, i.e. if $x \neq (2n+1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$. Indeed, using the Pythagorean identity $\sin^2 x + \cos^2 x = 1$ and the definitions
\[ \tan x = \frac{\sin x}{\cos x} \quad\text{and}\quad \sec x = \frac{1}{\cos x}, \]
from (2.2), (2.1) and Section 2.1.2, we can deduce that
\[ \frac{d}{dx}\tan x = \frac{1}{\cos^2 x} = \sec^2 x, \]
as long as $x \neq (2n+1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$.

Solution to activity 3.7
Given that $h(x) = (2x+1)^3$, we can multiply out the brackets to get $h(x) = 8x^3 + 12x^2 + 6x + 1$, which means that
\[ h'(x) = 24x^2 + 24x + 6 = 6(4x^2 + 4x + 1) = 6(2x+1)^2, \]
in agreement with what we saw in Example 3.10.

Solution to activity 3.8
For (a), $h(x) = \sin(2x)$ is the composition of the functions $f(g) = \sin g$ and $g(x) = 2x$. As such we have $f'(g) = \cos g$ and $g'(x) = 2$, and so the chain rule tells us that
\[ h'(x) = (\cos g)(2) = 2\cos(2x), \]
which agrees with what we found in Activity 3.5(c).

For (b), $h(x) = \ln(\cos x)$ is the composition of the functions $f(g) = \ln g$ and $g(x) = \cos x$. As such we have $f'(g) = \dfrac{1}{g}$ and $g'(x) = -\sin x$, and so the chain rule tells us that
\[ h'(x) = \frac{1}{g}(-\sin x) = -\frac{\sin x}{\cos x} = -\tan x. \]

For (c), $h(x) = \ln(e^x)$ is the composition of the functions $f(g) = \ln g$ and $g(x) = e^x$.
As such we have $f'(g) = \dfrac{1}{g}$ and $g'(x) = e^x$, and so the chain rule tells us that
\[ h'(x) = \frac{1}{g}(e^x) = \frac{e^x}{e^x} = 1. \]
Of course, this is obvious as $\ln(e^x) = x$ and so its derivative with respect to $x$ is 1.

Solution to activity 3.9
Given that $a^x = e^{x\ln a}$, we use the chain rule with $h(g) = e^g$ and $g(x) = x\ln a$ to get
\[ \frac{d a^x}{dx} = \frac{d}{dx}e^{x\ln a} = e^{x\ln a}\ln a = a^x\ln a, \]
as required.

Solution to activity 3.10
Writing the quotient $f(x)/g(x)$ as the product $f(x)[g(x)]^{-1}$, the product rule gives us
\[ \frac{d}{dx}\Bigl[f(x)[g(x)]^{-1}\Bigr] = \frac{df}{dx}[g(x)]^{-1} + f(x)\Bigl(-[g(x)]^{-2}\frac{dg}{dx}\Bigr), \]
where we have used the chain rule to differentiate $[g(x)]^{-1}$ with respect to $x$. Rewriting this, we then have
\[ \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)\dfrac{df}{dx} - f(x)\dfrac{dg}{dx}}{[g(x)]^2}, \]
which is the quotient rule, as required.

Solution to activity 3.11
We have $y = f(x)$ so that $x = f^{-1}(y)$. Thus, differentiating both sides of the latter with respect to $x$, we get
\[ \frac{dx}{dx} = \frac{df^{-1}}{dy}\frac{dy}{dx}, \]
where we have used the chain rule on the right-hand side as $y$ itself is a function of $x$ since $y = f(x)$. This gives us
\[ 1 = \frac{df^{-1}}{dy}\frac{dy}{dx} \implies \frac{df^{-1}}{dy} = 1\Big/\frac{df}{dx}, \]
as required.¹⁷ In particular, observe that this formula makes no sense at points where $f'(x) = 0$.

¹⁷ See Section 2.9 of Binmore and Davies (2002) for a geometric view of this result.
Solution to activity 3.12
Here we have $y = f(x) = e^x$ and $x = f^{-1}(y) = \ln y$ so, using the result from Activity 3.11, we see that
\[ \frac{d}{dy}\ln y = 1\Big/\frac{d}{dx}e^x = \frac{1}{e^x} = \frac{1}{y}, \]
as $(e^x)' = e^x = y$.

Solution to activity 3.13
There is, generally, no need to apply the rules of differentiation in as much detail as we have been using. So, let's do the three examples in this activity quickly.

For (a), we have $h(x) = e^{x^2}\ln(\sin x)$ which is the product of two compositions and so using the product and chain rules we get
\[ h'(x) = 2x\,e^{x^2}\ln(\sin x) + e^{x^2}\frac{\cos x}{\sin x} = \frac{e^{x^2}}{\sin x}\bigl[2x\sin x\ln(\sin x) + \cos x\bigr]. \]

For (b), we have
\[ h(x) = \frac{\sin(\cos x)}{e^{\sin x}}, \]
which is the quotient of two compositions and so using the quotient and chain rules we get
\[ h'(x) = \frac{\cos(\cos x)(-\sin x)\,e^{\sin x} - \sin(\cos x)\,e^{\sin x}\cos x}{[e^{\sin x}]^2} = -\frac{\sin x\cos(\cos x) + \cos x\sin(\cos x)}{e^{\sin x}}. \]

For (c), we have $h(x) = \sin^2(3x) + \cos^2(3x)$ which is the sum of two compositions and so we can easily use the chain rule to see that
\[ h'(x) = 2\sin(3x)\cos(3x)(3) + 2\cos(3x)[-\sin(3x)](3) = 0. \]
Of course, this is obvious as $\sin^2(3x) + \cos^2(3x) = 1$ using (2.2) and so its derivative with respect to $x$ is zero.

Solution to activity 3.14
We have seen that the first four derivatives are given by
\[ f'(x), \quad f''(x) = -f(x), \quad f'''(x) = -f'(x) \quad\text{and}\quad f^{(4)}(x) = -f''(x) = f(x), \]
which returns us to our original function. Indeed, we can then see that the next four derivatives will be given by
\[ f^{(5)}(x) = f'(x), \quad f^{(6)}(x) = -f(x), \quad f^{(7)}(x) = -f'(x) \quad\text{and}\quad f^{(8)}(x) = -f''(x) = f(x), \]
which, again, returns us to our original function. This means that, spotting the pattern, we can see that
\[ f^{(n)}(x) = \begin{cases} f(x) & n = 4, 8, \ldots \\ f'(x) & n = 1, 5, 9, \ldots \\ -f(x) & n = 2, 6, 10, \ldots \\ -f'(x) & n = 3, 7, 11, \ldots \end{cases} \]
for $n \geq 1$.

Solution to activity 3.15
To find the first four derivatives of $x e^x$, we use the product rule to see that
\[ f'(x) = (1)(e^x) + (x)(e^x) = (1 + x)e^x, \]
\[ f''(x) = (1)(e^x) + (1 + x)(e^x) = (2 + x)e^x, \]
\[ f'''(x) = (1)(e^x) + (2 + x)(e^x) = (3 + x)e^x, \]
and
\[ f^{(4)}(x) = (1)(e^x) + (3 + x)(e^x) = (4 + x)e^x. \]
Indeed, spotting the pattern, we can deduce that $f^{(n)}(x) = (n + x)e^x$ for $n \geq 1$.

Solution to activity 3.16
Here we have $f(x) = e^x$ so that $f'(x) = e^x$. Then using (3.1), we see that when $x = 1$ we have
\[ f'(1) = \frac{y - f(1)}{x - 1} \implies y - e^1 = e^1(x - 1) \implies y = e\,x, \]
as the equation of the tangent line to the function $f(x) = e^x$ at $x = 1$.

Solution to activity 3.17
Here we have $C(q) = 1000 + 5q + q^2$ and so, when operating at $q = 100$, the increase in cost given an increase in quantity of one is given by
\[ \Delta C = C(101) - C(100) = \bigl(1000 + 5(101) + 101^2\bigr) - \bigl(1000 + 5(100) + 100^2\bigr) = 206. \]
This is pretty close to the approximate answer of 205 that we found in Example 3.19, especially if we consider this as a relative error of $1/206 = 0.49\%$ (to 2dp) instead of an absolute error of one.

Solution to activity 3.18
The Maclaurin series for $(1 + x)^2$ is given by
\[ (1 + x)^2 = 1 + 2x + \frac{(2)(1)}{2!}x^2 = 1 + 2x + x^2, \]
as all terms involving $x^n$ with $n \geq 3$ will have a coefficient of zero. Similarly, the Maclaurin series for $(1 + x)^3$ is given by
\[ (1 + x)^3 = 1 + 3x + \frac{(3)(2)}{2!}x^2 + \frac{(3)(2)(1)}{3!}x^3 = 1 + 3x + 3x^2 + x^3, \]
as all terms involving $x^n$ with $n \geq 4$ will have a coefficient of zero. Of course, this is exactly what we would get if we just multiplied out the brackets in the usual way!

Solution to activity 3.19
To use (3.3), we see that $f(x) = \cos(\ln(1 + x))$ gives
\[ f(0) = \cos(\ln 1) = \cos 0 = 1, \]
and then, finding the first four derivatives of $f(x)$, we get
\[ f'(x) = -\frac{\sin(\ln(1 + x))}{1 + x}, \qquad f''(x) = \frac{\sin(\ln(1 + x)) - \cos(\ln(1 + x))}{(1 + x)^2}, \]
\[ f'''(x) = \frac{3\cos(\ln(1 + x)) - \sin(\ln(1 + x))}{(1 + x)^3}, \qquad f^{(4)}(x) = -10\,\frac{\cos(\ln(1 + x))}{(1 + x)^4}, \]
after some fairly tortuous differentiation. These then give us
\[ f'(0) = 0, \quad f''(0) = -1, \quad f'''(0) = 3 \quad\text{and}\quad f^{(4)}(0) = -10, \]
if we evaluate them at $x = 0$. Consequently, putting these into (3.3), we get
\[ \cos(\ln(1 + x)) = 1 + x(0) + \frac{x^2}{2!}(-1) + \frac{x^3}{3!}(3) + \frac{x^4}{4!}(-10) + \cdots = 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12}x^4 + \cdots, \]
and this gives us the fourth-order Maclaurin series for $\cos(\ln(1 + x))$ in agreement with what we saw before in Example 3.29. Notice, however, that this method involved some fairly complicated differentiation whereas the method in Example 3.29 only involved some simple algebra!

Solution to activity 3.20
For values of $y$ around $y = 0$ we have the Maclaurin series
\[ e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots + \frac{y^n}{n!} + \cdots, \]
so that, if we let $y = x - 1$ for values of $x$ around $x = 1$, we still have values of $y$ around $y = 0$, i.e. we can write
\[ e^{x-1} = 1 + (x - 1) + \frac{(x - 1)^2}{2!} + \frac{(x - 1)^3}{3!} + \cdots + \frac{(x - 1)^n}{n!} + \cdots, \]
which gives us the Taylor series for $e^{x-1}$ for values of $x$ around $x = 1$. So, as $e^x = e^1 e^{x-1}$, this means that
\[ e^x = e + (x - 1)e + \frac{(x - 1)^2}{2!}e + \frac{(x - 1)^3}{3!}e + \cdots + \frac{(x - 1)^n}{n!}e + \cdots, \]
is the Taylor series for $e^x$ for values of $x$ around $x = 1$ in agreement with what we found in Example 3.31.

Solution to activity 3.21
To find the Taylor series for $e^x$ around $x = 2$, we can either use (3.2) or the method we saw in Activity 3.20.

Method I: Using (3.2), we have $f(x) = e^x$ so that $f(2) = e^2$. We also note, as in Example 3.24, that $f^{(n)}(x) = e^x$ for $n \geq 1$. Then, to use these derivatives in (3.2), we need to evaluate them at $x = 2$, i.e. we find that $f^{(n)}(2) = e^2$ for $n \geq 1$. Consequently, putting these into (3.2), we get
\[ e^x = e^2 + (x - 2)e^2 + \frac{(x - 2)^2}{2!}e^2 + \frac{(x - 2)^3}{3!}e^2 + \cdots + \frac{(x - 2)^n}{n!}e^2 + \cdots, \]
as the Taylor series for $e^x$ around $x = 2$.

Method II: Using the method of Activity 3.20, we know that for values of $y$ around $y = 0$ we have the Maclaurin series
\[ e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots + \frac{y^n}{n!} + \cdots, \]
so that, if we let $y = x - 2$ for values of $x$ around $x = 2$, we still have values of $y$ around $y = 0$, i.e. we can write
\[ e^{x-2} = 1 + (x - 2) + \frac{(x - 2)^2}{2!} + \frac{(x - 2)^3}{3!} + \cdots + \frac{(x - 2)^n}{n!} + \cdots, \]
which gives us the Taylor series for $e^{x-2}$ for values of $x$ around $x = 2$. So, as $e^x = e^2 e^{x-2}$, this means that
\[ e^x = e^2 + (x - 2)e^2 + \frac{(x - 2)^2}{2!}e^2 + \frac{(x - 2)^3}{3!}e^2 + \cdots + \frac{(x - 2)^n}{n!}e^2 + \cdots, \]
is the Taylor series for $e^x$ for values of $x$ around $x = 2$ in agreement with what we have just found using the other method.
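The Taylor series for $e^x$ around $x = 2$ can also be explored numerically. The sketch below is illustrative only: the function name `taylor_exp` and the sample point $x = 2.5$ are assumptions, not part of the guide. It sums the series up to a chosen order and prints the result next to the value given by the standard library.

```python
import math


def taylor_exp(x, centre, order):
    """Partial sum of the Taylor series for e^x about `centre`,
    keeping terms up to (x - centre)**order."""
    return sum(
        math.exp(centre) * (x - centre) ** n / math.factorial(n)
        for n in range(order + 1)
    )


x = 2.5  # a sample point near the centre x = 2
for order in (1, 2, 4, 8):
    # Each row shows the order, the partial sum and the library value of e^x.
    print(order, taylor_exp(x, 2.0, order), math.exp(x))
```

Higher orders should agree with `math.exp(x)` to more decimal places, which is the sense in which the series approximates the function near $x = 2$.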
Exercises

Exercise 3.1
Find the derivatives of the following functions.
(a) $e^{\sin x}\cos x$,
(b) $\dfrac{\tan x}{e^{x^2}}$,
(c) $\sin(x e^x)$.

Exercise 3.2
Use the compound-angle formulae to show that
\[ \cos x = \sin\Bigl(x + \frac{\pi}{2}\Bigr) \quad\text{and}\quad \sin x = -\cos\Bigl(x + \frac{\pi}{2}\Bigr). \]
Hence use the chain rule to derive the derivative of $\cos x$ from the derivative of $\sin x$.

Exercise 3.3
Verify that the point $(e, e)$ is on the curve with equation $y = x\ln x$, and find the equation of the tangent line to the curve at this point. Consider, for some constants $a$ and $b$, the curve with equation $y = ax^2 + b$. For what values of $a$ and $b$ does this curve pass through the point $(e, e)$ with the same tangent line as the one you found above?

Exercise 3.4
Suppose the demand function for a good is
\[ q^D(p) = \frac{1}{\sqrt{1 + p^4}}. \]
Find the elasticity of demand in terms of $p$ and verify that it is positive if $p > 0$.

Exercise 3.5
Find the fourth-order Maclaurin series for
\[ \ln\left(\frac{1 + \sin x}{1 + x}\right). \]
Solutions to exercises

Solution to exercise 3.1
We apply the rules of differentiation 'quickly' as we did in Activity 3.13.

(a) The function $h(x) = e^{\sin x}\cos x$ is a product that has the composition $e^{\sin x}$ as one of its terms. As such, applying the product rule we get
\[ h'(x) = \bigl(e^{\sin x}\cos x\bigr)\cos x + e^{\sin x}(-\sin x) = \bigl(\cos^2 x - \sin x\bigr)e^{\sin x}, \]
where we have used the chain rule to differentiate the composition.

(b) The function $h(x) = (\tan x)/e^{x^2}$ is a quotient whose denominator is the composition $e^{x^2}$. As such, applying the quotient rule we get
\[ h'(x) = \frac{(\sec^2 x)\,e^{x^2} - (\tan x)\,e^{x^2}(2x)}{[e^{x^2}]^2} = \frac{\sec^2 x - 2x\tan x}{e^{x^2}}, \]
where we have used the fact, from Activity 3.6(c), that the derivative of $\tan x$ is $\sec^2 x$ and the chain rule to differentiate the composition.

Also note that this derivative can be found by writing the function as $h(x) = (\tan x)\,e^{-x^2}$ and, if we do this, we would use the product rule instead of the quotient rule.

(c) The function $h(x) = \sin(x e^x)$ is the composition of $\sin x$ after $x e^x$ where the latter function is a product. As such, applying the chain rule we get
\[ h'(x) = \cos(x e^x)\bigl[(1)e^x + x(e^x)\bigr] = (1 + x)\,e^x\cos(x e^x), \]
where we have used the product rule to differentiate the product.

Solution to exercise 3.2
Using the compound-angle formulae from (2.5), we have
\[ \sin\Bigl(x + \frac{\pi}{2}\Bigr) = \sin x\cos\frac{\pi}{2} + \cos x\sin\frac{\pi}{2} = \cos x, \]
and
\[ \cos\Bigl(x + \frac{\pi}{2}\Bigr) = \cos x\cos\frac{\pi}{2} - \sin x\sin\frac{\pi}{2} = -\sin x, \]
as required. Indeed, notice that we have used the facts, from Activity 2.3, that $\sin(\pi/2) = 1$ and $\cos(\pi/2) = 0$. Now, using the chain rule and the derivative of $\sin x$, we see that
\[ \frac{d}{dx}\sin\Bigl(x + \frac{\pi}{2}\Bigr) = \cos\Bigl(x + \frac{\pi}{2}\Bigr)(1) = \cos\Bigl(x + \frac{\pi}{2}\Bigr), \]
which, using the results we showed above, becomes
\[ \frac{d}{dx}\cos x = -\sin x, \]
as required.

Solution to exercise 3.3
Substituting $x = e$ into $y = x\ln x$ we get $y = e\ln e = e(1) = e$, and so the point $(e, e)$ lies on this curve. The gradient of the curve at any point is given by the derivative of $f(x) = x\ln x$ and so, using the product rule, we get
\[ f'(x) = (1)\ln(x) + x\Bigl(\frac{1}{x}\Bigr) = \ln(x) + 1. \]
Thus, when $x = e$, the gradient of the curve is given by $f'(e) = \ln(e) + 1 = 1 + 1 = 2$, which means that, using (3.1), we get
\[ 2 = \frac{y - e}{x - e} \implies y - e = 2(x - e) \implies y = 2x - e, \]
as the equation of the tangent line to the curve $y = x\ln x$ at the point $(e, e)$.

The curve $y = ax^2 + b$ will have a tangent line at $(e, e)$ which is the same as the one we have just found if, firstly, the curve goes through the point $(e, e)$, i.e. $a$ and $b$ must satisfy
\[ e = a e^2 + b, \]
and, secondly, it has the same gradient at $e$, i.e. if the derivative of $g(x) = ax^2 + b$ at $x = e$ is two. That is, as $g'(x) = 2ax$ we need
\[ g'(e) = 2ae, \]
to be two. Thus, from the second condition, we have
\[ 2ae = 2 \implies a = \frac{1}{e}, \]
and, from the first condition, we have
\[ e = \frac{1}{e}e^2 + b \implies b = e - e = 0. \]
Consequently, we see that when $a = 1/e$ and $b = 0$ the curve $y = ax^2 + b$ passes through the point $(e, e)$ with the same tangent line as the one we found above.

Solution to exercise 3.4
We have the demand function
\[ q^D(p) = \frac{1}{\sqrt{1 + p^4}} = (1 + p^4)^{-\frac{1}{2}}, \]
and so, setting $q = q^D(p)$, we can use the chain rule to get the derivative
\[ q'(p) = -\frac{1}{2}(1 + p^4)^{-\frac{3}{2}}(4p^3) = \frac{-2p^3}{(1 + p^4)^{\frac{3}{2}}}. \]
Then, using the definition of the elasticity of demand from Section 3.3.3, we have
\[ \varepsilon(p) = -\frac{p}{q}\,q'(p) = -\frac{p}{(1 + p^4)^{-\frac{1}{2}}}\left(\frac{-2p^3}{(1 + p^4)^{\frac{3}{2}}}\right) = \frac{2p^4}{1 + p^4}, \]
in terms of $p$. Indeed, when $p > 0$, we have $p^4 > 0$ and $1 + p^4 > 0$, which means that $\varepsilon(p) > 0$ too.
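As a sanity check on the elasticity found in Exercise 3.4, the derivative $q'(p)$ can be estimated numerically and compared with the closed form. This is only a sketch: the function names and the central-difference step `h` are assumptions made for illustration.

```python
def demand(p):
    """Demand function q^D(p) = (1 + p^4)^(-1/2) from Exercise 3.4."""
    return (1 + p ** 4) ** -0.5


def elasticity_numeric(p, h=1e-6):
    """Estimate -(p / q) q'(p) using a central difference for q'(p)."""
    q_prime = (demand(p + h) - demand(p - h)) / (2 * h)
    return -p / demand(p) * q_prime


def elasticity_formula(p):
    """Closed form 2 p^4 / (1 + p^4) derived in the solution."""
    return 2 * p ** 4 / (1 + p ** 4)


for p in (0.5, 1.0, 2.0):
    # The two columns should agree closely, and both are positive for p > 0.
    print(p, elasticity_numeric(p), elasticity_formula(p))
```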
Solution to exercise 3.5
We start by noticing that it really is much easier to make use of the standard Maclaurin series rather than trying to use (3.3) directly on the given function. This is especially so as, in order to apply (3.3), we would need to find the first four derivatives of the function to answer this question and this would get very messy very quickly! Indeed, if we decide to use the standard Maclaurin series, two methods present themselves.

Method I: We start by simplifying the function by using the laws of logarithms from Section 2.1.4. This gives us
\[ \ln\left(\frac{1 + \sin x}{1 + x}\right) = \ln(1 + \sin x) - \ln(1 + x), \]
and so, we can easily use the Maclaurin series for $\ln(1 + x)$ from Section 3.4.1, i.e.
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots, \]
to get the second term in this difference. Then, using the Maclaurin series for $\sin x$, also from Section 3.4.1, we have
\[ \sin x = x - \frac{x^3}{3!} + \cdots, \]
which means that the first term in this difference is
\[ \ln(1 + \sin x) = \ln\Bigl(1 + \Bigl[x - \frac{x^3}{3!} + \cdots\Bigr]\Bigr) \]
\[ = \Bigl(x - \frac{x^3}{3!} + \cdots\Bigr) - \frac{1}{2}\Bigl(x - \frac{x^3}{3!} + \cdots\Bigr)^2 + \frac{1}{3}\bigl(x - \cdots\bigr)^3 - \frac{1}{4}\bigl(x - \cdots\bigr)^4 + \cdots, \]
where we have used the Maclaurin series for $\ln(1 + x)$ again in the second line. Now, as we want to keep terms up to $x^4$, we can see that the brackets in the second term give us
\[ \Bigl(x - \frac{x^3}{3!} + \cdots\Bigr)\Bigl(x - \frac{x^3}{3!} + \cdots\Bigr) = x^2 - 2\cdot x\cdot\frac{x^3}{3!} + \cdots = x^2 - \frac{x^4}{3} + \cdots, \]
where, here, we're trying to make it clear that each term that arises from this product is obtained by multiplying out the relevant brackets. Further, we see that the brackets in the last two terms will give us $x^3$ and $x^4$ respectively. Overall, then, we have
\[ \ln(1 + \sin x) = \Bigl(x - \frac{x^3}{3!} + \cdots\Bigr) - \frac{1}{2}\Bigl(x^2 - \frac{x^4}{3} + \cdots\Bigr) + \frac{1}{3}\bigl(x^3 - \cdots\bigr) - \frac{1}{4}\bigl(x^4 - \cdots\bigr) + \cdots = x - \frac{x^2}{2} + \frac{x^3}{6} - \frac{x^4}{12} + \cdots, \]
for the first part of our difference. Putting these together in our expression for the function, we then have
\[ \ln\left(\frac{1 + \sin x}{1 + x}\right) = \ln(1 + \sin x) - \ln(1 + x) \]
\[ = \Bigl(x - \frac{x^2}{2} + \frac{x^3}{6} - \frac{x^4}{12} + \cdots\Bigr) - \Bigl(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\Bigr) = -\frac{x^3}{6} + \frac{x^4}{6} - \cdots, \]
and this gives us the required fourth-order Maclaurin series.

Method II: We could also use the Maclaurin series for $\sin x$ which we saw above to get
\[ 1 + \sin x = 1 + x - \frac{x^3}{3!} + \cdots, \]
and the fact that
\[ \frac{1}{1 + x} = (1 + x)^{-1} = 1 - x + x^2 - x^3 + x^4 + \cdots, \]
which, with $r = -1$, follows from a simple application of the Maclaurin series for $(1 + x)^r$ that we saw in Example 3.25. This means that we have
\[ \frac{1 + \sin x}{1 + x} = \Bigl(1 + x - \frac{x^3}{3!} + \cdots\Bigr)\bigl(1 - x + x^2 - x^3 + x^4 + \cdots\bigr) \]
\[ = 1\bigl(1 - x + x^2 - x^3 + x^4 + \cdots\bigr) + x\bigl(1 - x + x^2 - x^3 + \cdots\bigr) - \frac{x^3}{3!}\bigl(1 - x + \cdots\bigr) = 1 - \frac{x^3}{3!} + \frac{x^4}{3!} + \cdots, \]
if we want to keep terms up to $x^4$. Then, using the Maclaurin series for $\ln(1 + x)$ which we saw above, we get
\[ \ln\left(\frac{1 + \sin x}{1 + x}\right) = \ln\Bigl(1 + \Bigl[-\frac{x^3}{3!} + \frac{x^4}{3!} + \cdots\Bigr]\Bigr) = -\frac{x^3}{3!} + \frac{x^4}{3!} + \cdots, \]
and this gives us the same fourth-order Maclaurin series as the one we found using the other method.
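Both methods can be checked against a direct numerical evaluation of the function. The sketch below is an illustration only: the names `exact` and `maclaurin4` and the sample points are assumptions. It compares the function with the fourth-order series $-x^3/6 + x^4/6$ for small $x$, where the gap should shrink roughly like $x^5$.

```python
import math


def exact(x):
    """Direct evaluation of ln((1 + sin x) / (1 + x))."""
    return math.log((1 + math.sin(x)) / (1 + x))


def maclaurin4(x):
    """Fourth-order Maclaurin series found above: -x^3/6 + x^4/6."""
    return -x ** 3 / 6 + x ** 4 / 6


for x in (0.1, 0.05, 0.01):
    # Print the point, both values and the absolute gap between them.
    print(x, exact(x), maclaurin4(x), abs(exact(x) - maclaurin4(x)))
```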
Chapter 4
One-variable optimisation

Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 4.1–4.3.
Anthony and Biggs (1996) Chapters 8 and 9.

Further reading
Simon and Blume (1994) Chapter 3.
Adams and Essex (2010) Sections 4.4–4.6.

Aims and objectives
The objectives of this chapter are as follows.
To see what first and second-order derivatives tell us about functions.
To see how derivatives and other information about a function can be used to sketch curves.
To use derivatives to solve problems where a function needs to be optimised.
Specific learning outcomes can be found near the end of this chapter.

4.1 Introduction: What is optimisation?

Having seen how to find derivatives in the previous chapter, we now consider what they tell us about a function. In particular, we will see that the first-order derivatives of a function tell us where the function is increasing, stationary or decreasing; and its second-order derivatives tell us where the function is convex or concave. Indeed, once we have access to this information about a function we will be able to do two things. Firstly, we will be able to sketch the curve that represents the graph of a function; and secondly, we will be able to see where a function is optimised, i.e. we will be able to find the points where the function takes its largest and smallest values.
4.2 Using first-order derivatives

The first-order derivatives of a function allow us to see whether it is increasing or decreasing. If it is neither increasing nor decreasing, we say that the function is stationary. As we shall see in Section 4.5, stationary points are important when we are finding the points where a function is optimised.

4.2.1 Increasing and decreasing functions
Intuitively, if $f(x)$ is a function defined for all $x \in \mathbb{R}$, we would say that it is increasing when the values of $f(x)$ get larger as $x$ gets larger, and decreasing when the values of $f(x)$ get smaller as $x$ gets larger. Or, more precisely, if $a$ and $b$ are any two points in an interval, $I$, such that $a < b$, then $f$ is increasing on $I$ if $f(a) < f(b)$, and $f$ is decreasing on $I$ if $f(a) > f(b)$. Indeed, we can see that this makes sense by considering the two functions illustrated in Figure 4.1.

[Figure 4.1 (image not reproduced). Panels: (a) f is increasing; (b) f is decreasing. Each panel shows the curve y = f(x) with a and b marked on the x-axis and f(a) and f(b) marked on the y-axis.]

Figure 4.1: As $x$ increases, (a) $f$ is increasing as its values get larger and (b) $f$ is decreasing as its values get smaller. This can also be seen by taking two values of $x$, say $a$ and $b$, such that $a < b$. In (a), the function is increasing because we have $f(a) < f(b)$ and in (b) the function is decreasing because we have $f(a) > f(b)$.

However, of more interest here is the fact that we can use derivatives to determine whether a function is increasing or decreasing over some interval, $I$. To see how this works, consider that the first-order Taylor approximation to $f(x)$ around $x = a$ is given by
\[ f(x) \simeq f(a) + (x - a)f'(a), \]
and to make this a good approximation, we want $x - a$ to be small. So, if we now consider another value of $x$, say $x = b$, where $b > a$ and $b - a$ is small, we see that this approximation gives us
\[ f(b) \simeq f(a) + (b - a)f'(a). \]
Now, $b - a > 0$, so we just need to know the sign of $f'(a)$ to determine whether $f(b)$ is greater or less than $f(a)$, i.e. whether $f$ is increasing or decreasing as we move from $a$ to $b$. Indeed, we see that if $f'(a) > 0$, then $f$ is increasing at $a$ because $f(b) > f(a)$, and if $f'(a) < 0$, then $f$ is decreasing at $a$ because $f(b) < f(a)$. Indeed, by letting $a$ be any value of $x$, we can generalise this to obtain the following useful result.

Let $I$ be an interval,
if $f'(x) > 0$ for $x \in I$, then $f$ is increasing on $I$, and
if $f'(x) < 0$ for $x \in I$, then $f$ is decreasing on $I$.

Let's look at an example to see how this works.
Example 4.1
Determine the intervals on which the function $f(x) = x^3 - 2x^2 - 15x$ is (a) increasing and (b) decreasing.

Differentiating the function with respect to $x$, we find that $f'(x) = 3x^2 - 4x - 15$. This factorises to give us $f'(x) = (3x + 5)(x - 3)$, and so, by looking at what is happening away from the points $x = -5/3$ and $x = 3$ where $f'(x) = 0$, we see that the sign of this derivative can be found by considering the signs of its two factors, i.e.

              x < −5/3     −5/3 < x < 3     x > 3
3x + 5           −              +             +
x − 3            −              −             +
f'(x)            +              −             +

Thus, $f(x)$ is (a) increasing on the intervals $x < -5/3$ and $x > 3$ where $f'(x) > 0$ and (b) decreasing on the interval $-5/3 < x < 3$ where $f'(x) < 0$ as illustrated in Figure 4.2(a).

A useful consequence of this is that it tells us something about the tangent lines to the function $f(x)$ at points where it is increasing or decreasing. Recall, from Section 3.3.2, that the tangent line to $f(x)$ at the point $x = a$ has an equation given by
\[ y = f(a) + (x - a)f'(a), \]
and, in particular, the gradient of the tangent line is given by $f'(a)$. This means that, if $f(x)$ is increasing (or decreasing) at $x = a$, then $f'(a)$ will be positive (or negative) and this, in turn, means that the tangent line at this point will also be an increasing (or decreasing) function of $x$. This will be useful in a moment, but for now, we can see how this works by looking at Figure 4.3.
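The sign analysis in Example 4.1 can be reproduced by sampling the derivative at one point in each interval. This is a minimal sketch under the assumption that one test point per interval is enough (which holds here because $f'$ only changes sign at its roots); the names `f_prime` and `behaviour` are illustrative.

```python
def f_prime(x):
    """Derivative of f(x) = x^3 - 2x^2 - 15x from Example 4.1."""
    return 3 * x ** 2 - 4 * x - 15


def behaviour(x):
    """Describe f at x using the sign of its first-order derivative."""
    slope = f_prime(x)
    if slope > 0:
        return "increasing"
    if slope < 0:
        return "decreasing"
    return "stationary"


# One sample point from each of x < -5/3, -5/3 < x < 3 and x > 3.
for x in (-2.0, 0.0, 4.0):
    print(x, f_prime(x), behaviour(x))
```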
[Figure 4.2 (image not reproduced). Two panels, each showing the curve y = f(x): panel (a) marks x = −5/3 and x = 3 on the x-axis; panel (b) marks x = 2/3 on the x-axis.]

Figure 4.2: The graph of $f(x) = x^3 - 2x^2 - 15x$ indicating the points relevant to (a) Examples 4.1, 4.2 and 4.4; (b) Examples 4.5 and 4.6.

[Figure 4.3 (image not reproduced). Panels: (a) f and T are increasing; (b) f and T are decreasing. Each panel shows the curve y = f(x) and its tangent line T at the point (a, f(a)).]

Figure 4.3: (a) When $f(x)$ is increasing at $x = a$, its tangent line at the point $(a, f(a))$ will also be increasing as the gradient of the curve (and hence the gradient of the tangent line) at this point is positive. (b) When $f(x)$ is decreasing, its tangent line at the point $(a, f(a))$ will also be decreasing as the gradient of the curve (and hence the gradient of the tangent line) at this point is negative.

At this point, we know what a positive or negative derivative tells us about a function but you may be wondering what happens when the derivative is neither positive nor negative. That is, what happens when the derivative is zero? This is very important and we now turn our attention to that.
4.2.2 Stationary points
When we find a point, say x = a, that makes f 0 (x) = 0, the tangent line at that point is horizontal and its Cartesian equation is given by y = f (a). This means that we will have a function which may look like the one illustrated in Figure 4.4. We call such points, i.e. points where f 0 (x) = 0, stationary points.
[Figure 4.4 (image not reproduced). The curve y = f(x) with a horizontal tangent line T at the point (a, f(a)).]

Figure 4.4: The point $x = a$ is a stationary point of the function $f(x)$ as $f'(a) = 0$. Observe that this means that the tangent line to $f(x)$ at the point $(a, f(a))$ is a horizontal line.

There are, essentially, four different kinds of stationary point that we will encounter and these depend on how the function is changing as we move through the stationary point in the direction of increasing $x$. In particular, as $x$ is increasing through a stationary point at $x = a$, we have a local maximum if $f$ changes from being increasing to being decreasing at the stationary point, and a local minimum if $f$ changes from being decreasing to being increasing at the stationary point. Of course, $f$ could also be increasing (or decreasing) on both sides of the stationary point and in these cases we have a point of inflection. These four possibilities are illustrated in Figure 4.5 and, in particular, we see that the stationary point we saw earlier in Figure 4.4 is a local minimum.

This provides us with a way of classifying any stationary points we find by looking at the sign of the first-order derivative of the function as we move through a stationary point. This is called the first-order derivative test and it runs as follows. As we move through the stationary point in the direction of increasing $x$, if we find that:
$f'(x)$ changes from positive to negative, i.e. the function goes from being increasing to being decreasing as we pass through the stationary point, then the stationary point is a local maximum.
$f'(x)$ changes from negative to positive, i.e. the function goes from being decreasing to being increasing as we pass through the stationary point, then the stationary point is a local minimum.
And, if the sign of $f'(x)$ does not change, i.e. if the function is increasing (or decreasing) on both sides of the stationary point, then the stationary point is a point of inflection.
[Figure 4.5 (image not reproduced). Four panels, each showing the curve y = f(x) with its horizontal tangent line T at the stationary point: (a) f increases on both sides, a point of inflection; (b) f increases and then decreases, a local maximum; (c) f decreases and then increases, a local minimum; (d) f decreases on both sides, a point of inflection.]

Figure 4.5: The four different kinds of stationary point.

Example 4.2
Find the stationary points of the function given in Example 4.1 and classify them by using the first-order derivative test.

We saw in Example 4.1 that the derivative of the function can be written as $f'(x) = (3x + 5)(x - 3)$, and so the stationary points of this function, i.e. the points that make $f'(x) = 0$, occur when $x = -5/3$ and $x = 3$ as you can see in Figure 4.2(a). We can also use what we saw in Example 4.1 to see that, according to the first-order derivative test, the stationary point that occurs when:
$x = -5/3$ is a local maximum as $f$ changes from being increasing to being decreasing (i.e. $f'$ changes from positive to negative) at the stationary point.
$x = 3$ is a local minimum as $f$ changes from being decreasing to being increasing (i.e. $f'$ changes from negative to positive) at the stationary point.
This, of course, can be clearly seen in Figure 4.2(a).
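The first-order derivative test used in Example 4.2 can be sketched in code by sampling $f'$ just to the left and just to the right of a stationary point. The function name `classify` and the step size are assumptions for illustration, and the sketch presumes the point passed in really is stationary and that the step is small enough not to cross another stationary point.

```python
def f_prime(x):
    """Derivative of f(x) = x^3 - 2x^2 - 15x, i.e. (3x + 5)(x - 3)."""
    return (3 * x + 5) * (x - 3)


def classify(derivative, a, step=1e-3):
    """Classify the stationary point a from the sign of the derivative
    slightly before and slightly after it."""
    left = derivative(a - step)
    right = derivative(a + step)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "point of inflection"


for a in (-5 / 3, 3.0):
    print(a, classify(f_prime, a))
```

The same helper reports a point of inflection for a derivative such as $3x^2$ at $x = 0$, whose sign does not change.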
4.2.3 An application: Elasticities revisited

In Section 3.3.3, we saw that the elasticity of demand, $\varepsilon(p)$, is defined to be
\[ \varepsilon(p) = -\frac{p}{q}\,q'(p), \]
where $q = q^D(p)$ is the demand function and this told us how changes in price will cause changes in the quantity purchased by the consumers. Now, of course, from the point of view of a supplier, you are not interested in such changes per se, but in how such changes affect your revenue. That is, will a change in price, together with the corresponding change in quantity purchased, lead to an increase or decrease in your revenue?

To answer this question, we assume that the supplier is a monopoly, i.e. it is the only supplier of a given product to the market. In such cases, the revenue generated by selling at a price per unit, $p$, is given by
\[ R(p) = pq, \]
where $q = q^D(p)$ is the quantity that will be purchased by the consumers at this price. Indeed, using the product rule to differentiate this with respect to $p$, we find that
\[ R'(p) = q + p\,q'(p) = q\Bigl[1 + \frac{p}{q}\,q'(p)\Bigr] = q\bigl[1 - \varepsilon(p)\bigr], \]
using the definition of $\varepsilon(p)$. So, as $q > 0$ for this to be economically meaningful, we have:
If $\varepsilon(p) > 1$, we see that $R'(p) < 0$ and so a small increase in price leads to a decrease in revenue. In such cases we say that demand for the product is elastic.
If $\varepsilon(p) < 1$, we see that $R'(p) > 0$ and so a small increase in price leads to an increase in revenue. In such cases we say that demand for the product is inelastic.
Thus, even though an increase in price will usually lead to a decrease in the quantity that the consumers will demand, the value of the elasticity (i.e. whether it is greater than or less than one) determines how such changes will affect the revenue (i.e. whether it will decrease or increase).

Example 4.3
Suppose that the demand function for a good is given by $q^D(p) = 20 - 2p$. Determine the values of $p$ that make the demand (a) elastic and (b) inelastic.

In this case, we have $q = q^D(p) = 20 - 2p$ and so the elasticity of demand is given by
\[ \varepsilon(p) = -\frac{p}{q}\,q'(p) = -\frac{p}{20 - 2p}(-2) = \frac{p}{10 - p}, \]
as long as $p \neq 10$. And, of course, we need values of $p$ where $0 \leq p \leq 10$ in order for the demand function to be economically meaningful. So, for (a), where we want the values of $p$ that make demand elastic, we see that
\[ \varepsilon(p) > 1 \implies \frac{p}{10 - p} > 1 \implies p > 10 - p, \]
as $10 - p > 0$ since $0 \leq p < 10$ where the elasticity is defined. This means that demand is elastic if $p > 5$ and, in particular, if we have $5 < p < 10$ a small increase in price will lead to a decrease in revenue. For (b), similar reasoning shows us that demand is inelastic if $p < 5$ and, in particular, if we have $0 \leq p < 5$ a small increase in price will lead to an increase in revenue.
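The link between elasticity and revenue in Example 4.3 can be illustrated numerically. In this sketch the function names and the price increment of 0.01 are assumptions chosen for illustration; it compares the revenue before and after a small price rise on either side of $p = 5$.

```python
def demand(p):
    """Demand function q^D(p) = 20 - 2p from Example 4.3."""
    return 20 - 2 * p


def elasticity(p):
    """Elasticity of demand p / (10 - p), valid for 0 <= p < 10."""
    return p / (10 - p)


def revenue(p):
    """Revenue R(p) = p q for a monopoly supplier."""
    return p * demand(p)


for p in (2.0, 8.0):
    change = revenue(p + 0.01) - revenue(p)
    kind = "elastic" if elasticity(p) > 1 else "inelastic"
    # A positive change means the small price rise increased revenue.
    print(p, kind, change)
```

At $p = 2$ demand is inelastic and revenue rises with price, whereas at $p = 8$ demand is elastic and revenue falls.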
4.3 Using second-order derivatives

The second-order derivative of a function can allow us to infer useful information about the shape of a function. For instance, it can allow us to infer whether a stationary point is a local maximum or a local minimum and, more generally, whether the function is convex or concave. Indeed, once we understand convexity and concavity, we will be in a position to extend our understanding of what we mean by a point of inflection.

4.3.1 Second-derivatives and stationary points
The key to understanding the link between the shape of a function and its second-derivative is the second-order Taylor approximation to $f(x)$ around $x = a$, i.e.
\[ f(x) \simeq f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2}f''(a), \]
and we know that this is a good approximation as long as $x - a$ is small. Now, to start with, let's suppose that $f(x)$ has a stationary point at $x = a$, i.e. $f'(a) = 0$, so that our second-order Taylor approximation becomes
\[ f(x) \simeq f(a) + \frac{(x - a)^2}{2}f''(a) \implies f(x) - f(a) \simeq \frac{(x - a)^2}{2}f''(a). \]
Here, for all $x \neq a$ near the stationary point, $(x - a)^2 > 0$ and so the sign of $f(x) - f(a)$ on the left-hand side, i.e. the relative magnitude of $f(x)$ and $f(a)$, is determined by the sign of $f''(a)$ on the right-hand side. That is, the sign of $f(x) - f(a)$ for $x$ near the stationary point is determined by the value of the second-order derivative at the stationary point. Indeed, we see that:
If $f''(a) > 0$, then $f(x) > f(a)$ for all $x$ near to $a$ and so the function always lies above the horizontal tangent line at $x = a$. This means that the stationary point is a local minimum as in Figure 4.5(c).
If $f''(a) < 0$, then $f(x) < f(a)$ for all $x$ near to $a$ and so the function always lies below the horizontal tangent line at $x = a$. This means that the stationary point is a local maximum as in Figure 4.5(b).
Thus, the sign of the second-order derivative at a stationary point allows us to infer whether the stationary point is a local maximum or a local minimum. When we classify stationary points in this way, we call it the second-order derivative test. However, observe that if $f''(a) = 0$, then the second-order Taylor approximation tells us nothing useful about the shape of the function as it reduces to $f(x) \simeq f(a)$.
Example 4.4 Use the second-order derivative test to classify the stationary points of the function in Example 4.1.

We saw in Example 4.1 that the first-order derivative of $f$ is $f'(x) = 3x^2 - 4x - 15$, and, in Example 4.2, we saw that its stationary points occur when $x = -5/3$ and $x = 3$. To use the second-order derivative test, we note that $f''(x) = 6x - 4$, and then use the fact that

when $x = -5/3$, $f''(x) = -14 < 0$ and so this is a local maximum,
when $x = 3$, $f''(x) = 14 > 0$ and so this is a local minimum,

in agreement with what we found in Example 4.2.
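The second-order derivative test from Example 4.4 is mechanical enough to express in a few lines. This is a hedged sketch rather than a general tool: the derivatives are typed in by hand for $f(x) = x^3 - 2x^2 - 15x$, and the names `f1`, `f2`, `classify` and `STATIONARY_POINTS` are hypothetical.

```python
from fractions import Fraction


def f1(x):
    """First-order derivative of f(x) = x^3 - 2x^2 - 15x."""
    return 3 * x**2 - 4 * x - 15


def f2(x):
    """Second-order derivative of the same function."""
    return 6 * x - 4


def classify(a):
    """Apply the second-order derivative test at a stationary point a."""
    if f1(a) != 0:
        raise ValueError("a is not a stationary point")
    second = f2(a)
    if second > 0:
        return "local minimum"
    if second < 0:
        return "local maximum"
    # f''(a) = 0: the test tells us nothing, as noted in the text.
    return "test inconclusive"


# Stationary points found in Example 4.2, held as exact fractions.
STATIONARY_POINTS = (Fraction(-5, 3), Fraction(3))

for a in STATIONARY_POINTS:
    print(a, f2(a), classify(a))
```

Using exact fractions means the check `f1(a) != 0` is reliable at $x = -5/3$, which a floating-point value would not guarantee.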
4.3.2 Convex and concave functions
More generally, the sign of the second-order derivative of a function tells us whether a function is convex or concave. Indeed, we find that:

If $f''(x) > 0$ on some interval, we say that $f$ is convex on that interval.
If $f''(x) < 0$ on some interval, we say that $f$ is concave on that interval.

To get an idea of what this means, consider that a convex function on an interval, $I$, has $f''(x) > 0$ for all $x \in I$. So, if we take any particular point, say $a \in I$, the tangent line to $f$ at $x = a$ has an equation given by
\[ y = f(a) + (x - a)f'(a), \]
and so, our second-order Taylor approximation can be written as
\[ f(x) = y + \frac{(x - a)^2}{2}\,f''(a). \]
Now, as $f''(a) > 0$ (recall that $a \in I$ too), we see that $f(x) > y$ for all $x \in I$ where $x \neq a$, i.e. these values of $f$ always lie above the values from the tangent line to $f$ at $x = a$, as illustrated in Figure 4.6(a). But, of course, we can use any $a \in I$ when we run this argument and so a convex function is one which lies above all of its tangent lines, as illustrated in Figure 4.6(b). In particular, a function must be convex in the neighbourhood of a local minimum. A similar argument can be given to show that a concave function always lies below all of its tangent lines so that, in particular, a function must be concave in the neighbourhood of a local maximum.
Figure 4.6: The relationship between a convex function and its tangent lines. (a) $f$ lies above the tangent at $a \in I$: when changing the value of $x$, we can see that the values of $f(x)$ are greater than the corresponding values of $y$ from the tangent line to $f$ at $a$, i.e. $f$ lies above this tangent line. (b) $f$ lies above all its tangent lines: by changing the value of $a$, we can see that $f$ lies above all of its tangent lines.
Activity 4.1 Using an argument similar to the one above, explain why a concave function always lies below all of its tangent lines.

This gives us another, more visual, way of deciding whether a function is convex or concave, namely:

A function is convex on some interval if it lies above all of its tangent lines in that interval.
A function is concave on some interval if it lies below all of its tangent lines in that interval.

And, we can see how this all works by continuing with our example.

Example 4.5 Determine the intervals on which the function in Example 4.1 is (a) convex and (b) concave.

In Example 4.4 we saw that the second-order derivative of the function from Example 4.1 is given by $f''(x) = 6x - 4$, so we find that

$f''(x) > 0$ when $6x - 4 > 0$ which means that $x > 2/3$, and
$f''(x) < 0$ when $6x - 4 < 0$ which means that $x < 2/3$.

This means that the function is convex on the interval $x > 2/3$ where $f''(x) > 0$ and concave on the interval $x < 2/3$ where $f''(x) < 0$ as illustrated in Figure 4.2(b). Indeed, when looking at this figure, observe that when $x > 2/3$ the function lies above all of its tangent lines in that interval and that when $x < 2/3$ the function lies below all of its tangent lines in that interval.
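The tangent-line description of convexity and concavity can be spot-checked for the function $f(x) = x^3 - 2x^2 - 15x$. The sketch below is an illustration on a finite grid of points, not a proof; the grid spacing and the names `tangent`, `gap`, `convex_side` and `concave_side` are assumptions made for this example.

```python
from fractions import Fraction


def f(x):
    return x**3 - 2 * x**2 - 15 * x


def f1(x):
    """First-order derivative of f."""
    return 3 * x**2 - 4 * x - 15


def tangent(a, x):
    """Value at x of the tangent line to f at the point a."""
    return f(a) + (x - a) * f1(a)


def gap(a, x):
    """f(x) minus the tangent-line value; positive means f lies above."""
    return f(x) - tangent(a, x)


# Exact sample points on either side of x = 2/3 (an arbitrary grid).
convex_side = [Fraction(2, 3) + Fraction(k, 4) for k in range(1, 13)]
concave_side = [Fraction(2, 3) - Fraction(k, 4) for k in range(1, 13)]

# For x > 2/3 the curve should sit above every sampled tangent line,
# and for x < 2/3 it should sit below every sampled tangent line.
above = all(gap(a, x) > 0 for a in convex_side for x in convex_side if x != a)
below = all(gap(a, x) < 0 for a in concave_side for x in concave_side if x != a)
print(above, below)
```

Note that each comparison only uses points $x$ from the same interval as $a$, which matches the wording "in that interval" in the definitions.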
4.3.3 Points of inflection
Not all points of inflection are stationary points like the ones we saw in Section 4.2.2. More generally, a point of inflection is a point where a function changes from being convex to concave (or vice versa) in a certain well-defined way. Technically, we say that:

If $f''(a) = 0$ and $f''(x)$ changes sign at $x = a$, then $f$ has a point of inflection at $a$.

As such, we can see that the points indicated in Figure 4.7 as well as the ones we saw earlier in Figure 4.5(a) and (d) are points of inflection although, of course, only the ones in Figure 4.5(a) and (d) are stationary points as well.
Figure 4.7: A point of inflection where $f$ changes from (a) convex to concave at $a$ and (b) concave to convex at $a$. In particular, observe that neither of these points of inflection is a stationary point because neither of them has a horizontal tangent line, i.e. $f'(a) \neq 0$ in both cases.
Example 4.6 Find any points of inflection of the function in Example 4.1.

We saw in Example 4.5 that the second-order derivative changes sign when $x = 2/3$ and, furthermore, we can see that $f''(2/3) = 0$. This means that the function in Example 4.1 has a point of inflection when $x = 2/3$. Indeed, looking at Figure 4.2(b), we can see that when $x = 2/3$, the function changes from being concave to convex as we should expect from a point of inflection. However, this point of inflection is not a stationary point because $f'(x) \neq 0$ when $x = 2/3$.

It is, perhaps, worth stressing that the condition $f''(a) = 0$ on its own is not enough to guarantee that we have a point of inflection. For instance, the two functions illustrated in Figure 4.8 both have $f''(0) = 0$, but in neither case does the second derivative change sign and so we do not have a point of inflection.

Activity 4.2 Show that $f''(0) = 0$ for both of the functions illustrated in Figure 4.8. How can we infer that they have those shapes by looking at (a) the first-order derivative and (b) the second-order derivative of the function?
Figure 4.8: (a) $f(x) = x^4 - 1$ and (b) $f(x) = 1 - x^4$. Both of these functions have $f''(0) = 0$ but neither of them has a point of inflection. (a) This is convex on both sides of $x = 0$ and the function has a local minimum at that point. (b) This is concave on both sides of $x = 0$ and the function has a local maximum at that point. (The dashed curves in these figures represent the curves $y = x^2 - 1$ in (a) and $y = 1 - x^2$ in (b) for comparison.)

It is also worth noting that the condition that $f''(x)$ changes sign at $x = a$ on its own is not enough to guarantee that we have a point of inflection either. Of course, if $f''(x)$ is changing sign at $x = a$ and $f''(a)$ exists, we must have $f''(a) = 0$. But, although we do not dwell on it here, sometimes we may encounter functions where $f''(a)$ does not exist even though $f''(x)$ changes sign at $x = a$. We will briefly consider what happens in these cases when we look at cusps and asymptotes in Section 4.4.3.
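The two-part condition for a point of inflection (that $f''(a) = 0$ and that $f''(x)$ changes sign at $a$) can be sketched as a simple check. This is a heuristic under stated assumptions: the second derivatives are supplied by hand, the sign change is tested only at one small step `h` either side of `a`, and all names here are hypothetical.

```python
from fractions import Fraction


def changes_sign(second_derivative, a, h=Fraction(1, 100)):
    """Heuristic: compare the sign of f'' just left and just right of a."""
    left = second_derivative(a - h)
    right = second_derivative(a + h)
    return left * right < 0


def is_inflection(second_derivative, a):
    """True when f''(a) = 0 and f'' changes sign at a (for the step used)."""
    return second_derivative(a) == 0 and changes_sign(second_derivative, a)


def cubic_f2(x):
    """f''(x) for f(x) = x^3 - 2x^2 - 15x."""
    return 6 * x - 4


def quartic_f2(x):
    """f''(x) for f(x) = x^4 - 1, as in Figure 4.8(a)."""
    return 12 * x**2


A = Fraction(2, 3)
print(is_inflection(cubic_f2, A))    # sign of f'' changes at 2/3
print(is_inflection(quartic_f2, 0))  # f''(0) = 0 but no sign change
```

The second call illustrates why $f''(a) = 0$ alone is not sufficient: the second derivative is zero at the point but positive on both sides of it.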
4.4 Curve sketching
One useful application of this material on derivatives and what they tell us about the 'shape' of a function is curve sketching. The aim here is to illustrate the behaviour of the curve described by the equation $y = f(x)$ by picking out its main features and where these features occur by means of a sketch. For most functions we will deal with, these features include any points where the curve may cross the axes and the location and nature of any stationary points. But, it may also be necessary to assess how the curve behaves as $x \to \pm\infty$ and, in particular, whether the function has any asymptotes. A general method for sketching the curve $y = f(x)$ would therefore involve us thinking about the following:

x-intercepts: The x-axis is given by the equation $y = 0$ and so the curve $y = f(x)$ crosses the x-axis at any point $(x, 0)$ for which $f(x) = 0$. Solving this equation will therefore give us the x-intercepts of the curve if there are any.

y-intercept: The y-axis is given by the equation $x = 0$ and so the curve crosses the y-axis at the point $(0, y)$ for which $y = f(0)$. As $f$ is a function, there can be only one such point and this is the y-intercept.

Finding stationary points: We can find the stationary points, as we saw above, by solving the equation $f'(x) = 0$.

Classifying stationary points: We can also determine whether each of the stationary points is a local maximum, local minimum or point of inflection by using the methods outlined above.

Limiting behaviour in the x-direction: We can determine how $f(x)$ is behaving as $x \to \infty$ and as $x \to -\infty$.
Of course, in certain cases, it may also be advantageous to think carefully about the intervals in which the function is increasing (or decreasing) or whether the function is convex (or concave). But, generally, the method above should suffice when we sketch most functions.
In particular, observe that a sketch is very different from a plot. A plot involves plotting certain points and joining them up with little regard to any interesting behaviour the curve may be exhibiting elsewhere. A sketch, on the other hand, isolates any interesting behaviour the curve may be exhibiting (such as the ones listed above) and concentrates on these. Please be aware that there is a difference and in this course, we will always want to see sketches and not plots! To see how we can implement the method above, we will start by sketching the relatively simple curves that arise when f is a polynomial. We will then consider how we would proceed when the functions are differentiable, but involve other elementary functions. Then, just so that we are aware of some possible complications, we look at what happens when our function fails to be differentiable at some points.
4.4.1 Sketching curves defined by polynomials
Given what we have seen so far, the only real obstacle to sketching a polynomial is an understanding of the limiting behaviour of this kind of function. The key result here is that, if $f(x)$ is a polynomial, its behaviour as $x$ gets arbitrarily large in magnitude (that is to say, as $x \to \infty$ or $x \to -\infty$) is determined solely by its leading term, i.e. the one with the highest power of $x$. Then, with this in mind, we can look at the term with the highest power of $x$, let's say that this is $x^n$, and note that:

if $n$ is even, then $x^n \to \infty$ as $x \to \infty$ and as $x \to -\infty$; whereas
if $n$ is odd, then $x^n \to \infty$ as $x \to \infty$ and $x^n \to -\infty$ as $x \to -\infty$.

Using these facts and noting how the sign of the coefficient of the term with the highest power of $x$ can influence the sign of the limit, we can determine the limiting behaviour of any polynomial.

Activity 4.3 Suppose that $f(x)$ is a polynomial and that, for some constants $a \neq 0$ and $n \in \mathbb{N}$, the term in this polynomial with the highest power of $x$ is $ax^n$. Determine the behaviour of $f(x)$ as $x \to \infty$ and as $x \to -\infty$ in the cases which arise according to whether $a$ is positive or negative and whether $n$ is even or odd.

We can now see how to sketch some polynomials and we start by seeing how to sketch the function that we have been considering throughout this chapter.
Example 4.7 Sketch the curve $y = f(x)$ where $f(x)$ is the function in Example 4.1.

From the earlier examples in this section, we know quite a lot about this function and, in particular, we have found and classified its stationary points. But, to sketch this curve, we need to find a bit more information, namely its

x-intercepts: These occur when $y = 0$ and so we solve the equation given by $f(x) = 0$, i.e. $x^3 - 2x^2 - 15x = 0$, which, on taking out the common factor of $x$ and factorising the remaining quadratic, gives us
\[ x(x^2 - 2x - 15) = 0 \Longrightarrow x(x - 5)(x + 3) = 0. \]
Thus, the x-intercepts occur when $x = -3$, $x = 0$ and $x = 5$.

y-intercept: This occurs when $x = 0$ and so using $y = f(0)$ we see that the y-intercept occurs when $y = 0$. Note, in particular, that this means that the curve goes through the origin (as we should have expected since one of the x-intercepts occurs when $x = 0$).

stationary points: We have found the x-coordinates of the stationary points and classified them above (see, for instance, Example 4.2). So, all we need to do here, is use $y = f(x)$ to find the values of $y$ at these points so that we can locate them on our sketch. Doing this, we find that $f(x)$ has a
• local maximum when $x = -5/3$ and $y = f(-5/3) = 400/27$, and
• local minimum when $x = 3$ and $y = f(3) = -36$.

limiting behaviour: The term with the highest power of $x$ in $f(x)$ is $x^3$ and so $f(x) \to \infty$ as $x \to \infty$ and $f(x) \to -\infty$ as $x \to -\infty$.

So, using this information, we begin to sketch this curve by roughly indicating these key features on some axes as in Figure 4.9(a) and then, joining them up with a nice smooth curve, we get the sketch itself as in Figure 4.9(b). In particular, it is worth noting that in this sketch:

all of the key features are labelled;
the curve has the right kind of limiting behaviour, i.e. $f(x) \to \infty$ as $x \to \infty$ and $f(x) \to -\infty$ as $x \to -\infty$; and
points of inflection which are not stationary points (recall that, in Example 4.6, we saw that this curve has one when $x = 2/3$) are not usually indicated.

Of course, what we see here is similar to what we saw in Figure 4.2, but a sketch must include information about all of the relevant key features.
Figure 4.9: Sketching the curve $y = x^3 - 2x^2 - 15x$ in Example 4.7. (a) The key features: using what we have discovered about the key features of the curve, we can begin to see what it must look like. (b) The sketch: by joining up these key features with a nice smooth curve, we get the sketch itself.

Indeed, it can be seen that, unlike plotting a function, sketching it is a bit of an art and it can only be done well by learning to appreciate what your calculations are telling you about its appearance. With this in mind, let's sketch a function that we haven't encountered before.
Example 4.8 Sketch the curve $y = f(x)$ where $f(x) = 2x^4 - 4x^3 + 2x^2$.

We find the key features of this curve according to the list given above, namely

x-intercepts: These occur when $y = 0$ and so we solve the equation given by $f(x) = 0$, i.e. $2x^4 - 4x^3 + 2x^2 = 0$, which, on taking out the common factor of $2x^2$ and factorising the remaining quadratic, gives us
\[ 2x^2(x^2 - 2x + 1) = 0 \Longrightarrow 2x^2(x - 1)^2 = 0. \]
Thus, the x-intercepts occur when $x = 0$ and $x = 1$.

y-intercept: This occurs when $x = 0$ and so using $y = f(0)$ we see that the y-intercept occurs when $y = 0$. Note, in particular, that this means that the curve goes through the origin (as we should have expected since one of the x-intercepts occurs when $x = 0$).

finding the stationary points: These occur when $f'(x) = 0$ and so, noting that $f'(x) = 8x^3 - 12x^2 + 4x$, we solve the equation $8x^3 - 12x^2 + 4x = 0$, which, on taking out a common factor of $4x$ and factorising the remaining quadratic, gives us
\[ 4x(2x^2 - 3x + 1) = 0 \Longrightarrow 4x(2x - 1)(x - 1) = 0, \]
and so the stationary points occur when $x = 0$, $x = 1/2$ and $x = 1$. Then, we use $y = f(x)$ to find the values of $y$ at these points so that we can locate them on the sketch. Doing this, we find that
• $x = 0$ gives $y = f(0) = 0$,
• $x = 1/2$ gives $y = f(1/2) = 1/8$, and
• $x = 1$ gives $y = f(1) = 0$.
So, the stationary points have coordinates given by $(0, 0)$, $(1/2, 1/8)$ and $(1, 0)$.

classifying the stationary points: Let's use the second-order derivative test here. We can see that $f''(x) = 24x^2 - 24x + 4$, and so, looking at the stationary points, we have
• $f''(0) = 4 > 0$ and so $(0, 0)$ is a local minimum;
• $f''(1/2) = -2 < 0$ and so $(1/2, 1/8)$ is a local maximum; and
• $f''(1) = 4 > 0$ and so $(1, 0)$ is a local minimum.

limiting behaviour: The term with the highest power of $x$ in $f(x)$ is $2x^4$ and so $f(x) \to \infty$ as $x \to \infty$ and as $x \to -\infty$.

So, using this information, we begin to sketch this curve by roughly indicating these key features on some axes as in Figure 4.10(a) and then, joining them up with a nice smooth curve, we get the sketch itself as in Figure 4.10(b).
Figure 4.10: Sketching the curve $y = 2x^4 - 4x^3 + 2x^2$ in Example 4.8. (a) The key features: using what we have discovered about the key features of the curve, we can begin to see what it must look like. (b) The sketch: by joining up these key features with a nice smooth curve, we get the sketch itself.
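The calculations in Example 4.8 can be double-checked with exact arithmetic. The sketch below represents a polynomial by its list of coefficients (highest power first), which is a representation chosen for this illustration only; the helper names `evaluate`, `derivative` and `classify` are likewise hypothetical.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Horner evaluation; coeffs run from the highest power down."""
    total = 0
    for c in coeffs:
        total = total * x + c
    return total


def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def classify(second_value):
    """Second-order derivative test, given the value of f'' at the point."""
    if second_value > 0:
        return "local minimum"
    if second_value < 0:
        return "local maximum"
    return "test inconclusive"


f = [2, -4, 2, 0, 0]   # 2x^4 - 4x^3 + 2x^2
f1 = derivative(f)     # expected to be 8x^3 - 12x^2 + 4x
f2 = derivative(f1)    # expected to be 24x^2 - 24x + 4

# Stationary points found in Example 4.8.
for a in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(a, evaluate(f, a), evaluate(f1, a), classify(evaluate(f2, a)))
```

Each printed row gives the stationary point, the value of $y$ there, the value of $f'$ (which should be zero) and the classification.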
Activity 4.4 Find the points of inflection of the function in Example 4.8.
4.4.2 Sketching curves defined using other elementary functions
When sketching curves defined using other elementary functions the only real obstacle is, again, an understanding of the limiting behaviour of such functions. For instance, as we saw in Section 2.1.1, exponential functions like $e^x$ and $e^{-x}$ have very simple limiting behaviours, i.e.

$e^x \to \infty$ as $x \to \infty$ and $e^x \to 0$ as $x \to -\infty$; whereas
$e^{-x} \to 0$ as $x \to \infty$ and $e^{-x} \to \infty$ as $x \to -\infty$.

But, when functions such as these are multiplied by polynomials (say), it is not clear how this will affect their limiting behaviour. For now, we just state the following fact:¹

When an exponential is multiplied by a polynomial, the exponential dominates.

Thus, for example, the function $x^3 e^{-x} \to 0$ as $x \to \infty$ because the exponential $e^{-x} \to 0$ as $x \to \infty$ and this dominates the behaviour of the polynomial, $x^3$, even though $x^3 \to \infty$ as $x \to \infty$. Let's sketch this curve to see why this is reasonable.

¹ In 176 Further Calculus we will encounter techniques for finding limits which are much more sophisticated than the ones that we have seen so far. Once we have these, we will be able to see exactly why this fact is true and be in a better position to assess the limiting behaviour of curves which are defined using other elementary functions.

Example 4.9 Sketch the curve $y = f(x)$ where $f(x) = x^3 e^{-x}$.

We find the key features of this curve according to the list given above, namely

x-intercepts: These occur when $y = 0$ and so we solve the equation given by $f(x) = 0$, i.e. $x^3 e^{-x} = 0$. But, as $e^{-x} \neq 0$ for all $x \in \mathbb{R}$, we find that the only x-intercept occurs when $x = 0$.

y-intercept: This occurs when $x = 0$ and so using $y = f(0)$ we see that the y-intercept occurs when $y = 0$. Note, in particular, that this means that the curve goes through the origin (as we should have expected since the x-intercept we found occurs when $x = 0$).

finding the stationary points: These occur when $f'(x) = 0$ and so, using the product rule, we get
\[ f'(x) = (3x^2)(e^{-x}) + (x^3)(-e^{-x}) = x^2(3 - x)\,e^{-x}, \]
and so we solve the equation $x^2(3 - x)\,e^{-x} = 0$. But, as $e^{-x} \neq 0$ for all $x \in \mathbb{R}$, we find that the stationary points occur when $x = 0$ and $x = 3$. Then, we use $y = f(x)$ to find the values of $y$ at these points so that we can locate them on the sketch. Doing this, we find that
• $x = 0$ gives $y = f(0) = (0)^3 e^0 = (0)(1) = 0$, and
• $x = 3$ gives $y = f(3) = (3)^3 e^{-3} = 27e^{-3}$.
So, the stationary points have coordinates given by $(0, 0)$ and $(3, 27e^{-3})$.

classifying the stationary points: Let's use the second-order derivative test here. We can use the product rule again to see that
\[ f''(x) = (6x - 3x^2)(e^{-x}) + (3x^2 - x^3)(-e^{-x}) = (6x - 6x^2 + x^3)\,e^{-x}, \]
and so, looking at the stationary points, we have
• $f''(0) = (0)\,e^0 = 0$ and so the second derivative test fails! However, as $f'(x) = x^2(3 - x)\,e^{-x}$ is positive when $x < 0$ and positive when $0 < x < 3$, we can see that this function is increasing on both sides of the stationary point at $x = 0$. Thus, the first-derivative test tells us that $(0, 0)$ is a point of inflection.
• $f''(3) = (-9)\,e^{-3} < 0$ and so $(3, 27e^{-3})$ is a local maximum.

limiting behaviour: Using the fact above we would expect the $e^{-x}$ to dominate and this would mean that $f(x) \to 0$ as $x \to \infty$ whereas, as $x \to -\infty$, we would expect $f(x) \to -\infty$ as $x^3 \to -\infty$ and $e^{-x} \to \infty$.
Then, using this information, we begin to sketch this curve by roughly indicating these key features on some axes as in Figure 4.11(a) and then, joining them up with a nice smooth curve, we get the sketch itself as in Figure 4.11(b).
Figure 4.11: Sketching the curve $y = x^3 e^{-x}$ in Example 4.9. (a) The key features: using what we have discovered about the key features of the curve, we can begin to see what it must look like. (b) The sketch: by joining up these key features with a nice smooth curve, we get the sketch itself.
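Two of the claims in Example 4.9 lend themselves to a quick numerical illustration: that $x^3 e^{-x}$ shrinks towards zero for large $x$, and that $f'(x)$ is positive on both sides of $x = 0$. The following is a rough floating-point sketch, not a proof of the limit; the sample points are arbitrary choices.

```python
import math


def f(x):
    """f(x) = x^3 e^(-x) from Example 4.9."""
    return x**3 * math.exp(-x)


def f1(x):
    """f'(x) = x^2 (3 - x) e^(-x), as derived in Example 4.9."""
    return x**2 * (3 - x) * math.exp(-x)


# Values of f at increasingly large x: these should keep shrinking,
# consistent with the exponential dominating the polynomial.
values = [f(x) for x in (10, 20, 50, 100)]
print(values)

# First-derivative test at x = 0: f is increasing on both sides,
# so the stationary point there is a point of inflection.
increasing_left = f1(-0.1) > 0
increasing_right = f1(0.1) > 0
print(increasing_left, increasing_right)

# Around x = 3 the derivative changes from positive to negative,
# consistent with a local maximum.
print(f1(2.9) > 0, f1(3.1) < 0)
```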
Activity 4.5 Does the function in Example 4.9 have any other points of inflection? If so, find them.
Activity 4.6 Sketch the curve $y = f(x)$ where $f(x) = x^2 e^x$ and find all of its points of inflection.

4.4.3 Asymptotes and cusps
The method above for sketching $y = f(x)$ assumes, as we generally have throughout this chapter, that the function, $f(x)$, and its derivatives are well-defined for all $x \in \mathbb{R}$. But, more generally, there may be points at which the function or some of its derivatives are not defined. When this happens we start to encounter asymptotes and cusps. We will not dwell on this a great deal here, but we can use the following examples to see how this may affect our sketches.

Example 4.10 Sketch the curve $y = (x - 1)^{-1}$.

Here we have $y = f(x)$ where the function, $f(x)$, is given by
\[ f(x) = \frac{1}{x - 1}, \]
as long as $x \neq 1$. In particular, this means that we have
\[ f'(x) = -\frac{1}{(x - 1)^2} \quad \text{and} \quad f''(x) = \frac{2}{(x - 1)^3}, \]
and so these derivatives aren't defined at $x = 1$ either.² Using these, we can see that

when $x < 1$ we have $f(x) < 0$, $f'(x) < 0$ and $f''(x) < 0$, meaning that for these values of $x$ the function is negative, decreasing and concave; whereas
when $x > 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) > 0$, meaning that for these values of $x$ the function is positive, decreasing and convex.

We can also see that the y-intercept of this curve occurs when $y = -1$ and that $f(x) \to 0$ as $x \to \pm\infty$ which means that this function has a horizontal asymptote given by $y = 0$. However, the main feature that concerns us here is the vertical asymptote at $x = 1$ which comes about because
\[ \lim_{x \to 1^-} f(x) = -\infty \quad \text{and} \quad \lim_{x \to 1^+} f(x) = \infty, \]
as we should expect to see from our discussion of hyperbolae in Section 2.2.4. The sketch of this curve is illustrated in Figure 4.12(a).

In particular, observe that in Example 4.10, we have a case like the one mentioned at the end of Section 4.3.3. That is, the function changes from being concave to convex at a point, but there is no point of inflection. This happens because the second derivative of this function does not exist at the point.

² That is, the function and its derivatives are undefined when $x = 1$ as that would require us to 'divide by zero' and that is never allowed.
Example 4.11 Sketch the curve $y = (x - 1)^{-2}$.

Here we have $y = f(x)$ where the function, $f(x)$, is given by
\[ f(x) = \frac{1}{(x - 1)^2}, \]
as long as $x \neq 1$. In particular, this means that we have
\[ f'(x) = -\frac{2}{(x - 1)^3} \quad \text{and} \quad f''(x) = \frac{6}{(x - 1)^4}, \]
and so these derivatives aren't defined at $x = 1$ either.³ Using these, we can see that

when $x < 1$ we have $f(x) > 0$, $f'(x) > 0$ and $f''(x) > 0$, meaning that for these values of $x$ the function is positive, increasing and convex; whereas
when $x > 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) > 0$, meaning that for these values of $x$ the function is positive, decreasing and convex.

We can also see that the y-intercept of this curve occurs when $y = 1$ and that $f(x) \to 0$ as $x \to \pm\infty$ which means that this function has a horizontal asymptote given by $y = 0$. However, the main feature that concerns us here is the vertical asymptote at $x = 1$ which comes about because
\[ \lim_{x \to 1^-} f(x) = \infty \quad \text{and} \quad \lim_{x \to 1^+} f(x) = \infty, \]
as now, $f(x)$ is always positive. The sketch of this curve is illustrated in Figure 4.12(b).

³ Again, the function and its derivatives are undefined when $x = 1$ as that would require us to 'divide by zero' and that is never allowed.

Example 4.12 Sketch the curve $y = (x - 1)^{2/3}$.

Here we have $y = f(x)$ where the function, $f(x)$, is given by $f(x) = (x - 1)^{2/3}$, which is defined for all $x \in \mathbb{R}$. However, this means that we have
\[ f'(x) = \frac{2}{3(x - 1)^{1/3}} \quad \text{and} \quad f''(x) = -\frac{2}{9(x - 1)^{4/3}}, \]
and so these derivatives aren't defined at $x = 1$.⁴ Using these, we can see that

when $x < 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) < 0$, meaning that for these values of $x$ the function is positive, decreasing and concave; whereas
when $x > 1$ we have $f(x) > 0$, $f'(x) > 0$ and $f''(x) < 0$, meaning that for these values of $x$ the function is positive, increasing and concave.

We can also see that the y-intercept of this curve occurs when $y = 1$. The sketch of this curve is illustrated in Figure 4.12(c) and we say that this curve has a cusp at $x = 1$.
Figure 4.12: Sketches of the curves in (a) Example 4.10, $y = 1/(x - 1)$, (b) Example 4.11, $y = 1/(x - 1)^2$, and (c) Example 4.12, $y = (x - 1)^{2/3}$. Observe the behaviour of all three of these curves at $x = 1$: in (a) and (b) we have a vertical asymptote at $x = 1$ and in (c) we have a cusp at $x = 1$.
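The behaviour near $x = 1$ in Examples 4.10 to 4.12 can be explored numerically by evaluating each function (or its gradient) a small distance `h` either side of the point. This is a floating-point sketch under stated assumptions: the step sizes are arbitrary, the function names are hypothetical, and the real cube root is built by hand because Python's `**` operator does not return a real number for a negative base with a fractional power.

```python
import math


def reciprocal(x):
    """f(x) = 1 / (x - 1) from Example 4.10."""
    return 1 / (x - 1)


def reciprocal_squared(x):
    """f(x) = 1 / (x - 1)^2 from Example 4.11."""
    return 1 / (x - 1) ** 2


def cusp(x):
    """f(x) = (x - 1)^(2/3) from Example 4.12, via |x - 1|^(2/3)."""
    return abs(x - 1) ** (2 / 3)


def cusp_gradient(x):
    """f'(x) = 2 / (3 (x - 1)^(1/3)), using the real cube root."""
    root = math.copysign(abs(x - 1) ** (1 / 3), x - 1)
    return 2 / (3 * root)


# Approach x = 1 from the left (1 - h) and from the right (1 + h).
for h in (1e-1, 1e-3, 1e-6):
    print(h,
          reciprocal(1 - h), reciprocal(1 + h),
          reciprocal_squared(1 - h), reciprocal_squared(1 + h),
          cusp_gradient(1 - h), cusp_gradient(1 + h))
```

As `h` shrinks, the first function grows large with opposite signs on the two sides, the second grows large and positive on both sides, and the gradient of the third grows large with opposite signs while the function itself stays finite, which is the cusp behaviour.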
4.5 Optimisation
We have seen how to use derivatives to find and classify the stationary points of a function and we have seen that a local maximum (or local minimum) is a point where the function is larger (or smaller) than it is at other nearby points. However, we now want to find the points, called a global maximum (or global minimum), where the function is larger (or smaller) than it is at all other points. In such cases, we often say that we are looking for the points where the function is optimised. We will see that some functions do not have a global maximum (or a global minimum) even though they may have a local maximum (or a local minimum). In order to determine whether a function, $f(x)$, has a global maximum or a global minimum, it is always useful to ask the following questions.

Which local maximum gives the largest value of $f(x)$ and which local minimum gives the smallest value of $f(x)$?
What is the behaviour of $f(x)$ as $x \to \infty$ and as $x \to -\infty$?

Then, having answered these questions one should be in a position to identify the global maximum with the largest value of $f$ and the global minimum with the smallest value of $f$ assuming, of course, that these exist. Indeed, one way of making sense of these questions and their answers is to sketch the relevant features of the curve $y = f(x)$ and then, using this sketch, one can easily identify any global maximum or global minimum that the function may have.

⁴ We can see that these derivatives are undefined when $x = 1$ as that would require us to 'divide by zero' and that is never allowed. Moreover, observe that this function does not have a vertical tangent line at $x = 1$ because to the left of $x = 1$ the gradient is tending to $-\infty$ and to the right of $x = 1$ the gradient is tending to $\infty$.
For instance, consider the function whose graph is sketched in Figure 4.13(a) which has two local maxima and two local minima. If we ask our questions about this function, we see that:

Comparing the relevant values, we see that the largest local maximum occurs when $x = a$ and the smallest local minimum occurs when $x = b$.
The function tends to zero as $x \to \pm\infty$.

So, in this case, it should be clear that the global maximum occurs when $x = a$ and the global minimum occurs when $x = b$ as illustrated in Figure 4.13(b).

Figure 4.13: (a) A sketch of a function with two local maxima and two local minima which tends to zero as $x \to \pm\infty$. (b) This function has a global maximum and a global minimum as indicated.

However, if we have the function sketched in Figure 4.14(a) and ask our questions about that we see that:

Comparing the relevant values, we see that the largest local maximum occurs when $x = a$ and the smallest local minimum occurs when $x = b$.
The function tends to zero as $x \to -\infty$ but tends to $-\infty$ as $x \to \infty$.
In this case, as illustrated in Figure 4.14(b), it should be clear that the global maximum still occurs when $x = a$ but now there is no global minimum since we can get far smaller values of the function as $x \to \infty$ than we do from the smallest local minimum.

Activity 4.7 Use the sketches in Figures 4.9(b), 4.10(b) and 4.11(b) to determine whether the functions in Examples 4.7, 4.8 and 4.9 have any global maxima or global minima.

So, in general, we can see that if $f : \mathbb{R} \to \mathbb{R}$ is a function that is differentiable for all $x \in \mathbb{R}$, then

its global maximum (or global minimum) can exist if the function is suitably well-behaved as $x \to \infty$ and $x \to -\infty$; and
if they exist, its global maximum (or global minimum) must occur at a local maximum (or a local minimum).

But, having said this, a sketch is still the easiest way to see what is happening. We now turn to some cases of optimisation where things work slightly differently.
Figure 4.14: (a) A sketch of a function with two local maxima and two local minima which tends to zero as $x \to -\infty$ and tends to $-\infty$ as $x \to \infty$. (b) This function has a global maximum but no global minimum as indicated.
4.5.1 Constrained optimisation
Sometimes, it may be necessary to find the maximum (or minimum) value of $f(x)$ when the values of $x$ are constrained (or restricted). In such cases, there will be some interval, such as $x \geq a$ or $a \leq x \leq b$, and we need to find the maximum (or minimum) value of $f(x)$ when $x$ can only take these values. For instance, consider the function whose graph is sketched in Figure 4.15(a) which has a local minimum and a local maximum in the interval $a \leq x \leq b$. In this case, we can see that the maximum and minimum values of $f(x)$ for $x$ in this interval must occur at one of the points indicated by a '•'. And, by comparing the values of $f(x)$ at these points we can see that the maximum occurs at the local maximum and the minimum occurs at the local minimum as illustrated in Figure 4.15(b).

Figure 4.15: (a) A sketch of a function in the interval $a \leq x \leq b$ with a local maximum and a local minimum. (b) This function has a maximum and a minimum as indicated.
However, suppose we have the function whose graph is sketched in Figure 4.16(a) which, again, has a local minimum and a local maximum in the interval a ≤ x ≤ b. In this case, we can again see that the maximum and minimum values of f(x) for x in this interval must occur at one of the points indicated by a ‘•’. And, by comparing the values of f(x) at these points, we can now see that the maximum occurs at the end-point x = a and the minimum occurs at the end-point x = b as illustrated in Figure 4.16(b).
Figure 4.16: (a) A sketch of a function in the interval a ≤ x ≤ b with a local maximum and a local minimum. (b) This function has a maximum and a minimum as indicated.
Activity 4.8 Use the sketches in Figures 4.9(b), 4.10(b) and 4.11(b) to find the maximum and minimum values of the functions in Example 4.7 when −3 ≤ x ≤ 5, Example 4.8 when 0 ≤ x ≤ 1 and Example 4.9 when 0 ≤ x ≤ 3.
So, in general, suppose that we have the interval a ≤ x ≤ b and f is a differentiable function on this interval. In this case, the maximum (or minimum) value of f(x) will occur
either at the local maximum (or local minimum) inside the interval that gives the largest (or smallest) value of f (x) or at one of the end-points of the interval, i.e. at x = a or x = b, if these give the largest (or smallest) value of f (x). This means that we should find the value of f (x) at any local maximum (or local minimum) inside the interval and its value at the end-points of the interval, i.e. f (a) and f (b). Having done this, the maximum (or minimum) will be the largest (or smallest) of these values of f (x). But, of course, a sketch is still the easiest way to see what is happening.
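The procedure just described can be sketched in a few lines of Python. This is only an illustration: the helper name `extremes_on_interval` and the test function f(x) = x³ − 3x (whose stationary points are x = −1 and x = 1) are assumptions made here and are not taken from the guide.

```python
def extremes_on_interval(f, stationary_points, a, b):
    """Return ((x_max, f_max), (x_min, f_min)) for f on a <= x <= b.

    `stationary_points` must list every x with f'(x) = 0; any that lie
    outside the interval are ignored.
    """
    # Candidates: the two end-points plus the stationary points strictly inside.
    candidates = [a, b] + [x for x in stationary_points if a < x < b]
    values = [(x, f(x)) for x in candidates]
    best_max = max(values, key=lambda pair: pair[1])
    best_min = min(values, key=lambda pair: pair[1])
    return best_max, best_min


def f(x):
    # Illustrative function (an assumption): f'(x) = 3x^2 - 3 = 0 at x = -1, 1.
    return x**3 - 3*x


# A case like Figure 4.15: the local maximum and local minimum win.
print(extremes_on_interval(f, [-1, 1], -1.5, 1.5))
# A case like Figure 4.16: the end-points win.
print(extremes_on_interval(f, [-1, 1], -3, 3))
```

On the shorter interval the extreme values come from the stationary points, whereas on the longer interval they come from the end-points, which is exactly the distinction between Figures 4.15 and 4.16.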
4.5.2 What happens when differentiability fails?
Our discussion of optimisation has assumed that the function in question is differentiable for all relevant values of x, whether that means x ∈ R or values of x inside some interval. However, it is important to note that even if the function is not differentiable at some relevant value(s) of x, we may still find that the maximum (or minimum) value of the function occurs at such a point. For instance, in Sections 3.3.4 and 4.4.3, we considered some ways in which a function could fail to be differentiable at a point. Using these as a guide, we can consider the three functions illustrated in Figure 4.17 which all fail to be differentiable at x = 1. However, despite this, we see that in all three cases the global maximum of the function occurs at x = 1 even though none of these points is a local maximum.[5]
Figure 4.17: Three functions which are not differentiable at x = 1 because (a) the function is discontinuous at x = 1, (b) the function has a corner at x = 1 and (c) the function has a cusp at x = 1.
Also, thinking about what we saw in Section 4.4.3, the presence of a vertical asymptote may also mean that a global maximum or global minimum does not exist. Of course, as we saw above, a sketch should enable us to see what is happening in any of these cases.
Activity 4.9 Consider the curves sketched in Figures 4.12(a) and (b). Does either of these curves have a global minimum or a global maximum? Now suppose that we are only interested in these curves for values of x in the interval 0 ≤ x ≤ 1. Does either of these curves have a maximum or a minimum?
[5] That is, in all three cases, as $f'(1)$ does not exist, it certainly can't be equal to zero!
4.5.3 Applications of optimisation
Optimisation problems are very common in economics and we now introduce two ways in which they can arise in that subject. The first is their use when a firm wants to find the level of production which maximises its profit; and the second is when a government wants to find the level of taxation which maximises the revenue generated by a tax that has been imposed on a market.
Profit maximisation
When a firm sells an amount, q, it makes a profit given by
$$\pi(q) = R(q) - C(q),$$
where R(q) is the revenue generated by selling this amount and C(q) is the cost of producing this amount. Obviously, when doing this, the firm will want to sell an amount q that will maximise its profit. Indeed, whereas the costs involved are determined by factors intrinsic to the firm, the revenue generated is given by R(q) = pq, where p, the price per unit, is determined by the market the firm is selling in.
As an example, consider the case where the firm is a monopoly, i.e. it is the only supplier of this product to the market. Indeed, as they are the only suppliers and the amount they are supplying is q, the price that the consumers will be willing to pay for this is given by p = pD (q) where pD (q) is, as in Section 2.1.5, the inverse demand function of the market. As such, in this case, the revenue generated by the sale of an amount q is given by R(q) = qpD (q), and this will yield a profit of π(q) = qpD (q) − C(q). Thus, in the case of a monopoly, given the firm’s cost function and the inverse demand function for the market, we should be able to determine the amount, q, that the firm should be selling by finding the value of q that maximises the firm’s profit. Let’s look at an example. Example 4.13
Suppose that a firm is a monopoly with a cost function given by
$$C(q) = q^3 - 10q^2 + 25q + 10,$$
and the inverse demand function for this good is $p_D(q) = 10 - q$. Find the value of q that will maximise the firm’s profit.
This is a constrained optimisation problem as we must have q ≥ 0, as q denotes the amount of the good being sold, and q ≤ 10 as, otherwise, the price that the consumers will pay will be negative.
So, we need to maximise the firm’s profit, i.e.
$$\pi(q) = q\,p_D(q) - C(q) = q(10 - q) - (q^3 - 10q^2 + 25q + 10) = -q^3 + 9q^2 - 15q - 10,$$
given that q is in the interval 0 ≤ q ≤ 10. To do this, we note that $\pi'(q) = -3q^2 + 18q - 15$ and so, as the stationary points occur when $\pi'(q) = 0$, we solve the equation
$$-3q^2 + 18q - 15 = 0 \implies q^2 - 6q + 5 = 0 \implies (q - 1)(q - 5) = 0,$$
to see that the stationary points occur when q = 1 and q = 5. We can then see that $\pi''(q) = -6q + 18$ which, using the second-derivative test, tells us that when:
q = 1, we have $\pi''(1) = 12 > 0$ and so this is a local minimum.
q = 5, we have $\pi''(5) = -12 < 0$ and so this is a local maximum.
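As a quick check on the working in this example, the following sketch evaluates the profit function and its first two derivatives at the stationary points and at the end-points of 0 ≤ q ≤ 10. The function names `profit`, `d_profit` and `d2_profit` are illustrative choices made here; the formulas themselves are the ones stated in the example.

```python
def profit(q):
    # pi(q) = q * p_D(q) - C(q) with p_D(q) = 10 - q
    # and C(q) = q^3 - 10q^2 + 25q + 10.
    return q * (10 - q) - (q**3 - 10*q**2 + 25*q + 10)


def d_profit(q):
    # pi'(q) = -3q^2 + 18q - 15
    return -3*q**2 + 18*q - 15


def d2_profit(q):
    # pi''(q) = -6q + 18
    return -6*q + 18


stationary = [1, 5]            # roots of (q - 1)(q - 5) = 0
candidates = [0] + stationary + [10]   # end-points plus stationary points

# Profit at each candidate, and the candidate giving the largest profit.
table = {q: profit(q) for q in candidates}
best_q = max(table, key=table.get)

print([d_profit(q) for q in stationary])    # both should be zero
print([d2_profit(q) for q in stationary])   # signs classify the points
print(table)
print(best_q)
```

Comparing the entries of `table` is the same comparison of profits at the stationary points and end-points that the example carries out by hand.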
This means that the point we seek, i.e. the maximum of the profit function, must occur at q = 5 or at one of the two end-points of our interval. But, using the profit function, we see that
$$\pi(0) = -10, \quad \pi(5) = 15 \quad \text{and} \quad \pi(10) = -260,$$
which means that the maximum occurs at q = 5 because it yields the largest profit. Thus, q = 5 will maximise the firm’s profit.
Activity 4.10 Sketch the profit function from Example 4.13 to verify that q = 5 does indeed give a maximum. (Do not try to find the q-intercepts here.)
Maximising tax revenue
In Section 2.1.5, we saw how the supply and demand functions for a market are modified if a tax is imposed. We are now in a position to see what level of tax should be imposed if the government wants to maximise its tax revenue. For instance, if an excise tax of T per unit is imposed, then the government’s tax revenue, R(T), is given by the tax per unit multiplied by the number of units sold at equilibrium, i.e.
$$R(T) = q_T^* T,$$
where $q_T^*$ is the equilibrium quantity in the presence of the tax. Of course, we can then use this to find the value of T, say $T^*$, that maximises this tax revenue. Let’s look at an example.
Example 4.14 In Example 2.7, we saw how the introduction of an excise tax affected the market in Example 2.6 and that the maximum tax that can be imposed is given by $T_m = 4$. What excise tax, $T^*$, should be imposed if the government wants to maximise its tax revenue, R(T), from this market? Sketch a graph of the tax revenue, R(T), against T and comment on the relationship between the values of $T_m$ and $T^*$.
This is a constrained optimisation problem as we must have T ≥ 0, as T is the tax per unit, and $T \le T_m$ as, otherwise, the market will cease to function.
So, we need to maximise the tax revenue generated by the tax, R(T), i.e.
$$R(T) = q_T^* T = \left(2 - \frac{T}{2}\right) T = -\frac{T^2}{2} + 2T,$$
given that T is in the interval $0 \le T \le T_m$ with $T_m = 4$. To do this, we note that $R'(T) = -T + 2$ and so, as the stationary point occurs when $R'(T) = 0$, we see that we have a stationary point when T = 2. We can then see that $R''(T) = -1 < 0$ which, using the second-derivative test, tells us that this stationary point is a maximum. This means that the point we seek, i.e. the maximum of the tax revenue function, must occur at T = 2 or at one of the two end-points of our interval. But, using the tax revenue function, we see that
$$R(0) = 0, \quad R(2) = 2 \quad \text{and} \quad R(4) = 0,$$
which means that the maximum occurs at T = 2 because it yields the largest tax revenue. Thus, we take $T^* = 2$ and, as in the sketch in Figure 4.18, we find that $T^*$ is half-way between no tax (i.e. T = 0) and the maximum tax, $T_m = 4$.
Figure 4.18: A sketch of the tax revenue generated by an excise tax of T for Example 4.14.
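The result of Example 4.14 can also be checked numerically. The sketch below assumes only the revenue formula R(T) = −T²/2 + 2T and the bound T_m = 4 from the example; the function name `revenue` and the grid spacing of 0.01 are choices made here for illustration.

```python
def revenue(T):
    # R(T) = -T^2/2 + 2T, the tax revenue from Example 4.14.
    return 2*T - T**2 / 2


T_m = 4
# Evaluate R(T) on a fine grid over 0 <= T <= T_m and pick the best point.
grid = [i / 100 for i in range(0, 100 * T_m + 1)]
best_T = max(grid, key=revenue)

print(revenue(0), revenue(2), revenue(T_m))
print(best_T, revenue(best_T))
print(best_T == T_m / 2)   # is the maximiser half-way between 0 and T_m?
```

A grid search is a blunt instrument compared with the second-derivative test used in the example, but it gives an independent check that no other point of the interval does better than T = 2.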
Notice how, in the presence of an excise tax, the tax revenue is maximised at a value of T half-way between no tax (i.e. T = 0) and the maximum tax that can be imposed (i.e. $T_m = 4$). Of course, if a percentage of the price tax of 100r% is imposed, then the government’s tax revenue, R(r), would be given by the tax per unit, $r p_r^*$, multiplied by the number of units sold at equilibrium, i.e.
$$R(r) = r p_r^* q_r^*,$$
where $p_r^*$ and $q_r^*$ are the equilibrium price and quantity in the presence of the tax. Of course, we can also use this to find the value of r, say $r^*$, that maximises this tax revenue. See, for example, Exercise 4.5.
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you should be able to:
use first and second-order derivatives to identify the relevant features of a function;
sketch curves by identifying their key features;
optimise functions of one variable;
solve problems from economics-based subjects that involve optimisation.
Solutions to activities
Solution to activity 4.1
A concave function on an interval, I, has $f''(x) < 0$ for all x ∈ I. So, if we take any particular point, say a ∈ I, the tangent line to f at x = a has an equation given by
$$y = f(a) + (x - a) f'(a),$$
and so, our second-order Taylor approximation can be written as
$$f(x) = y + \frac{(x - a)^2}{2} f''(a).$$
Now, as $f''(a) < 0$ (recall that a ∈ I too), we see that f(x) < y for all x ∈ I where x ≠ a, i.e. these values of f always lie below the values from the tangent line to f at x = a, as illustrated in Figure 4.19(a). But, of course, we can use any a ∈ I when we run this argument and so a concave function is one which lies below all its tangent lines, as illustrated in Figure 4.19(b). In particular, a function must be concave in the neighbourhood of a local maximum.
Figure 4.19: The relationship between a concave function and its tangent lines. (a) When changing the value of x, we can see that the values of f(x) are less than the corresponding values of y from the tangent line to f at a, i.e. f lies below this tangent line. (b) By changing the value of a, we can see that f lies below all of its tangent lines.
Solution to activity 4.2
For $f(x) = x^4 - 1$, we see that
$$f'(x) = 4x^3 \quad \text{and} \quad f''(x) = 12x^2,$$
which means, in particular, that $f''(0) = 0$. Then, looking at the first-order derivative, we see that $f'(x) < 0$ for x < 0 and $f'(x) > 0$ for x > 0 which means that the function is decreasing for x < 0 and then increasing for x > 0 as shown in Figure 4.8(a). Or, looking at the second-order derivative, we see that $f''(x) > 0$ for all x ≠ 0 and so the function is convex as shown in Figure 4.8(a).
For $f(x) = 1 - x^4$, we see that
$$f'(x) = -4x^3 \quad \text{and} \quad f''(x) = -12x^2,$$
which means, in particular, that $f''(0) = 0$. Then, looking at the first-order derivative, we see that $f'(x) > 0$ for x < 0 and $f'(x) < 0$ for x > 0 which means that the function is increasing for x < 0 and then decreasing for x > 0 as shown in Figure 4.8(b). Or, looking at the second-order derivative, we see that $f''(x) < 0$ for all x ≠ 0 and so the function is concave as shown in Figure 4.8(b).
Solution to activity 4.3
Suppose that f(x) is a polynomial and that, for some constants a ≠ 0 and n ∈ N, the term in this polynomial with the highest power of x is $ax^n$. We see that, as x → ∞, we have
f(x) → ∞ if a > 0 as $ax^n \to \infty$, and
f(x) → −∞ if a < 0 as $ax^n \to -\infty$,
regardless of whether n is even or odd. However, as x → −∞, we have
f(x) → ∞ if a > 0 and n is even as $ax^n \to \infty$,
f(x) → −∞ if a > 0 and n is odd as $ax^n \to -\infty$,
f(x) → −∞ if a < 0 and n is even as $ax^n \to -\infty$, and
f(x) → ∞ if a < 0 and n is odd as $ax^n \to \infty$,
where now, it does matter whether n is even or odd.
Solution to activity 4.4
In Example 4.8, we found that $f''(x) = 24x^2 - 24x + 4$, and so we begin our search for points of inflection by seeing where $f''(x) = 0$. That is, we solve the equation
$$24x^2 - 24x + 4 = 0 \implies 6x^2 - 6x + 1 = 0 \implies x = \frac{6 \pm \sqrt{12}}{12} = \frac{1}{2} \pm \frac{1}{\sqrt{12}},$$
if we use the ‘quadratic formula’. Now, if $f''(x)$ also changes sign at these values of x we have a point of inflection. To see whether this is the case, consider that we now have
$$f''(x) = 24x^2 - 24x + 4 = 24\left[x - \left(\frac{1}{2} - \frac{1}{\sqrt{12}}\right)\right]\left[x - \left(\frac{1}{2} + \frac{1}{\sqrt{12}}\right)\right] = 24(x - a)(x - b),$$
if we let x = a and x = b denote the smaller and the larger values of x we are interested in respectively. This means that, considering the signs of the two factors, we have
x −2 as f 00 (x) > 0.
In particular, we have a point of inflection at x = −2 as this makes $f''(x) = 0$ and $f''(x)$ changes sign at this point (i.e. the function changes from being concave to being convex at this point).
Solution to exercise 4.2
For x > 0, we have
$$f(x) = -12\ln(x) - x^2 + 10x \implies f'(x) = -\frac{12}{x} - 2x + 10.$$
The stationary points of f(x) occur when $f'(x) = 0$ and so we have to solve the equation
$$-\frac{12}{x} - 2x + 10 = 0 \implies -\frac{12 + 2x^2 - 10x}{x} = 0 \implies x^2 - 5x + 6 = 0 \implies (x - 2)(x - 3) = 0.$$
Thus the x-coordinates of the stationary points of f(x) are x = 2 and x = 3. To classify them, we note that
$$f'(x) = -12x^{-1} - 2x + 10 \implies f''(x) = 12x^{-2} - 2 = \frac{12}{x^2} - 2,$$
which tells us that when
x = 2, we have $f''(2) = 3 - 2 = 1 > 0$ and so this is a local minimum.
x = 3, we have $f''(3) = \frac{4}{3} - 2 = -\frac{2}{3} < 0$ and so this is a local maximum.
Thus, the function, f(x), has a local minimum when x = 2 and a local maximum when x = 3.
Solution to exercise 4.3
To sketch the curve y = f(x) where
$$f(x) = x^3 + \frac{1}{x^3} = x^3 + x^{-3},$$
for x ≠ 0, we find its key features, namely
x-intercepts: These occur when y = 0 and so we solve the equation given by
$$x^3 + \frac{1}{x^3} = 0 \implies \frac{x^6 + 1}{x^3} = 0 \implies x^6 + 1 = 0.$$
But, as $x^6 + 1 > 0$ for all x ∈ R, we find that this equation has no solutions and so the curve has no x-intercepts.
y-intercept: This occurs when x = 0 and so, as the function is not defined when x = 0, we find that the curve has no y-intercepts.
finding the stationary points: These occur when $f'(x) = 0$ and so, as $f'(x) = 3x^2 - 3x^{-4}$, we have to solve the equation
$$3x^2 - 3x^{-4} = 0 \implies x^2 = \frac{1}{x^4} \implies x^6 = 1,$$
i.e. the stationary points of f(x) occur when x = ±1. Then, we use y = f(x) to find the values of y at these points so that we can locate them on the sketch. Doing this, we find that
• x = 1 gives y = f(1) = 1 + 1 = 2, and
• x = −1 gives y = f(−1) = −1 + (−1) = −2.
So, the stationary points have coordinates given by (1, 2) and (−1, −2).
classifying the stationary points: The second-order derivative of the function is
$$f''(x) = 6x + 12x^{-5} = 6x + \frac{12}{x^5},$$
and so, looking at the stationary points, we have
• $f''(1) = 6 + 12 = 18 > 0$ and so (1, 2) is a local minimum.
• $f''(-1) = -6 + (-12) = -18 < 0$ and so (−1, −2) is a local maximum.
limiting behaviour: We see that f(x) → ∞ as x → ∞ and f(x) → −∞ as x → −∞. (Note, in particular, that $1/x^3 \to 0$ as x → ±∞ and so the limiting behaviour is determined by the $x^3$ term in f(x).) In this case, we must also look at what the function is doing near x = 0 as it is undefined there. Indeed, here, because of the $1/x^3$ term in f(x), we have
f(x) → ∞ as $x \to 0^+$, and
f(x) → −∞ as $x \to 0^-$,
i.e. the curve y = f(x) has a vertical asymptote when x = 0.
Consequently, using this information, we can get the sketch in Figure 4.22.
Figure 4.22: A sketch of the curve y = f(x) from Exercise 4.3.
Indeed, using this sketch, we can clearly see that this function has neither a global minimum nor a global maximum. In particular, notice that the local minimum is not global because our local maximum gives us a smaller value of f(x) and the local maximum is not global since the local minimum gives us a larger value of f(x)!
Solution to exercise 4.4
(a) To find the x-intercepts of the curve y = f(x) we set y = 0 and solve the equation
$$3x^5 - 25x^3 + 60x = 0 \implies x(3x^4 - 25x^2 + 60) = 0 \implies x = 0 \text{ or } 3x^4 - 25x^2 + 60 = 0.$$
To deal with this second possibility, we notice that we have a quadratic equation in $x^2$ and so, if we were to use the ‘quadratic formula’ (say), we get
$$x^2 = \frac{25 \pm \sqrt{25^2 - 4(3)(60)}}{2(3)}.$$
But here, the discriminant is negative as $25^2 - 4(3)(60) = 625 - 720 = -95$, and so this equation gives us no solutions for $x^2$ and, hence, no solutions for x. Thus, the only solution to y = 0 is x = 0 and this is, therefore, the only x-intercept of the curve y = f(x).
(b) The stationary points occur when $f'(x) = 0$ and so, as $f'(x) = 15x^4 - 75x^2 + 60$, we have to solve the equation
$$15x^4 - 75x^2 + 60 = 0 \implies x^4 - 5x^2 + 4 = 0.$$
This is also a quadratic equation in $x^2$ and, if we factorise (say), we get
$$(x^2 - 4)(x^2 - 1) = 0 \implies x^2 = 4, 1,$$
and this, in turn, gives us x = ±2 and x = ±1 as the x-coordinates of the stationary points. To classify these stationary points, we find the second derivative of f(x), i.e. $f''(x) = 60x^3 - 150x = 30x(2x^2 - 5)$, and we can see that
If x = −2, we have $f''(-2) = 30(-2)(2(-2)^2 - 5) = -60(8 - 5) = -180 < 0$, and so this is a local maximum. At this point we also have $y = f(-2) = 3(-2)^5 - 25(-2)^3 + 60(-2) = -96 + 200 - 120 = -16$, and so the coordinates of this point are (−2, −16).
If x = −1, we have $f''(-1) = 30(-1)(2(-1)^2 - 5) = -30(2 - 5) = 90 > 0$, and so this is a local minimum. At this point we also have $y = f(-1) = 3(-1)^5 - 25(-1)^3 + 60(-1) = -3 + 25 - 60 = -38$, and so the coordinates of this point are (−1, −38).
If x = 1, we have $f''(1) = 30(1)(2(1)^2 - 5) = 30(2 - 5) = -90 < 0$, and so this is a local maximum. At this point we also have $y = f(1) = 3(1)^5 - 25(1)^3 + 60(1) = 3 - 25 + 60 = 38$, and so the coordinates of this point are (1, 38).
If x = 2, we have $f''(2) = 30(2)(2(2)^2 - 5) = 60(8 - 5) = 180 > 0$, and so this is a local minimum. At this point we also have $y = f(2) = 3(2)^5 - 25(2)^3 + 60(2) = 96 - 200 + 120 = 16$, and so the coordinates of this point are (2, 16).
(c) We can use the information that we have found so far together with the observation that the y-intercept occurs when x = 0, i.e. when y = f(0) = 0, to get the sketch in Figure 4.23(a).
Figure 4.23: (a) A sketch of the curve y = f(x) from Exercise 4.4(c). (b) For Exercise 4.4(d), ‘picking out’ the interval −2 ≤ x ≤ 2 using vertical dotted lines and the interval −3 ≤ x ≤ 3 using vertical dashed lines.
(d) Given that −2 ≤ x ≤ 2 and looking at the sketch in Figure 4.23(a), it should be clear that the global maximum and the global minimum of f(x) are at the points (1, 38) and (−1, −38) respectively. If you’re unclear about this, this interval is ‘picked out’ by the vertical dotted lines in Figure 4.23(b). If we now have −3 ≤ x ≤ 3, looking at the sketch in Figure 4.23(a), it should be clear that the global maximum and the global minimum of f(x) are at the points (3, 234) as f(3) = 234 and (−3, −234) as f(−3) = −234 respectively. If you’re unclear about this, this interval is ‘picked out’ by the vertical dashed lines in Figure 4.23(b).
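For readers who would like to confirm parts (b) and (d) of Exercise 4.4 without relying on the sketch, the following minimal check compares the values of f(x) at the stationary points and at the end-points of each interval. The helper name `extremes` is an assumption made here for illustration.

```python
def f(x):
    # f(x) = 3x^5 - 25x^3 + 60x from Exercise 4.4.
    return 3*x**5 - 25*x**3 + 60*x


def df(x):
    # f'(x) = 15x^4 - 75x^2 + 60
    return 15*x**4 - 75*x**2 + 60


stationary = [-2, -1, 1, 2]   # x-coordinates found in part (b)


def extremes(a, b):
    """Largest and smallest values of f on a <= x <= b, with their locations."""
    candidates = [a, b] + [x for x in stationary if a < x < b]
    values = {x: f(x) for x in candidates}
    x_max = max(values, key=values.get)
    x_min = min(values, key=values.get)
    return (x_max, values[x_max]), (x_min, values[x_min])


print([df(x) for x in stationary])   # each should be zero
print(extremes(-2, 2))
print(extremes(-3, 3))
```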
Solution to exercise 4.5
As we mentioned at the end of Section 4.5.3, the tax revenue, R(r), generated by this tax is given by $R(r) = (r p_r^*) q_r^*$, as it is the tax paid per unit sold, i.e. $r p_r^*$, multiplied by the quantity sold, i.e. $q_r^*$, if the market is in equilibrium. If we refer back to Exercise 2.3 for $p_r^*$ and $q_r^*$, this then gives us
$$R(r) = r \left(\frac{12}{2 - r}\right) \left(\frac{4 - 8r}{2 - r}\right) = 48\,\frac{r - 2r^2}{(2 - r)^2}.$$
Now, to find the value of r, i.e. $r^*$, that maximises R(r), we differentiate it with respect to r using the quotient and chain rules to get
$$R'(r) = 48\,\frac{(1 - 4r)(2 - r)^2 - (r - 2r^2)[2(2 - r)(-1)]}{(2 - r)^4} = 48\,\frac{(1 - 4r)(2 - r) + 2(r - 2r^2)}{(2 - r)^3},$$
and this simplifies to give us
$$R'(r) = 48\,\frac{2 - 7r}{(2 - r)^3}.$$
This has a stationary point when $R'(r) = 0$, i.e. when r = 2/7, and as $R'(r)$ changes from positive to negative as r goes through this value, we can see that this stationary point is a local maximum.[6] Now, in this case, we must have $0 \le r \le r_m$ for the market to function and so this is a constrained optimisation problem. That is, the maximum we seek is either the value of R(r) at our local maximum, i.e.
$$R\left(\frac{2}{7}\right) = \frac{2}{7} \left(\frac{12}{2 - \frac{2}{7}}\right) \left(\frac{4 - 8 \times \frac{2}{7}}{2 - \frac{2}{7}}\right) = \frac{2}{7} \left(\frac{12}{\frac{12}{7}}\right) \left(\frac{\frac{12}{7}}{\frac{12}{7}}\right) = 2,$$
or its value at one of the end-points, i.e. r = 0 or $r = r_m = 1/2$. But,
$$R(0) = 0 < 2 \quad \text{and} \quad R\left(\frac{1}{2}\right) = \frac{1}{2} \left(\frac{12}{2 - \frac{1}{2}}\right) \left(\frac{4 - 8 \times \frac{1}{2}}{2 - \frac{1}{2}}\right) = 0 < 2,$$
and so the maximum value of R(r) is 2 and this occurs when r = 2/7, i.e. at the local maximum. Thus, $r^* = 2/7$ and using the information we have so far, we can get the sketch in Figure 4.24(a) for values of r that make economic sense, i.e. those where 0 ≤ r ≤ 1/2.[7]
Aside: As shown in Figure 4.24(b), observe that once we move away from the economically meaningful values of r (i.e. where 0 ≤ r ≤ 1/2) the graph of R(r) gets quite complicated. Indeed, note that as
$$R(r) = 48\,\frac{r - 2r^2}{(2 - r)^2},$$
we can see that it has a vertical asymptote when r = 2 and, because we can write
$$R(r) = 48\,\frac{r - 2r^2}{(2 - r)^2} = 48\,\frac{-2(r^2 - 4r + 4) - 7r + 8}{4 - 4r + r^2} = -96 - 48\,\frac{7r - 8}{(2 - r)^2},$$
we can see that R(r) → −96 as r → ±∞, i.e. we also have a horizontal asymptote here.
[6] Alternatively, you can show that this stationary point is a local maximum by showing that $R''(r) < 0$ when r = 2/7, but this isn’t quite so easy.
[7] Note, in particular, that $r^*$ is clearly not half-way between no tax (i.e. r = 0) and the maximum tax (i.e. $r_m = 1/2$) as it was in Example 4.14 when we looked at an excise tax.
Figure 4.24: For Exercise 4.5: (a) A sketch of the graph of R(r) for the economically meaningful values of r, i.e. those between zero (i.e. no tax) and 1/2 (i.e. the maximum tax). (b) As an aside, we could have sketched the graph of R(r) for some economically meaningless values of r (specifically r < 0 and r ≥ 1/2). Observe, in particular, the vertical asymptote when r = 2 and the horizontal asymptote where R(r) → −96 as r → ±∞. (Note that the details of what is happening in the positive quadrant, which we saw in (a), have been omitted from (b) for clarity.)
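The arithmetic in the solution to Exercise 4.5 involves several nested fractions, so an exact check may be reassuring. The sketch below uses Python's `fractions.Fraction` with the formulas for R(r) and R′(r) derived in that solution; the names `revenue`, `d_revenue`, `r_star` and `r_max` are illustrative choices made here.

```python
from fractions import Fraction


def revenue(r):
    # R(r) = 48 (r - 2r^2) / (2 - r)^2
    return 48 * (r - 2*r**2) / (2 - r)**2


def d_revenue(r):
    # R'(r) = 48 (2 - 7r) / (2 - r)^3
    return 48 * (2 - 7*r) / (2 - r)**3


r_star = Fraction(2, 7)   # stationary point found in the solution
r_max = Fraction(1, 2)    # maximum tax rate r_m

print(d_revenue(r_star))                  # zero at the stationary point
print(d_revenue(Fraction(1, 7)) > 0)      # R' positive just below r*
print(d_revenue(Fraction(3, 7)) < 0)      # R' negative just above r*
print(revenue(Fraction(0)), revenue(r_star), revenue(r_max))
print(r_star == r_max / 2)                # r* is not half-way to r_m
```

Working with exact fractions avoids any rounding doubt about whether R(2/7) really equals 2.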
Chapter 5
Integration
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 10.2, parts of 10.3–10.4, 10.5–10.9.
Anthony and Biggs (1996) Chapters 25 and 26.
Further reading
Simon and Blume (1994) Appendix A4.
Adams and Essex (2010) Sections 5.5–5.7, 6.1–6.2 and parts of 6.3.
Aims and objectives
The objectives of this chapter are as follows.
To introduce the idea of an integral and see how it can be found using various techniques.
To use integrals to find areas.
To see how integrals can be used in economics-based subjects.
Specific learning outcomes can be found near the end of this chapter.
5.1 Introduction: What is integration?
In Chapter 3, we introduced differentiation and saw that a function, f(x), could be differentiated with respect to x to yield its derivative, which we denoted by
$$\frac{df}{dx} \quad \text{or} \quad f'(x).$$
And, in particular, we saw how to find such derivatives by using the rules of differentiation and some standard derivatives. Now, given a function, f(x), we want to make sense of what it means to find the indefinite integral of this function with respect to x, which is denoted by
$$\int f(x)\,dx.$$
In such cases, as we are integrating the function f (x) with respect to x, we call it the integrand. And, similarly to what we saw before, we will see how to find such integrals by using the rules of integration and some standard integrals. In particular, the standard integrals will be closely related to our standard derivatives since the key idea behind our method for finding integrals will be the idea that integration is the process that ‘undoes’ (or ‘reverses’) the process of differentiation, i.e. the process of indefinite integration can be thought of as antidifferentiation and the resulting indefinite integral can be thought of as an antiderivative.
Consider the functions F(x) and f(x) where we know that f(x) is the derivative[1] of F(x), i.e.
$$\frac{dF}{dx} = f(x).$$
Now, using the idea that integration ‘undoes’ differentiation, i.e. if we integrate f(x) with respect to x we are looking for a function, F(x), whose derivative is f(x), we can see that
$$\int f(x)\,dx \quad \text{must be, more or less, given by } F(x).$$
In such cases, we say that F(x) is an antiderivative of f(x) as opposed to, say, the indefinite integral.
However, you may wonder why we say that the function, F(x), that we found above is ‘an’, as opposed to ‘the’, antiderivative of f(x). The reason for this is that if, instead of the function F(x), we had the function F(x) + c where c is a constant, then its derivative would still be f(x), i.e.
$$\frac{d}{dx}\left[F(x) + c\right] = f(x),$$
and so, using the reasoning above, we would find that
$$\int f(x)\,dx \quad \text{can also, more or less, be given by } F(x) + c,$$
where c is a constant. That is, F(x) + c is also an antiderivative of f(x) for this constant c.
Example 5.1 Show that $4x^2$ and $4x^2 + 1$ are both antiderivatives of 8x.
$4x^2$ is an antiderivative of 8x as we can differentiate $4x^2$ to get 8x. But, similarly, we can see that $4x^2 + 1$ is also an antiderivative of 8x as we can differentiate $4x^2 + 1$ to get 8x.
As such, because this works for any constant c we add to F(x), we say that the indefinite integral gives us a whole family of antiderivatives which only differ by a constant, i.e. the choice of c. In this way, we say that indefinite integration, i.e. the process of finding
$$\int f(x)\,dx,$$
is antidifferentiation, i.e. it seeks all the functions F(x) + c that can be differentiated to yield f(x) and, as such, every one of these functions will be an antiderivative of f(x).
[1] We say that it is the derivative because differentiation always yields exactly one answer.
Example 5.2 What is $\int 8x\,dx$?
We saw in Example 5.1 that $4x^2$ is an antiderivative of 8x. This means that
$$\int 8x\,dx = 4x^2 + c,$$
where c is an arbitrary (i.e. any) constant. Notice that this works because differentiating $4x^2 + c$ we get 8x.
Generally speaking then, we have the following.
If F(x) is a function whose derivative is the function f(x), then we have
$$\int f(x)\,dx = F(x) + c,$$
where c is an arbitrary constant.
In particular, we call the
function, f(x), the integrand as it is what we are integrating,
function, F(x), an antiderivative as its derivative is f(x),
constant, c, a constant of integration which is completely arbitrary,[2] and
integral, $\int f(x)\,dx$, an indefinite integral since, in the result, c is arbitrary.
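The idea that every function 4x² + c is an antiderivative of 8x can be illustrated numerically. The sketch below approximates derivatives with a central difference; the helper names `derivative` and `make_antiderivative`, the step size h and the sample values of c and x are all assumptions made here for illustration.

```python
def derivative(F, x, h=1e-6):
    # Central-difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)


def f(x):
    # The integrand from Examples 5.1 and 5.2.
    return 8 * x


def make_antiderivative(c):
    # One member, F(x) = 4x^2 + c, of the family of antiderivatives of 8x.
    return lambda x: 4 * x**2 + c


# Largest gap between F'(x) and f(x) over a few constants c and points x.
worst_error = max(
    abs(derivative(make_antiderivative(c), x) - f(x))
    for c in (0, 1, -7.5)
    for x in (-2.0, 0.5, 3.0)
)
print(worst_error)   # should be tiny whatever the value of c
```

The constant c drops out of the difference F(x + h) − F(x − h), which is the numerical counterpart of the fact that the derivative of a constant is zero.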
Now that we have the idea, let’s see how we’re going to actually find the indefinite integrals of the functions that commonly occur in this course.
5.2 How to find indefinite integrals
The previous section told us how to find indefinite integrals using antiderivatives, but now we want to explore a more convenient way of finding them. The key idea is that we introduce standard integrals which tell us how to integrate the basic functions that we saw in Chapter 2. Once we know how to integrate these, the rules of integration will allow us to integrate combinations of these functions.
5.2.1 Standard integrals
In Example 5.2, we used the idea that indefinite integration is antidifferentiation to show that the function f(x) = 8x has an indefinite integral given by
$$\int 8x\,dx = 4x^2 + c,$$
where c is an arbitrary constant. We now state some results that will allow us to find the indefinite integrals of our other basic functions.
[2] As we can add any constant to F(x) to account for the fact that F(x) + c, for any constant c ∈ R, is also an antiderivative.
Power functions
If n ≠ −1, we have
$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + c,$$
where c is an arbitrary constant and this works because
$$\frac{d}{dx}\left[\frac{x^{n+1}}{n+1} + c\right] = \frac{(n+1)x^n}{n+1} + 0 = x^n.$$
In particular, if n = 0, we have
$$\int 1\,dx = \int x^0\,dx = x + c,$$
and this works because the derivative of x + c is 1. However, if we have n = −1, we have
$$\int x^{-1}\,dx = \int \frac{1}{x}\,dx = \ln|x| + c,$$
where we need the modulus sign in ln |x| as x may be negative but the logarithm function is only defined for x > 0. This works because, if x > 0, we have |x| = x and so
$$\frac{d}{dx}\ln|x| = \frac{d}{dx}\ln(x) = \frac{1}{x},$$
whereas if x < 0, we have |x| = −x and so
$$\frac{d}{dx}\ln|x| = \frac{d}{dx}\ln(-x) = \frac{-1}{-x} = \frac{1}{x},$$
if we use the chain rule.
Exponential and logarithmic functions
If we are using e, we have
$$\int e^x\,dx = e^x + c,$$
where c is an arbitrary constant and this works because
$$\frac{d}{dx}\left[e^x + c\right] = e^x.$$
However, there is no nice standard integral for ln x and so we’ll see how to find
$$\int \ln x\,dx,$$
when we encounter integration by parts in Example 5.20. If we have another base, a, the standard integrals are not so simple. But, we can see that
$$\int a^x\,dx = \frac{a^x}{\ln a} + c,$$
where c is an arbitrary constant since, using the result from Activity 3.9, we have
$$\frac{d}{dx}\left[\frac{a^x}{\ln a} + c\right] = \frac{a^x \ln a}{\ln a} + 0 = a^x.$$
However, there is also no nice standard integral for $\log_a x$ and so we’ll see how to find
$$\int \log_a x\,dx,$$
in Activity 5.12 where we will use the change of base formula once we can integrate ln x.
Sine and cosine functions
For the sine and cosine functions we find that
$$\int \sin x\,dx = -\cos x + c \quad \text{and} \quad \int \cos x\,dx = \sin x + c,$$
where c is an arbitrary constant. The former works because
$$\frac{d}{dx}\left[-\cos x + c\right] = -(-\sin x) + 0 = \sin x,$$
whereas the latter works because the derivative of sin x is cos x.
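Each of the standard integrals above can be sanity-checked by differentiating the claimed antiderivative numerically and comparing the result with the integrand. The sketch below does this with a central difference; the particular values n = 3 and a = 2, the sample points and the helper names are assumptions chosen here for illustration. The negative sample point exercises the modulus sign in ln |x|.

```python
import math


def derivative(F, x, h=1e-5):
    # Central-difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)


n, a = 3, 2.0   # illustrative power and base (assumptions)

# Each entry pairs an integrand f with its claimed antiderivative F (taking c = 0).
standard = {
    "x^n":   (lambda x: x**n,  lambda x: x**(n + 1) / (n + 1)),
    "1/x":   (lambda x: 1 / x, lambda x: math.log(abs(x))),
    "e^x":   (math.exp,        math.exp),
    "a^x":   (lambda x: a**x,  lambda x: a**x / math.log(a)),
    "sin x": (math.sin,        lambda x: -math.cos(x)),
    "cos x": (math.cos,        math.sin),
}

points = (-1.5, 0.7, 2.0)


def max_error(name):
    # Largest gap between F'(x) and f(x) over the sample points.
    f, F = standard[name]
    return max(abs(derivative(F, x) - f(x)) for x in points)


for name in standard:
    print(name, max_error(name))
```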
5.2.2 The basic rules of integration
In Section 2.1.2, we saw that there are several standard ways of making new functions from old ones and, in Section 3.2.2, we saw how the rules of differentiation could be used to differentiate these new functions. Here we will see how we can use standard integrals, i.e. the integrals of our basic functions, and rules of integration to integrate the new functions that are created from these basic ones in these standard ways. We start with the most straightforward of these which allows us to integrate linear combinations of functions.
The linear combination rule
If k and l are constants, this allows us to integrate the linear combination, kf(x) + lg(x), of two functions f(x) and g(x). It states that
$$\int [k f(x) + l g(x)]\,dx = k \int f(x)\,dx + l \int g(x)\,dx.$$
Indeed, this gives us three more basic rules straightaway, i.e. the
constant multiple rule: If k is a constant and f(x) is a function, then
$$\int k f(x)\,dx = k \int f(x)\,dx.$$
sum rule: If f(x) and g(x) are functions, then
$$\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx.$$
difference rule: If f(x) and g(x) are functions, then
$$\int [f(x) - g(x)]\,dx = \int f(x)\,dx - \int g(x)\,dx.$$
Activity 5.1 Derive the constant multiple, sum and difference rules from the linear combination rule.
Activity 5.2 Use antiderivatives to show that the linear combination rule works.
Using these rules we see that:
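Example 5.3
\[ \int -3x^{-\frac12}\,dx = -3\left(\frac{x^{\frac12}}{\frac12}\right) + c = -6x^{\frac12} + c \quad\text{by the constant multiple rule,} \]
\[ \int \left[x^2 + x^{\frac12}\right] dx = \frac{x^3}{3} + \frac{x^{\frac32}}{\frac32} + c = \frac13 x^3 + \frac23 x^{\frac32} + c \quad\text{by the sum rule,} \]
\[ \int [\sin x - \cos x]\,dx = -\cos x - \sin x + c \quad\text{by the difference rule, and} \]
\[ \int \left[\frac{3}{x} - 4e^x\right] dx = 3\ln|x| - 4e^x + c \quad\text{by the linear combination rule,} \]
where $c$ is an arbitrary constant.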
So, in the case of linear combinations of functions such as these, we see that the integral of the linear combination is given by the linear combination of the integrals.

Activity 5.3 Use the rules above to integrate the following functions with respect to $x$.
(a) $-3\cos x$,
(b) $e^x + \cos x$,
(c) $3\sin x - \dfrac{3}{x}$.
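Since integration ‘undoes’ differentiation, the answers in Example 5.3 can be sanity-checked numerically by differentiating them. The sketch below is only an illustration: the helper names `derivative` and `max_error`, the step size and the sample points are assumptions made here and are not part of this guide.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference estimate of F'(x); the step size h is an assumption."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (integrand, claimed antiderivative with c = 0) for each line of Example 5.3.
EXAMPLE_5_3 = {
    "constant multiple": (lambda x: -3 * x ** -0.5,
                          lambda x: -6 * x ** 0.5),
    "sum": (lambda x: x ** 2 + x ** 0.5,
            lambda x: x ** 3 / 3 + (2 / 3) * x ** 1.5),
    "difference": (lambda x: math.sin(x) - math.cos(x),
                   lambda x: -math.cos(x) - math.sin(x)),
    "linear combination": (lambda x: 3 / x - 4 * math.exp(x),
                           lambda x: 3 * math.log(abs(x)) - 4 * math.exp(x)),
}

def max_error(f, F, points=(0.5, 1.0, 2.0, 3.0)):
    """Largest gap between the estimated F'(x) and f(x) over the sample points."""
    return max(abs(derivative(F, x) - f(x)) for x in points)

for rule, (f, F) in EXAMPLE_5_3.items():
    print(f"{rule} rule: max error = {max_error(f, F):.1e}")
```

A small error at every sample point is consistent with each answer being an antiderivative of its integrand; it is a check, not a proof.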
We now look at the other rules of integration, i.e. the ones that will allow us to integrate other combinations of functions. But, unlike what we saw with the rules of differentiation in Section 3.2.2, we shall see that these are harder to apply.
5.2.3 Integration by substitution
Integration by substitution is a way of dealing with integrands that involve the composition of two functions and, as such, it is closely related to the chain rule of differentiation. To see how it works, we will start by seeing how integration by substitution is related to the chain rule and then we will describe how to apply this rule. We will then apply this rule in some simple examples and then some harder ones.
Why integration by substitution works

We start by noting that the chain rule for differentiation tells us that if $h(x) = (f \circ g)(x) = f(g(x))$, then we write $h$ as $f(g)$ so that, on differentiating, we get
\[ \frac{dh}{dx} = \frac{df}{dg}\,\frac{dg}{dx}. \]
But, because of this we can see that $h(x) = (f \circ g)(x) = f(g(x))$ is an antiderivative of
\[ \frac{dh}{dx} = \frac{df}{dg}\,\frac{dg}{dx}, \]
and so, we have
\[ \int \frac{df}{dg}\,\frac{dg}{dx}\,dx = f(g(x)) + c, \]
which is the basis of integration by substitution. However, this is quite hard to apply and so, as a useful way of applying this rule, we think of
\[ \frac{dg}{dx}\,dx \ \text{ as } \ dg, \quad\text{so that we have}\quad \int \frac{df}{dg}\,dg = f(g) + c, \]
and this is the key to the method that we shall be using here.

How to integrate by substitution

We can now see how to apply integration by substitution. The basic idea is that, if you are given an integrand that involves a composition of two functions, this rule of integration sometimes allows you to turn it into an easier integral by making a substitution. That is:

The integral involves the derivative of a composition and has the form $\int f'(g(x))\,g'(x)\,dx$.

Write $f'(g(x))$ as $f'(g)$ and $g'(x)\,dx$ as $dg$. This should give you the easier integral $\int f'(g)\,dg$.

Find this integral and replace all occurrences of $g$ with $g(x)$ to get your final answer.

Now, to make this clearer, let’s look at some examples.

Some simple applications of integration by substitution

Easy integrations by substitution involve an integrand which is nothing more than a simple composition of two functions and so there can be no doubt about which function should be ‘$g$’. To see this, let’s consider what happens when we want to integrate a simple composition which involves the function $3x + 1$.
Example 5.4
Find $\int (3x+1)^2\,dx$.

Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac13\,dg$. Hence, substitution gives
\[ \int (3x+1)^2\,dx = \int g^2\,\frac13\,dg = \frac13\int g^2\,dg = \frac13\cdot\frac{g^3}{3} + c = \frac{(3x+1)^3}{9} + c, \]
where $c$ is an arbitrary constant.

Example 5.5
Find $\int \frac{1}{3x+1}\,dx$.

Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac13\,dg$. Hence, substitution gives
\[ \int \frac{1}{3x+1}\,dx = \int \frac1g\,\frac13\,dg = \frac13\int g^{-1}\,dg = \frac13\ln|g| + c = \frac13\ln|3x+1| + c, \]
where $c$ is an arbitrary constant.
Example 5.6
Find $\int e^{3x+1}\,dx$.

Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac13\,dg$. Hence, substitution gives
\[ \int e^{3x+1}\,dx = \int e^g\,\frac13\,dg = \frac13\int e^g\,dg = \frac13 e^g + c = \frac13 e^{3x+1} + c, \]
where $c$ is an arbitrary constant.

In particular, observe what changes in these examples and what stays the same. Indeed, just for comparison, we can see what would happen if we had a composition which is like the one in Example 5.4 but it now involves the function $4x + 7$ instead of $3x + 1$.

Example 5.7
Find $\int (4x+7)^2\,dx$.

Taking $g = 4x + 7$ we have $\frac{dg}{dx} = 4$ and so $dg = 4\,dx$, i.e. $dx = \frac14\,dg$. Hence, substitution gives
\[ \int (4x+7)^2\,dx = \int g^2\,\frac14\,dg = \frac14\int g^2\,dg = \frac14\cdot\frac{g^3}{3} + c = \frac{(4x+7)^3}{12} + c, \]
where $c$ is an arbitrary constant.
Activity 5.4 In a similar manner, find
\[ \int \frac{1}{4x+7}\,dx \quad\text{and}\quad \int e^{4x+7}\,dx. \]

Note that in all of these examples, the substitution works because we have $g(x) = ax + b$ and hence
\[ \frac{dg}{dx} = a \implies dg = a\,dx \implies \frac1a\,dg = dx, \]
where $a \neq 0$ and $b$ are constants. Indeed, as we end up with an integrand involving $\frac1a$, which is a constant, it can be moved out of the integral using the constant multiple rule of integration. So, if our integrand is a composition, i.e. $f(g(x))$, and $g(x)$ is a linear function, i.e. it has the form $ax + b$ where $a \neq 0$ and $b$ are constants, this kind of substitution will always work and this leads to the general result that
\[ \int f(ax+b)\,dx = \frac1a F(ax+b) + c, \]
where $F(x)$ is an antiderivative of $f(x)$ and $c$ is an arbitrary constant.
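The general result for a linear $g(x) = ax + b$ can be written as a small helper. This is a minimal sketch, assuming an antiderivative $F$ of $f$ is already known; the function names `linear_substitution` and `derivative` are hypothetical and are introduced here only for illustration.

```python
import math

def linear_substitution(F, a, b):
    """Return x -> F(a*x + b) / a, an antiderivative of f(a*x + b) when F' = f."""
    if a == 0:
        raise ValueError("a must be non-zero")
    return lambda x: F(a * x + b) / a

def derivative(G, x, h=1e-6):
    """Central-difference estimate of G'(x); the step size h is an assumption."""
    return (G(x + h) - G(x - h)) / (2 * h)

# Examples 5.4 to 5.6 all use a = 3 and b = 1; only F changes.
G_square = linear_substitution(lambda u: u ** 3 / 3, 3, 1)        # f(u) = u^2
G_recip = linear_substitution(lambda u: math.log(abs(u)), 3, 1)   # f(u) = 1/u
G_exp = linear_substitution(math.exp, 3, 1)                       # f(u) = e^u

x = 0.7
print(derivative(G_square, x), (3 * x + 1) ** 2)
print(derivative(G_recip, x), 1 / (3 * x + 1))
print(derivative(G_exp, x), math.exp(3 * x + 1))
```

Each printed pair should agree to several decimal places, which mirrors the observation that only the constant factor $\frac1a$ and the choice of $F$ change from one example to the next.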
Activity 5.5 Suppose that $a \neq 0$ and $b$ are constants. Use this result to find an expression for
\[ \int (ax+b)^n\,dx, \]
when $n$ is a constant. Also find expressions for
\[ \int e^{ax+b}\,dx, \quad \int \sin(ax+b)\,dx \quad\text{and}\quad \int \cos(ax+b)\,dx. \]
What happens if $a = 0$?

Activity 5.6 Using the expressions you found in Activity 5.5, verify your answers to Activity 5.4.

Some less simple applications of integration by substitution

We will also see slightly harder integrations by substitution where the integrand involves a composition of two functions multiplied by another function. Even in these cases, though, there can be little doubt about which function should be ‘$g$’. To see this, let’s consider what happens when we want to integrate a simple composition which involves the function $x^2 + 1$.

Example 5.8
Find $\int (x^2+1)^2 x\,dx$.

Taking $g = x^2 + 1$ we have $g'(x) = 2x$ and so $dg = 2x\,dx$, i.e. $x\,dx = \frac12\,dg$. Hence, substitution gives
\[ \int (x^2+1)^2 x\,dx = \int g^2\,\frac12\,dg = \frac12\int g^2\,dg = \frac12\cdot\frac{g^3}{3} + c = \frac{(x^2+1)^3}{6} + c, \]
i.e. the extra ‘$x$’ in the integrand was actually needed for the substitution $g = x^2 + 1$ to work.

Example 5.9
Find $\int \frac{x}{x^2+1}\,dx$.

Taking $g = x^2 + 1$ we have $g'(x) = 2x$ and so $dg = 2x\,dx$, i.e. $x\,dx = \frac12\,dg$. Hence, substitution gives
\[ \int \frac{x}{x^2+1}\,dx = \int \frac1g\,\frac12\,dg = \frac12\int g^{-1}\,dg = \frac12\ln|g| + c = \frac12\ln|x^2+1| + c, \]
i.e. the extra ‘$x$’ in the integrand was, again, needed for the substitution $g = x^2 + 1$ to work.
Example 5.10
Find $\int x\,e^{x^2+1}\,dx$.

Taking $g = x^2 + 1$ we have $g'(x) = 2x$ and so $dg = 2x\,dx$, i.e. $x\,dx = \frac12\,dg$. Hence, substitution gives
\[ \int x\,e^{x^2+1}\,dx = \int e^{x^2+1}\,x\,dx = \int e^g\,\frac12\,dg = \frac12\int e^g\,dg = \frac12 e^g + c = \frac12 e^{x^2+1} + c, \]
i.e. the extra ‘$x$’ in the integrand was, again, needed for the substitution $g = x^2 + 1$ to work.

In particular, observe what changes in these examples and what stays the same. Indeed, just for comparison, we can see what would happen if we had a composition which is like the one in Example 5.8 but it now involves the function $3x^2 + 7$ instead of $x^2 + 1$.

Example 5.11
Find $\int (3x^2+7)^2 x\,dx$.

Taking $g = 3x^2 + 7$ we have $g'(x) = 6x$ and so $dg = 6x\,dx$, i.e. $x\,dx = \frac16\,dg$. Hence, substitution gives
\[ \int (3x^2+7)^2 x\,dx = \int g^2\,\frac16\,dg = \frac16\int g^2\,dg = \frac16\cdot\frac{g^3}{3} + c = \frac{(3x^2+7)^3}{18} + c, \]
i.e. the extra ‘$x$’ in the integrand was actually needed for the substitution $g = 3x^2 + 7$ to work.

Activity 5.7 In a similar manner, find
\[ \int \frac{x}{3x^2+7}\,dx \quad\text{and}\quad \int x\,e^{3x^2+7}\,dx. \]
To summarise, it is worth noting that in all of these examples, the substitution works because we have $g(x) = ax^2 + b$ and hence
\[ \frac{dg}{dx} = 2ax \implies dg = 2ax\,dx, \]
where $a \neq 0$ and $b$ are constants. But $2ax$ is not a constant and so we cannot deal with this by taking it out of the integral as we did in the last set of examples. However, in these cases, the substitution still works because we have
\[ \frac{dg}{dx} = 2ax \implies dg = 2ax\,dx \implies \frac{1}{2a}\,dg = x\,dx, \]
and there is also an ‘$x$’ in the integrand to facilitate the transition from ‘$dx$’ to ‘$dg$’. Indeed, in the absence of this extra ‘$x$’, the substitution would produce a more complicated integral and we would not be able to proceed!

Integration by substitution more generally

The general lesson that we should be drawing from the last two sets of examples is that integration by substitution works when we have an integrand which is the product of

the composition of two functions, $f'(g(x))$, and
a constant multiple of $g'(x)$.

The first of these enables us to replace $f'(g(x))$ with $f'(g)$ and the second enables us to replace the remaining factor together with $dx$ by some constant multiple of $dg$. Having done this, the substitution has turned a hard integral into an easier one and we can proceed. Let’s now consider some more complicated examples.

Example 5.12
Find $\int (x^3+x^2)^7(3x^2+2x)\,dx$.

Here the composition is $(x^3 + x^2)^7$ and so we take $g = x^3 + x^2$. As such, we have
\[ \frac{dg}{dx} = 3x^2 + 2x, \]
which is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that $dg = (3x^2 + 2x)\,dx$, and so the substitution gives
\[ \int (x^3+x^2)^7(3x^2+2x)\,dx = \int g^7\,dg = \frac{g^8}{8} + c = \frac{(x^3+x^2)^8}{8} + c. \]
Here, the extra ‘$3x^2 + 2x$’ in the integrand was needed for the substitution $g = x^3 + x^2$ to work.
Example 5.13
Find $\int \frac{2x+2}{x^2+2x+2}\,dx$.

Here the composition is $(x^2 + 2x + 2)^{-1}$ and so we take $g = x^2 + 2x + 2$. As such, we have
\[ \frac{dg}{dx} = 2x + 2, \]
which is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that $dg = (2x + 2)\,dx$, and so the substitution gives
\[ \int \frac{2x+2}{x^2+2x+2}\,dx = \int \frac1g\,dg = \ln|g| + c = \ln|x^2+2x+2| + c. \]
Here, the extra ‘$2x + 2$’ in the integrand was needed for the substitution $g = x^2 + 2x + 2$ to work.
Example 5.14
Find $\int (x^2+1)\,e^{x^3+3x+7}\,dx$.

Here the composition is $e^{x^3+3x+7}$ and so we take $g = x^3 + 3x + 7$. As such, we have
\[ \frac{dg}{dx} = 3x^2 + 3 = 3(x^2+1), \]
which is a constant multiple of the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that
\[ dg = 3(x^2+1)\,dx \implies \frac13\,dg = (x^2+1)\,dx, \]
and so the substitution gives
\[ \int (x^2+1)\,e^{x^3+3x+7}\,dx = \int e^g\,\frac13\,dg = \frac13\int e^g\,dg = \frac13 e^g + c = \frac13 e^{x^3+3x+7} + c. \]
Here, the extra ‘$x^2 + 1$’ in the integrand was needed for the substitution $g = x^3 + 3x + 7$ to work.
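The pattern in Examples 5.12 to 5.14 is that an integrand of the form $f'(g(x))$ times $k\,g'(x)$, for a constant $k$, has the antiderivative $k f(g(x))$. The following sketch checks the three answers numerically under that reading; the names `by_substitution`, `derivative` and `relative_error`, the step size and the sample points are illustrative assumptions and do not come from this guide.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference estimate of F'(x); the step size h is an assumption."""
    return (F(x + h) - F(x - h)) / (2 * h)

def by_substitution(f, g, k=1.0):
    """Antiderivative k*f(g(x)) of an integrand of the form f'(g(x)) * k*g'(x)."""
    return lambda x: k * f(g(x))

# name -> (integrand, antiderivative built from f, g and the constant k)
CASES = {
    "Example 5.12": (
        lambda x: (x ** 3 + x ** 2) ** 7 * (3 * x ** 2 + 2 * x),
        by_substitution(lambda u: u ** 8 / 8, lambda x: x ** 3 + x ** 2),
    ),
    "Example 5.13": (
        lambda x: (2 * x + 2) / (x ** 2 + 2 * x + 2),
        by_substitution(lambda u: math.log(abs(u)), lambda x: x ** 2 + 2 * x + 2),
    ),
    "Example 5.14": (
        lambda x: (x ** 2 + 1) * math.exp(x ** 3 + 3 * x + 7),
        by_substitution(math.exp, lambda x: x ** 3 + 3 * x + 7, k=1 / 3),
    ),
}

def relative_error(integrand, F, x):
    """Gap between the estimated F'(x) and the integrand, scaled for large values."""
    exact = integrand(x)
    return abs(derivative(F, x) - exact) / max(1.0, abs(exact))

for name, (integrand, F) in CASES.items():
    worst = max(relative_error(integrand, F, x) for x in (-0.5, 0.5, 1.0))
    print(f"{name}: worst relative error = {worst:.1e}")
```

Only Example 5.14 needs `k` different from 1, because there the extra factor $x^2 + 1$ is one third of $g'(x)$.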
Activity 5.8
Find $\int x\sin(x^2)\,dx$.
Integration by substitution with trigonometric functions

Sometimes we can straightforwardly apply what we have just seen to find integrals that involve compositions of trigonometric functions as the following examples show.

Example 5.15
Find $\int \sin^2 x\cos x\,dx$.

Here the composition is $\sin^2 x$ and so we take $g = \sin x$. As such, we have
\[ \frac{dg}{dx} = \cos x, \]
which is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that $dg = \cos x\,dx$, and so the substitution gives
\[ \int \sin^2 x\cos x\,dx = \int g^2\,dg = \frac{g^3}{3} + c = \frac13\sin^3 x + c. \]
Here, of course, the extra ‘$\cos x$’ in the integrand was needed for the substitution $g = \sin x$ to work.
Activity 5.9
Find $\int \cos^2 x\sin x\,dx$.

Indeed, as the next example shows, this kind of substitution allows us to find another useful result.

Example 5.16
Find $\int \tan x\,dx$.

In (2.1), we saw that
\[ \tan x = \frac{\sin x}{\cos x}, \]
which means that the composition is $(\cos x)^{-1}$ and so we take $g = \cos x$. As such, we have
\[ \frac{dg}{dx} = -\sin x, \]
which, up to a minus sign, is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that $dg = -\sin x\,dx$, and so the substitution gives
\[ \int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx = -\int \frac{dg}{g} = -\ln|g| + c = -\ln|\cos x| + c. \]
Here, of course, the extra ‘$\sin x$’ in the integrand was needed for the substitution $g = \cos x$ to work.
Activity 5.10
Find $\int \cot x\,dx$.

However, not every trigonometric substitution is so easy to spot as the next example shows.

Example 5.17
Find $\int \frac{dx}{(x+a)^2+b^2}$.

Here, for reasons that will soon become apparent, we make the substitution $x + a = b\tan\theta$. As such, differentiating both sides of this expression with respect to $\theta$, we have
\[ \frac{dx}{d\theta} = b\sec^2\theta \implies dx = b\sec^2\theta\,d\theta. \]
This means that our integral becomes
\[ \int \frac{dx}{(x+a)^2+b^2} = \int \frac{b\sec^2\theta}{b^2\tan^2\theta + b^2}\,d\theta = \int \frac{\sec^2\theta}{b\sec^2\theta}\,d\theta = \int \frac{d\theta}{b}, \]
if we use the trigonometric identity $\tan^2\theta + 1 = \sec^2\theta$ from (2.4). This then gives us
\[ \int \frac{d\theta}{b} = \frac{\theta}{b} + c = \frac1b\tan^{-1}\left(\frac{x+a}{b}\right) + c, \]
since $x + a = b\tan\theta$ and where $c$ is an arbitrary constant. Thus, we have found that
\[ \int \frac{dx}{(x+a)^2+b^2} = \frac1b\tan^{-1}\left(\frac{x+a}{b}\right) + c, \]
which is another useful result.

Activity 5.11
Find $\int \frac{dx}{x^2+2x+2}$. (Hint: Complete the square in the denominator.)
We will see other examples of how trigonometric identities can be used when finding integrals in Section 5.2.6.
5.2.4 Integration by parts
Integration by parts is a way of dealing with integrands which involve the product of two functions and, as such, it is closely related to the product rule of differentiation. To see how it works, we will start by seeing how integration by parts is related to the product rule and then we will describe how to apply this rule. We will then see some examples of how it can be applied.

Why integration by parts works

We start by noting that the product rule for differentiation tells us that
\[ \frac{d}{dx}\left[f(x)g(x)\right] = f'(x)g(x) + f(x)g'(x). \]
So, integrating both sides with respect to $x$, we get
\[ \int \frac{d}{dx}\left[f(x)g(x)\right] dx = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx, \]
which, on noting that integration ‘undoes’ differentiation, yields
\[ f(x)g(x) = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx. \]
Rearranging this then gives us
\[ \int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx, \]
and we call this new rule integration by parts.

How to integrate by parts

Observe that integration by parts allows us to write one integral in terms of another and so a successful application of this rule requires a ‘good choice’ of $f(x)$ and $g'(x)$, i.e. one where it is straightforward to integrate $g'(x)$ and the new integral is easier to find than the old one. That is:

The integral involves a product of two functions and has the form $\int f(x)g'(x)\,dx$.

Choose $f(x)$ and $g'(x)$ so that we can differentiate $f(x)$ to get $f'(x)$ and straightforwardly integrate $g'(x)$ to get $g(x)$.

Apply the formula and make sure that the new integral, $\int f'(x)g(x)\,dx$, is easier to integrate.

If it is, proceed. If it is not, then you have been unwise in your choice of $f(x)$ and $g'(x)$.

Let’s look at some simple examples of how it works.
Example 5.18
Find $\int x\,e^x\,dx$.

Here we have a product and, to apply integration by parts, we choose
\[ f(x) = x \quad\text{and}\quad g'(x) = e^x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = 1 \quad\text{and}\quad g(x) = e^x, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives
\[ \int x\,e^x\,dx = (x)(e^x) - \int (1)(e^x)\,dx = x\,e^x - \int e^x\,dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int x\,e^x\,dx = x\,e^x - \int e^x\,dx = x\,e^x - e^x + c = (x-1)\,e^x + c, \]
as the answer.
Warning! Observe that if we had chosen $f(x)$ and $g'(x)$ differently, we would have got $f(x) = e^x$ and $g'(x) = x$, so that differentiating $f(x)$ and integrating $g'(x)$ we would have got
\[ f'(x) = e^x \quad\text{and}\quad g(x) = \frac{x^2}{2}, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives
\[ \int x\,e^x\,dx = (e^x)\left(\frac{x^2}{2}\right) - \int (e^x)\left(\frac{x^2}{2}\right) dx = \frac{x^2 e^x}{2} - \frac12\int x^2 e^x\,dx, \]
and this is bad because the new integral is harder to find.
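The ‘good choice’ in Example 5.18 can be repeated: taking $f(x)$ to be a polynomial and $g'(x) = e^x$ lowers the degree of the polynomial by one at each step. The sketch below applies this idea to $\int p(x)\,e^x\,dx$ for a polynomial $p$, which goes slightly beyond the single application shown in the example; the function names and the coefficient-list representation (lowest degree first) are assumptions made here for illustration.

```python
def poly_derivative(coeffs):
    """Derivative of the polynomial [c0, c1, c2, ...] (lowest degree first)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def integrate_poly_times_exp(coeffs):
    """Return q such that the integral of p(x) e^x is q(x) e^x + c.

    Each pass is one integration by parts with f = (current polynomial)
    and g' = e^x: it contributes (current polynomial) e^x and leaves
    minus the integral of (its derivative) e^x still to be found.
    """
    q = [0] * len(coeffs)
    sign, current = 1, list(coeffs)
    while current:
        for k, c in enumerate(current):
            q[k] += sign * c
        current = poly_derivative(current)
        sign = -sign
    return q

print(integrate_poly_times_exp([0, 1]))     # p(x) = x   -> [-1, 1]
print(integrate_poly_times_exp([0, 0, 1]))  # p(x) = x^2 -> [2, -2, 1]
```

The first line reproduces $(x - 1)e^x + c$ from Example 5.18. The second gives $(x^2 - 2x + 2)e^x + c$ for the harder integral that appears in the warning, found here by applying the rule twice with $f$ chosen as the polynomial each time.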
Example 5.19
Find $\int x\ln x\,dx$.

Here we have a product and, to apply integration by parts, we choose
\[ f(x) = \ln x \quad\text{and}\quad g'(x) = x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = \frac1x \quad\text{and}\quad g(x) = \frac{x^2}{2}, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives
\[ \int x\ln x\,dx = (\ln x)\left(\frac{x^2}{2}\right) - \int \left(\frac1x\right)\left(\frac{x^2}{2}\right) dx = \frac{x^2}{2}\ln x - \int \frac{x}{2}\,dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int x\ln x\,dx = \frac{x^2}{2}\ln x - \int \frac{x}{2}\,dx = \frac{x^2}{2}\ln x - \frac{x^2}{4} + c, \]
as the answer.

Warning! Observe that if we had chosen $f(x)$ and $g'(x)$ differently, we would have got $f(x) = x$ and $g'(x) = \ln x$. This would have been bad because we can’t integrate $g'(x) = \ln x$ to get $g(x)$ at the moment.

However, having said that, now that we can integrate by parts, we can finally see how to integrate $\ln x$.
Example 5.20
Find $\int \ln(x)\,dx$.

To do this using integration by parts, we treat the integrand as $1\cdot\ln(x)$ so that we have a product, i.e. we want to find
\[ \int \ln(x)\,dx = \int 1\cdot\ln(x)\,dx. \]
To apply integration by parts, we choose
\[ f(x) = \ln(x) \quad\text{and}\quad g'(x) = 1, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = \frac1x \quad\text{and}\quad g(x) = x, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives
\[ \int 1\cdot\ln(x)\,dx = (x)(\ln(x)) - \int (x)\left(\frac1x\right) dx = x\ln(x) - \int 1\,dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int \ln(x)\,dx = x\ln(x) - \int 1\,dx = x\ln(x) - x + c, \]
and so we have now found the integral of $\ln x$ as promised in Section 5.2.1.

Activity 5.12 Use the result in Example 5.20 and the change of base formula for logarithms to find
\[ \int \log_a x\,dx, \]
which was also promised in Section 5.2.1.
Activity 5.13 Use the result in the previous example to find the integral in Example 5.19 the other way, i.e. by choosing
\[ f(x) = x \quad\text{and}\quad g'(x) = \ln x, \]
when integrating by parts.

We observe that integration by parts is not useful for all products since, as we saw above, integrals like
\[ \int (x^2+1)^2 x\,dx \]
in Example 5.8 contain a product and yet they are best dealt with by substitution as the extra ‘$x$’ in the product is a constant multiple of the derivative of $g = x^2 + 1$. However, integrals like
\[ \int (x^2+1)^2 x^2\,dx \]
call for a different approach, such as integration by parts, since, now, the extra ‘$x^2$’ in the product is not a constant multiple of the derivative of $g = x^2 + 1$. Indeed, the main skill involved in finding integrals using these rules is choosing the appropriate method (see footnote 3). To illustrate this, let’s see how we would find this last integral.

Example 5.21
Find $\int (x^2+1)^2 x^2\,dx$.

Here we have a product and, to apply integration by parts, we choose
\[ f(x) = (x^2+1)^2 \quad\text{and}\quad g'(x) = x^2, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = 2(x^2+1)(2x) \quad\text{and}\quad g(x) = \frac{x^3}{3}, \]
where we have used the chain rule to perform the differentiation and suppressed the arbitrary constant from the integration. Applying the rule then gives
\[ \int (x^2+1)^2 x^2\,dx = (x^2+1)^2\,\frac{x^3}{3} - \int 2(x^2+1)(2x)\,\frac{x^3}{3}\,dx, \]
and, clearly, the new integral is easier to find because we can easily multiply out the brackets and integrate term-by-term. Thus, finding this integral, we get
\[ \int (x^2+1)^2 x^2\,dx = (x^2+1)^2\,\frac{x^3}{3} - \frac43\int \left(x^6 + x^4\right) dx = (x^2+1)^2\,\frac{x^3}{3} - \frac43\left(\frac{x^7}{7} + \frac{x^5}{5}\right) + c, \]
as the answer.
Activity 5.14 Verify that this answer is correct by multiplying out the brackets in the integrand and integrating term-by-term.

The last two ways of making progress with an integral that we will consider are not rules of integration, but handy techniques that allow us to rewrite integrands so that we can see how to integrate them. The first of these uses a particular kind of algebraic identity known as ‘partial fractions’ and the second involves the use of trigonometric identities.
5.2.5 Using partial fractions to simplify integrands

Suppose that we have an integrand which is a rational function, i.e. a quotient of two polynomials, say
\[ R(x) = \frac{P(x)}{Q(x)}. \]

Footnote 3: This is unlike the situation with differentiation where it is always pretty obvious which rule we should be applying!
In order to apply the method of partial fractions, it must be the case that the degree of the numerator, i.e. $P(x)$, is less than the degree of the denominator, i.e. $Q(x)$. If this is the case, we start by looking at how the denominator factorises and then proceed according to which of the following cases we are in.

Case 1: The denominator has distinct [real] linear factors

If the denominator, $Q(x)$, is of degree $n$ and has $n$ real and distinct roots $a_1, a_2, \ldots, a_n$ then (taking the coefficient of $x^n$ in $Q(x)$ to be 1) we can write
\[ Q(x) = (x-a_1)(x-a_2)\cdots(x-a_n), \]
i.e. $Q(x)$ has distinct [real] linear factors. In this case, the method of partial fractions dictates that we can write
\[ R(x) = \frac{P(x)}{(x-a_1)(x-a_2)\cdots(x-a_n)} = \frac{A_1}{x-a_1} + \frac{A_2}{x-a_2} + \cdots + \frac{A_n}{x-a_n}, \]
and we can find the numbers $A_1, A_2, \ldots, A_n$ we require by cross-multiplying on the right-hand-side, comparing the numerators and letting $x = a_1, x = a_2, \ldots, x = a_n$ respectively. Let’s look at a simple example.

Example 5.22
Find $\int \frac{x}{x^2-x-2}\,dx$.

Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we see that
\[ x^2 - x - 2 = (x-2)(x+1), \]
so we are in the case where we have distinct linear factors. This means that we can write
\[ \frac{x}{x^2-x-2} = \frac{x}{(x-2)(x+1)} = \frac{A_1}{x-2} + \frac{A_2}{x+1}, \]
for some constants $A_1$ and $A_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{x}{(x-2)(x+1)} = \frac{A_1(x+1) + A_2(x-2)}{(x-2)(x+1)}, \]
and so, comparing the numerators, we need
\[ x = A_1(x+1) + A_2(x-2). \]
Indeed, setting $x = 2$ on both sides, we see that $2 = 3A_1$ whereas setting $x = -1$ on both sides, we see that $-1 = -3A_2$. Thus, we have
\[ \frac{x}{x^2-x-2} = \frac{x}{(x-2)(x+1)} = \frac{2/3}{x-2} + \frac{1/3}{x+1}, \]
using the values of $A_1$ and $A_2$ that we have found. Consequently, we find that
\[ \int \frac{x}{x^2-x-2}\,dx = \int \left[\frac{2/3}{x-2} + \frac{1/3}{x+1}\right] dx = \frac23\ln|x-2| + \frac13\ln|x+1| + c, \]
where $c$ is an arbitrary constant.
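The step of letting $x = a_k$ in Case 1 can be carried out mechanically: setting $x = a_k$ in the comparison of numerators removes every term except the $k$-th, so $A_k$ is $P(a_k)$ divided by the product of the differences $a_k - a_i$ over the other roots. The sketch below does this with exact fractions; the function names and the coefficient-list representation of $P(x)$ (lowest degree first) are assumptions made here, and the denominator is taken to have leading coefficient 1.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate the polynomial [c0, c1, ...] (lowest degree first) at x."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def distinct_linear_partial_fractions(numerator, roots):
    """Constants A_k with P(x) / prod(x - a_k) = sum of A_k / (x - a_k).

    Only Case 1 is handled: the roots must be distinct and the numerator
    must have lower degree than the denominator.
    """
    roots = [Fraction(r) for r in roots]
    if len(set(roots)) != len(roots):
        raise ValueError("roots must be distinct")
    if len(numerator) > len(roots):
        raise ValueError("numerator degree must be less than denominator degree")
    constants = []
    for k, a_k in enumerate(roots):
        product = Fraction(1)
        for i, a_i in enumerate(roots):
            if i != k:
                product *= a_k - a_i
        constants.append(poly_eval(numerator, a_k) / product)
    return constants

# Example 5.22: P(x) = x and Q(x) = (x - 2)(x + 1).
print(distinct_linear_partial_fractions([0, 1], [2, -1]))
# [Fraction(2, 3), Fraction(1, 3)]
```

The two values agree with $A_1 = 2/3$ and $A_2 = 1/3$ found in Example 5.22.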
We observe, in particular, that the degree of the denominator determines how many constants we have to find.

Case 2: The denominator has a repeated [real] linear factor

If we find that one of the roots, say $a_k$, of the denominator, $Q(x)$, is real and repeated $m$ times then we replace the term
\[ \frac{A_k}{x-a_k} \]
in the expansion from Case 1 with the terms
\[ \frac{B_1}{x-a_k} + \frac{B_2}{(x-a_k)^2} + \cdots + \frac{B_m}{(x-a_k)^m}. \]
We then have to find the numbers $B_1, B_2, \ldots, B_m$ as well as any other numbers that remain from Case 1. Let’s look at a simple example.

Example 5.23
Find $\int \frac{x+3}{(x+2)(x-1)^2}\,dx$.

Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we have $(x+2)(x-1)^2$, and so we are in the case where we have a repeated linear factor. This means that we can write
\[ \frac{x+3}{(x+2)(x-1)^2} = \frac{A_1}{x+2} + \frac{B_1}{x-1} + \frac{B_2}{(x-1)^2}, \]
for some constants $A_1$, $B_1$ and $B_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{x+3}{(x+2)(x-1)^2} = \frac{A_1(x-1)^2 + B_1(x-1)(x+2) + B_2(x+2)}{(x+2)(x-1)^2}, \]
and so, comparing the numerators, we need
\[ x + 3 = A_1(x-1)^2 + B_1(x-1)(x+2) + B_2(x+2). \]
Indeed, setting $x = -2$ on both sides, we see that $1 = 9A_1$ and setting $x = 1$ on both sides, we see that $4 = 3B_2$. However, to find $B_1$, we now note that comparing (say) the coefficient of the $x^2$ term on both sides of this expression we get $0 = A_1 + B_1$ and so $B_1 = -A_1 = -1/9$. Thus, we have
\[ \frac{x+3}{(x+2)(x-1)^2} = \frac{1/9}{x+2} + \frac{-1/9}{x-1} + \frac{4/3}{(x-1)^2}, \]
using the values of $A_1$, $B_1$ and $B_2$ that we have found. Consequently, we find that
\[ \int \frac{x+3}{(x+2)(x-1)^2}\,dx = \int \left[\frac{1/9}{x+2} + \frac{-1/9}{x-1} + \frac{4/3}{(x-1)^2}\right] dx = \frac19\ln|x+2| - \frac19\ln|x-1| - \frac{4}{3(x-1)} + c, \]
where $c$ is an arbitrary constant.
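The constants in Example 5.23 can be double-checked with exact arithmetic. The sketch below follows the same three steps as the text and then confirms that the two numerators agree at a range of points; the helper name `numerator_rhs` and the choice of test points are assumptions made here for illustration.

```python
from fractions import Fraction

def numerator_rhs(x, A1, B1, B2):
    """Right-hand side A1(x-1)^2 + B1(x-1)(x+2) + B2(x+2) from Example 5.23."""
    return A1 * (x - 1) ** 2 + B1 * (x - 1) * (x + 2) + B2 * (x + 2)

# Follow the steps in the text, using exact fractions.
A1 = Fraction(-2 + 3, (-2 - 1) ** 2)  # x = -2 leaves 1 = 9*A1
B2 = Fraction(1 + 3, 1 + 2)           # x = 1 leaves 4 = 3*B2
B1 = -A1                              # x^2 coefficient: 0 = A1 + B1

# Both sides have degree at most 2, so agreement at three distinct points
# already forces them to be the same polynomial; more points are checked anyway.
agrees = all(numerator_rhs(Fraction(x), A1, B1, B2) == x + 3 for x in range(-5, 6))
print(A1, B1, B2, agrees)  # 1/9 -1/9 4/3 True
```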
We observe, again, that the degree of the denominator determines how many constants we have to find.

Case 3: The denominator has an irreducible [real] factor

If we find that the denominator, $Q(x)$, has an irreducible [real] factor like $ax^2 + bx + c$ (that is, we have a quadratic like $ax^2 + bx + c$ with $b^2 - 4ac < 0$ so we cannot find real roots; this means that we cannot factorise it using real factors and so we cannot use Case 1 or Case 2 on it), then we replace the corresponding term in the expansion from Case 1 with the term
\[ \frac{C_1 x + C_2}{ax^2+bx+c}. \]
We then have to find the numbers $C_1$ and $C_2$ as well as any other numbers that remain from Case 1. Let’s look at a simple example.

Example 5.24
Find $\int \frac{x}{(x-1)(x^2+2x+2)}\,dx$.

Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we have $(x-1)(x^2+2x+2)$, and so we are in the case where we have an irreducible factor as $x^2 + 2x + 2$ has no real roots as, for instance, “$b^2 - 4ac$” gives us $2^2 - 4(1)(2) = 4 - 8 = -4 < 0$. This means that we can write
\[ \frac{x}{(x-1)(x^2+2x+2)} = \frac{A_1}{x-1} + \frac{C_1 x + C_2}{x^2+2x+2}, \]
for some constants $A_1$, $C_1$ and $C_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{x}{(x-1)(x^2+2x+2)} = \frac{A_1(x^2+2x+2) + (C_1 x + C_2)(x-1)}{(x-1)(x^2+2x+2)}, \]
and so, comparing the numerators, we need
\[ x = A_1(x^2+2x+2) + (C_1 x + C_2)(x-1). \]
Indeed, setting $x = 1$ on both sides, we see that $1 = 5A_1$ and, to find $C_1$, we now note that comparing the coefficient of the $x^2$ term on both sides of this expression we get $0 = A_1 + C_1$ and so $C_1 = -A_1 = -1/5$ and comparing the coefficient of the constant term on both sides we get $0 = 2A_1 - C_2$ and so $C_2 = 2A_1 = 2/5$. Thus, we have
\[ \frac{x}{(x-1)(x^2+2x+2)} = \frac{1/5}{x-1} + \frac{(1/5)(-x+2)}{x^2+2x+2}, \]
using the values of $A_1$, $C_1$ and $C_2$ that we have found. Consequently, we find that
\[ \int \frac{x}{(x-1)(x^2+2x+2)}\,dx = \int \left[\frac{1/5}{x-1} + \frac{(-1/5)(x-2)}{x^2+2x+2}\right] dx = \frac15\int \left[\frac{1}{x-1} - \frac{x-2}{x^2+2x+2}\right] dx. \]
Now, the integral of the first term is easy but, to deal with the integral of the second term, we note that the derivative of $x^2 + 2x + 2$ is $2x + 2$ (i.e. we are thinking about the substitution $g = x^2 + 2x + 2$ which we saw in Example 5.13). This means that, writing
\[ \frac{x-2}{x^2+2x+2} = \frac12\,\frac{2x-4}{x^2+2x+2} = \frac12\,\frac{2x+2-6}{x^2+2x+2} = \frac12\,\frac{2x+2}{x^2+2x+2} - \frac{3}{x^2+2x+2}, \]
we can see that, completing the square in the denominator of the last term, we have
\[ \int \frac{x}{(x-1)(x^2+2x+2)}\,dx = \frac15\int \left[\frac{1}{x-1} - \frac12\,\frac{2x+2}{x^2+2x+2} + \frac{3}{(x+1)^2+1}\right] dx = \frac15\left[\ln|x-1| - \frac12\ln|x^2+2x+2| + 3\tan^{-1}(x+1)\right] + c, \]
where $c$ is an arbitrary constant. Here, we have implicitly made the substitution $g = x^2 + 2x + 2$ in the middle term (as we saw in Example 5.13) and we saw how to integrate the last term in Activity 5.11.

Of course, the key here is that, in this new term the linear expression in the numerator that we have to find has a degree which is one less than the degree of the irreducible quadratic expression in the denominator (see footnote 5). This means that, if we had a repeated irreducible factor in the denominator, we would have to compensate in a way which is reminiscent of Case 2 as the next example shows.

Example 5.25
Find $\int \frac{x^4+x^3+2x^2}{(x-1)(1+x^2)^2}\,dx$.

Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we have $(x-1)(1+x^2)^2$, and so we are in the case where we have a repeated irreducible factor as $x^2 + 1$ has no real roots as, for instance, “$b^2 - 4ac$” gives us $0^2 - 4(1)(1) = -4 < 0$. This means that we can write
\[ \frac{x^4+x^3+2x^2}{(x-1)(1+x^2)^2} = \frac{A_1}{x-1} + \frac{C_1 x + C_2}{1+x^2} + \frac{D_1 x + D_2}{(1+x^2)^2}, \]
for some constants $A_1$, $C_1$, $C_2$, $D_1$ and $D_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{x^4+x^3+2x^2}{(x-1)(1+x^2)^2} = \frac{A_1(1+x^2)^2 + (C_1 x + C_2)(x-1)(1+x^2) + (D_1 x + D_2)(x-1)}{(x-1)(1+x^2)^2}, \]
and so, comparing the numerators, we need
\[ x^4+x^3+2x^2 = A_1(1+x^2)^2 + (C_1 x + C_2)(x-1)(1+x^2) + (D_1 x + D_2)(x-1). \]
Indeed, setting $x = 1$ on both sides, we see that $4 = 4A_1$ and, to find $C_1$, we now note that comparing the coefficient of the $x^4$ term on both sides of this expression we get $1 = A_1 + C_1$ and so $C_1 = 1 - A_1 = 0$ and comparing the coefficient of the $x^3$ term on both sides of this expression we get $1 = -C_1 + C_2$ and so $C_2 = 1 + C_1 = 1$. To find $D_1$ and $D_2$ we note that, using what we have found so far, we have
\[ x^4+x^3+2x^2 = (1+x^2)^2 + (x-1)(1+x^2) + (D_1 x + D_2)(x-1), \]
which means that, comparing the coefficient of $x^2$ on both sides of this expression we get $2 = 2 - 1 + D_1$ and so $D_1 = 1$ whereas comparing the coefficient of $x$ on both sides we get $0 = 0 + 1 - D_1 + D_2$ and so $D_2 = 0$. Thus, we have
\[ \frac{x^4+x^3+2x^2}{(x-1)(1+x^2)^2} = \frac{1}{x-1} + \frac{1}{1+x^2} + \frac{x}{(1+x^2)^2}, \]
using the values of the constants $A_1$, $C_1$, $C_2$, $D_1$ and $D_2$ that we have found. Consequently, we find that
\[ \int \frac{x^4+x^3+2x^2}{(x-1)(1+x^2)^2}\,dx = \int \left[\frac{1}{x-1} + \frac{1}{1+x^2} + \frac{x}{(1+x^2)^2}\right] dx = \ln|x-1| + \tan^{-1}x - \frac{1}{2(1+x^2)} + c, \]
where $c$ is an arbitrary constant and we have implicitly used the substitution $u = 1 + x^2$ to work out the integral of the last term.

So, once again, we observe that the degree of the denominator determines how many constants we have to find in all of these examples. Generally speaking, as we are using partial fractions to help us find integrals, we shouldn’t expect to see anything more complicated than this.

Footnote 5: That is, the number of constants we have to find is equal to the degree of the denominator in the term we are dealing with.
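The partial fraction expansions found in Examples 5.24 and 5.25 can be checked by evaluating both sides exactly at a selection of points. This is a minimal sketch: the function names and the sample points are assumptions made here, and agreement at sample points is a sanity check on the algebra rather than a derivation.

```python
from fractions import Fraction

def example_5_24(x):
    """Return (original integrand, partial fraction form) for Example 5.24."""
    lhs = x / ((x - 1) * (x ** 2 + 2 * x + 2))
    rhs = Fraction(1, 5) / (x - 1) + Fraction(1, 5) * (-x + 2) / (x ** 2 + 2 * x + 2)
    return lhs, rhs

def example_5_25(x):
    """Return (original integrand, partial fraction form) for Example 5.25."""
    lhs = (x ** 4 + x ** 3 + 2 * x ** 2) / ((x - 1) * (1 + x ** 2) ** 2)
    rhs = 1 / (x - 1) + 1 / (1 + x ** 2) + x / (1 + x ** 2) ** 2
    return lhs, rhs

# Exact rational sample points; x = 1 is skipped because the denominators vanish there.
points = [Fraction(n, 3) for n in range(-9, 10) if n != 3]

for check in (example_5_24, example_5_25):
    ok = all(lhs == rhs for lhs, rhs in map(check, points))
    print(check.__name__, ok)
```

Both lines should print `True`. Because the cleared-denominator numerators have degree at most four, agreement at this many distinct points means the two sides are the same rational function.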
5.2.6 Using trigonometric identities to simplify integrands
In Example 5.15 we saw how to find
\[ \int \sin^2 x\cos x\,dx \]
by using the substitution $g = \sin x$, but what if you were asked to find
\[ \int \sin^2 x\,dx? \]
In this case, the substitution would not work since we do not have the ‘extra’ factor of $\cos x$ in the integrand. However, as we shall see in the next example, we can easily find this new integral if we use one of the trigonometric identities that we saw in Section 2.1.4.
Example 5.26
Find $\int \sin^2 x\,dx$.

In Activity 2.18, we saw the double-angle formula
\[ \cos(2x) = 1 - 2\sin^2 x, \]
which allows us to write the problematic integrand $\sin^2 x$ in terms of the function $\cos(2x)$ which is far easier to integrate. That is, rearranging this trigonometric identity, we have
\[ \sin^2 x = \frac12\left[1 - \cos(2x)\right], \]
and so we find that
\[ \int \sin^2 x\,dx = \frac12\int \left[1 - \cos(2x)\right] dx = \frac12\left[x - \frac12\sin(2x)\right] + c, \]
where $c$ is an arbitrary constant.
Activity 5.15
Find $\int \cos^2 x\,dx$.

Indeed, in Example 5.17, we used a substitution that worked because of the trigonometric identity $\tan^2\theta + 1 = \sec^2\theta$ to obtain a useful result. Here’s another one that is very similar.
Example 5.27
Use the substitution $x + a = b\sin\theta$ to find
\[ \int \frac{dx}{\sqrt{b^2-(x+a)^2}}. \]

Here, for reasons that will soon become apparent, we make the suggested substitution. As such, differentiating both sides of $x + a = b\sin\theta$ with respect to $\theta$, we have
\[ \frac{dx}{d\theta} = b\cos\theta \implies dx = b\cos\theta\,d\theta. \]
This means that our integral becomes
\[ \int \frac{dx}{\sqrt{b^2-(x+a)^2}} = \int \frac{b\cos\theta}{\sqrt{b^2 - b^2\sin^2\theta}}\,d\theta = \int \frac{\cos\theta}{\cos\theta}\,d\theta = \int d\theta, \]
if we use the trigonometric identity $1 - \sin^2\theta = \cos^2\theta$ from (2.2), taking $b > 0$ and $\cos\theta > 0$. This then gives us
\[ \int d\theta = \theta + c = \sin^{-1}\left(\frac{x+a}{b}\right) + c, \]
since $x + a = b\sin\theta$ and where $c$ is an arbitrary constant. Thus, we have found that
\[ \int \frac{dx}{\sqrt{b^2-(x+a)^2}} = \sin^{-1}\left(\frac{x+a}{b}\right) + c, \]
which is another useful result.
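The two ‘useful results’ from Examples 5.17 and 5.27 can be spot-checked by differentiating them numerically. The sketch below does this for one arbitrary pair of constants; the values $a = 2$ and $b = 3$, the helper names, the step size and the sample points are all assumptions made here, and the inverse sine result is only tested where $|x + a| < b$.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference estimate of F'(x); the step size h is an assumption."""
    return (F(x + h) - F(x - h)) / (2 * h)

a, b = 2.0, 3.0  # assumed sample constants with b > 0

def arctan_result(x):
    """(1/b) arctan((x + a)/b), the answer in Example 5.17 with c = 0."""
    return math.atan((x + a) / b) / b

def arctan_integrand(x):
    return 1 / ((x + a) ** 2 + b ** 2)

def arcsin_result(x):
    """arcsin((x + a)/b), the answer in Example 5.27 with c = 0."""
    return math.asin((x + a) / b)

def arcsin_integrand(x):
    return 1 / math.sqrt(b ** 2 - (x + a) ** 2)

points = (-4.0, -2.0, -1.0, 0.5)  # every point satisfies |x + a| < b

for x in points:
    print(x,
          abs(derivative(arctan_result, x) - arctan_integrand(x)),
          abs(derivative(arcsin_result, x) - arcsin_integrand(x)))
```

Each printed gap should be very small, which is consistent with both formulas.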
As a last example, let’s see another way in which trigonometric identities can be used to find an integral.
Example 5.28
Use the substitution $t = \tan\theta$ to find
$$\int \frac{d\theta}{1 + \cos(2\theta)}.$$
The substitution $t = \tan\theta$ is very useful and so we start by seeing how it can be applied. Firstly, we note that, differentiating both sides with respect to $\theta$, we get
$$\frac{dt}{d\theta} = \sec^2\theta,$$
and so, using the trigonometric identity $\sec^2\theta = 1 + \tan^2\theta$ from (2.4), this gives us
$$d\theta = \frac{dt}{1 + t^2}.$$
Secondly, we note that the denominator of our integrand is
$$1 + \cos(2\theta) = 1 + \cos^2\theta - \sin^2\theta,$$
using a double-angle formula from (2.6) and so we will need to be able to write $\sin\theta$ and $\cos\theta$ in terms of $t$. An easy way to do this is to consider the right-angled triangle in Figure 5.1 as this immediately tells us that
$$\sin\theta = \frac{t}{\sqrt{1 + t^2}} \quad \text{and} \quad \cos\theta = \frac{1}{\sqrt{1 + t^2}},$$
and so we see that the denominator of our integrand can be written as
$$1 + \cos(2\theta) = 1 + \cos^2\theta - \sin^2\theta = 1 + \frac{1}{1 + t^2} - \frac{t^2}{1 + t^2} = \frac{2}{1 + t^2},$$
in terms of $t$. Thus, returning to the integral, we have
$$\int \frac{d\theta}{1 + \cos(2\theta)} = \int \frac{1 + t^2}{2} \, \frac{dt}{1 + t^2} = \int \frac{1}{2} \, dt = \frac{t}{2} + c = \frac{1}{2}\tan\theta + c,$$
where $c$ is an arbitrary constant.
Generally, as in the last two examples, when an unusual substitution is required in this unit, it will be given in the question. Indeed, we'll see a little bit more of this kind of thing in Examples 5.36 and 5.37.
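The three identities that make the $t = \tan\theta$ substitution work in Example 5.28 can be spot-checked numerically. The following Python sketch evaluates both sides of each identity at a few angles strictly between $0$ and $\pi/2$. The sample angles and the function name `check_t_substitution` are illustrative assumptions.

```python
import math


def check_t_substitution(theta):
    """Return the largest discrepancy among the Example 5.28 identities at theta."""
    t = math.tan(theta)
    root = math.sqrt(1.0 + t * t)  # hypotenuse of the triangle in Figure 5.1
    errors = [
        abs(math.sin(theta) - t / root),
        abs(math.cos(theta) - 1.0 / root),
        abs((1.0 + math.cos(2.0 * theta)) - 2.0 / (1.0 + t * t)),
    ]
    return max(errors)


# Sample angles strictly between 0 and pi/2 (an illustrative choice).
angles = [0.1, 0.5, 1.0, 1.4]
worst = max(check_t_substitution(theta) for theta in angles)
print(worst)  # should be at the level of floating-point rounding error
```

The check is restricted to acute angles because that is the case the triangle in Figure 5.1 depicts.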
Figure 5.1: A right-angled triangle with $t = \tan\theta$ can have $t$ on the opposite side and 1 on the adjacent side which means that, using Pythagoras' theorem, the hypotenuse must be $\sqrt{1 + t^2}$. With this triangle, we can then quickly deduce the expressions for $\sin\theta$ and $\cos\theta$ in terms of $t$ which are needed for Example 5.28.
5.3 Definite integrals and areas
So far, we have been looking at indefinite integrals and we have been finding them by using the idea of an antiderivative to deduce standard integrals and rules of integration. We now turn to the geometric interpretation of an integral and this involves introducing the idea of a definite integral and seeing what it represents.
5.3.1 Definite integrals and what they represent
In Section 3.3.1 we saw that the derivative of a function, $f(x)$, gave us the gradient of the curve $y = f(x)$. We now consider what the integral of a function, $f(x)$, tells us about the curve $y = f(x)$ and see how this comes about through the idea of a definite integral.
What is a definite integral?
Recall that an indefinite integral is so-called since, given a function, $f(x)$, and one of its antiderivatives, $F(x)$, i.e. two functions related by the fact that
$$\frac{dF}{dx} = f(x),$$
we have
$$\int f(x) \, dx = F(x) + c,$$
where $c$ is an arbitrary constant. And, indeed, it is this arbitrary constant that makes this integral indefinite as we do not know what $c$ is. In a similar vein, instead of writing
$$\int f(x) \, dx \quad \text{we could also write} \quad \int_a^b f(x) \, dx,$$
where the constants $a$ and $b$ are called the limits of integration. In order to work out integrals that look like this we need to know what to do with these limits and the procedure is:
Firstly: deal with the integral. Integrating $f(x)$, we take one of its antiderivatives, $F(x)$, and then write
$$\int_a^b f(x) \, dx = \Big[F(x)\Big]_a^b.$$
In particular, as we shall see below, observe that we no longer need a constant of integration.
Secondly: deal with the limits. By definition, we let
$$\Big[F(x)\Big]_a^b = F(b) - F(a),$$
i.e. we subtract the value of the antiderivative at $x = a$ from its value at $x = b$.
Notice that this means that, if $F(x)$ is an antiderivative of $f(x)$, we have
$$\int_a^b f(x) \, dx = F(b) - F(a),$$
i.e. the value of the integral depends only on the value of the antiderivative at the points $x = a$ and $x = b$. Thus, this is now a definite integral as it no longer involves an arbitrary constant, $c$.
Activity 5.16
If $F(x)$ is an antiderivative of $f(x)$, show that
$$\int_a^b f(x) \, dx = \Big[F(x) + c\Big]_a^b = F(b) - F(a),$$
if $c$ is a constant. Hence explain why we can omit the constant of integration when evaluating definite integrals.
Another consequence of this discussion is that it allows us to see how to use our basic rules of integration to evaluate definite integrals. For instance, if $k$ and $l$ are constants and $f(x)$ and $g(x)$ are functions, then we can see that the linear combination rule gives us
$$\int_a^b [k f(x) + l g(x)] \, dx = k \int_a^b f(x) \, dx + l \int_a^b g(x) \, dx,$$
if we are using definite integrals.
Activity 5.17
Following what we saw in Section 5.2.2, write down the constant multiple rule, the sum rule and the difference rule for definite integrals.
Activity 5.18
Using what we have seen so far, derive the linear combination rule for definite integrals.
Now that we have the basic idea, let's see how we can work out a definite integral.
Example 5.29
Evaluate
$$\int_1^3 (x + 4) \, dx.$$
If we follow the two step procedure above, i.e. integrating to find an antiderivative and then dealing with the limits, we get
$$\int_1^3 (x + 4) \, dx = \left[\frac{x^2}{2} + 4x\right]_1^3 = \left(\frac{3^2}{2} + 4(3)\right) - \left(\frac{1^2}{2} + 4(1)\right) = \left(\frac{9}{2} + 12\right) - \left(\frac{1}{2} + 4\right) = 12,$$
which is the value of this definite integral. Alternatively, we could use the linear combination rule to get
$$\int_1^3 (x + 4) \, dx = \int_1^3 x \, dx + \int_1^3 4 \, dx = \left[\frac{x^2}{2}\right]_1^3 + \Big[4x\Big]_1^3 = \left(\frac{3^2}{2} - \frac{1^2}{2}\right) + \big(4(3) - 4(1)\big) = \left(\frac{9}{2} - \frac{1}{2}\right) + (12 - 4) = 12,$$
which is the same answer as before.
What definite integrals with non-negative integrands represent
Definite integrals are useful because they tell us about the area under a curve. Specifically, if we have the definite integral
$$\int_a^b f(x) \, dx, \tag{5.1}$$
where $f(x) \ge 0$ for all $x$ such that $a \le x \le b$,⁶ we say that we have a non-negative integrand and find that the value of the integral is the area of the region between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$ as illustrated in Figure 5.2.
⁶ At the moment we will just accept this caveat. The reason why we need $f(x)$ to be non-negative for values of $x$ between the limits of integration will become clear very soon.
Figure 5.2: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$. In cases like this we have a non-negative integrand, i.e. $f(x) \ge 0$ for $a \le x \le b$, and so the definite integral in (5.1) gives us the area of this hatched region.
Example 5.30
Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 2$ which is illustrated in Figure 5.3(a).
There are two ways to find this area:
As this is just a right-angled triangle, the area is just 'half times base times height', i.e.
$$\text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4.$$
Thus, the area of the region is four.
As we have $y = f(x)$ with $f(x) = 4 - 2x$, we can see from Figure 5.3(a) that $f(x) \ge 0$ between $x = 0$ and $x = 2$. So, as noted above, the area should be given by
$$\int_0^2 (4 - 2x) \, dx = \Big[4x - x^2\Big]_0^2 = (4 \times 2 - 2^2) - (4 \times 0 - 0^2) = (8 - 4) - 0 = 4,$$
which is, again, four.
Consequently, this confirms that the definite integral does give us the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 2$, at least, when $f(x) \ge 0$ between the vertical lines.
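The two-step procedure (take an antiderivative, then subtract its values at the limits) is mechanical enough to be sketched in a few lines of Python. The example below applies it to Example 5.30 and compares the result with the triangle area. The function names `definite_integral` and `F` are illustrative assumptions.

```python
def definite_integral(antiderivative, a, b):
    """Two-step procedure: take an antiderivative F, then return F(b) - F(a)."""
    return antiderivative(b) - antiderivative(a)


def F(x):
    """An antiderivative of f(x) = 4 - 2x (constant of integration omitted)."""
    return 4 * x - x**2


integral_value = definite_integral(F, 0, 2)
triangle_area = 0.5 * 2 * 4  # half times base times height

print(integral_value, triangle_area)
```

Replacing `F` by `F` plus any constant leaves `integral_value` unchanged, because the constant cancels in the subtraction. That is the reason the constant of integration can be dropped when evaluating definite integrals.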
Figure 5.3: Non-negative integrands. (a) For Example 5.30, the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 2$. (b) For Example 5.31, the region between the parabola $y = 4 - x^2$, the $x$-axis and the vertical lines $x = -1$ and $x = 1$.
However, generally, we won't have a simple geometric way of finding the area under a curve and so we will have to use integration.
Example 5.31
Find the area of the region between the parabola $y = 4 - x^2$, the $x$-axis and the vertical lines $x = -1$ and $x = 1$ which is illustrated in Figure 5.3(b).
As we have $y = f(x)$ with $f(x) = 4 - x^2$, we can see from Figure 5.3(b) that $f(x) \ge 0$ between $x = -1$ and $x = 1$. So, as noted above, the area should be given by
$$\int_{-1}^{1} (4 - x^2) \, dx = \left[4x - \frac{x^3}{3}\right]_{-1}^{1} = \left(4(1) - \frac{(1)^3}{3}\right) - \left(4(-1) - \frac{(-1)^3}{3}\right) = \frac{11}{3} - \left(-\frac{11}{3}\right) = \frac{22}{3},$$
i.e. the area is $7\frac{1}{3}$.
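One way to make the link between the definite integral and area tangible is to approximate the region in Example 5.31 by many thin rectangles and add up their areas. The Python sketch below does this with rectangles whose heights are taken at the midpoint of each strip. This midpoint rule is not a method used in the guide; it is brought in here purely as an illustration, and the function name and the number of strips are assumptions.

```python
def midpoint_rule(func, lower, upper, n=1_000):
    """Sum the areas of n thin rectangles whose heights are taken at midpoints."""
    width = (upper - lower) / n
    return width * sum(func(lower + (i + 0.5) * width) for i in range(n))


def f(x):
    """The parabola from Example 5.31."""
    return 4 - x**2


approx_area = midpoint_rule(f, -1, 1)
exact_area = 22 / 3  # value found in Example 5.31

print(approx_area, exact_area)  # the approximation should be close to 22/3
```

Increasing `n` makes the rectangles thinner and brings the approximation closer to the value given by the definite integral.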
Activity 5.19
Observe that the region in the previous example is symmetric about the $y$-axis. Use this observation to explain why the area of this region is two times the area represented by the definite integral,
$$\int_0^1 (4 - x^2) \, dx,$$
and verify that this does indeed give the correct area.
What definite integrals with non-positive integrands represent
We now start to consider what happens to the definite integral in (5.1) when we can't guarantee that the integrand is non-negative, i.e. what happens if we do not have $f(x) \ge 0$ for all $x$ such that $a \le x \le b$? To simplify matters, we will start by asking: What happens when this condition always fails? That is, what happens when the integrand is non-positive, i.e. when $f(x) \le 0$ for all $x$ such that $a \le x \le b$?
So what does the definite integral in (5.1) tell us about the area of the region bounded by the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$ when we have a non-positive integrand, i.e. when $f(x) \le 0$ for $a \le x \le b$, as illustrated in Figure 5.4? One way of looking at this is to note that, if $f(x) \le 0$ for all $a \le x \le b$, then $-f(x) \ge 0$ for all $a \le x \le b$. But, this means that $-f(x)$ gives us a non-negative integrand and the area, $A$, of the region in question is given by
$$A = \int_a^b -f(x) \, dx = -\int_a^b f(x) \, dx \implies \int_a^b f(x) \, dx = -A,$$
i.e. for non-positive integrands, the definite integral gives us minus the area. Thus, in the case of non-positive integrands, the area is given by the magnitude of the definite integral. Let's have a look at an example.
Figure 5.4: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$. In cases like this we have a non-positive integrand, i.e. $f(x) \le 0$ for $a \le x \le b$, and so the definite integral in (5.1) gives us minus the area of this hatched region.
Example 5.32
Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 2$ and $x = 4$ which is illustrated in Figure 5.5(a).
There are two ways to find this area:
As this is just a right-angled triangle, the area is just 'half times base times height', i.e.
$$\text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4.$$
Thus, the area of the region is four.
As we have $y = f(x)$ with $f(x) = 4 - 2x$, we can see from Figure 5.5(a) that $f(x) \le 0$ between $x = 2$ and $x = 4$. So, looking at the definite integral we get,
$$\int_2^4 (4 - 2x) \, dx = \Big[4x - x^2\Big]_2^4 = (4 \times 4 - 4^2) - (4 \times 2 - 2^2) = (16 - 16) - (8 - 4) = -4,$$
which is minus the answer we would expect. As such, we take the magnitude of this answer and so the area is, again, four.
Consequently, if $f(x) \le 0$ between the vertical lines, the definite integral gives us minus the area and so we take the magnitude of the definite integral to find the area.
Figure 5.5: Negative integrands and their relation to area. The region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines (a) $x = 2$ and $x = 4$ for Example 5.32, and (b) $x = 0$ and $x = 4$ for Example 5.33.
What definite integrals with general integrands represent
We now consider what happens to the definite integral in (5.1) when we can't guarantee that the integrand is non-positive or non-negative, i.e. what happens if $f(x) \ge 0$ for some $x$ such that $a \le x \le b$ but not others? Let's start by considering the simple case where we have an integrand which is neither non-positive nor non-negative because there is some number $c$ such that $a \le c \le b$ where
$f(x) \ge 0$ for all $x$ such that $a \le x \le c$, and $f(x) \le 0$ for all $x$ such that $c \le x \le b$,
as illustrated in Figure 5.6. One way of looking at this is to note that the definite integral
$\int_a^c f(x) \, dx$ gives us the hatched area, $A_1$, between the vertical lines $x = a$ and $x = c$,
$\int_c^b f(x) \, dx$ gives us minus the hatched area, $A_2$, between the vertical lines $x = c$ and $x = b$.
As such, the hatched area, $A$, between the lines $x = a$ and $x = b$ is given by
$$A = A_1 + A_2 = \int_a^c f(x) \, dx + \left|\int_c^b f(x) \, dx\right|,$$
which is not the value of the definite integral in (5.1).
Figure 5.6: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$. In cases like this we have a non-negative integrand for $a \le x \le c$ and a non-positive integrand for $c \le x \le b$, i.e. the definite integral in (5.1) cannot be used to find the area of the region.
Thus, for general integrands, the procedure for finding the area of the region bounded by the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$ is as follows:
Firstly, determine all the points where the curve crosses the $x$-axis with $x$-coordinates between $x = a$ and $x = b$.
Secondly, use these points to determine (possibly via a sketch) where the curve is positive and where the curve is negative.
Thirdly, use this information to determine the areas by finding the appropriate definite integrals (bearing in mind that the integrands will now be either non-negative or non-positive).
Fourthly, add up all the areas to find the total area.
To see how this works let's consider a couple of examples.
Example 5.33
Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 4$ which is illustrated in Figure 5.5(b).
As indicated in Figure 5.5(b), the line $y = 4 - 2x$ crosses the $x$-axis when $x = 2$ and this lies between $x = 0$ and $x = 4$. We can also see that the function is non-negative for $0 \le x \le 2$ and non-positive for $2 \le x \le 4$. As such, using our earlier workings in Examples 5.30 and 5.32, we split the total region into two sub-regions to see that:
Between $x = 0$ and $x = 2$ we evaluate the definite integral,
$$\int_0^2 (4 - 2x) \, dx \quad \text{which gives us } 4,$$
as we saw in Example 5.30. Thus, the area is four here as we have a non-negative integrand.
Between $x = 2$ and $x = 4$ we evaluate the definite integral,
$$\int_2^4 (4 - 2x) \, dx \quad \text{which gives us } -4,$$
as we saw in Example 5.32. Thus, the area is four here as we have a non-positive integrand.
Consequently, the total area is eight.
We also note, in passing, that the definite integral
$$\int_0^4 (4 - 2x) \, dx = \Big[4x - x^2\Big]_0^4 = (4 \times 4 - 4^2) - (4 \times 0 - 0^2) = (16 - 16) - 0 = 0,$$
and, as this is zero, it most definitely is not giving us the area we seek!
Activity 5.20
Verify that the answer to the previous example is correct by finding the areas of the triangles involved.
Example 5.34
Find the area of the region between the parabola $y = 1 - x^2$, the $x$-axis and the vertical lines $x = -2$ and $x = 2$ which is illustrated in Figure 5.7.
As indicated in Figure 5.7, the parabola $y = 1 - x^2$ crosses the $x$-axis when $x = \pm 1$ and these points lie between $x = -2$ and $x = 2$. We can also see that the function is non-negative for $-1 \le x \le 1$ and non-positive for $-2 \le x \le -1$ and $1 \le x \le 2$. As such, we split the total region into three sub-regions to see that:
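Between $x = -2$ and $x = -1$ we evaluate the definite integral,
$$\int_{-2}^{-1} (1 - x^2) \, dx = \left[x - \frac{x^3}{3}\right]_{-2}^{-1} = \left(-1 - \frac{(-1)^3}{3}\right) - \left(-2 - \frac{(-2)^3}{3}\right) = \left(-1 + \frac{1}{3}\right) - \left(-2 + \frac{8}{3}\right) = -\frac{4}{3}.$$
Thus, the area is $\frac{4}{3}$ here as we have a non-positive integrand.
Between $x = -1$ and $x = 1$ we evaluate the definite integral,
$$\int_{-1}^{1} (1 - x^2) \, dx = \left[x - \frac{x^3}{3}\right]_{-1}^{1} = \left(1 - \frac{1^3}{3}\right) - \left(-1 - \frac{(-1)^3}{3}\right) = \left(1 - \frac{1}{3}\right) - \left(-1 + \frac{1}{3}\right) = \frac{4}{3}.$$
Thus, the area is $\frac{4}{3}$ here as we have a non-negative integrand.
Between $x = 1$ and $x = 2$ we evaluate the definite integral,
$$\int_{1}^{2} (1 - x^2) \, dx = \left[x - \frac{x^3}{3}\right]_{1}^{2} = \left(2 - \frac{2^3}{3}\right) - \left(1 - \frac{1^3}{3}\right) = \left(2 - \frac{8}{3}\right) - \left(1 - \frac{1}{3}\right) = -\frac{4}{3}.$$
Thus, the area is $\frac{4}{3}$ here as we have a non-positive integrand.
Consequently, the total area is $\frac{4}{3} + \frac{4}{3} + \frac{4}{3}$ which is four.
We also note, in passing, that the definite integral,
$$\int_{-2}^{2} (1 - x^2) \, dx = \left[x - \frac{x^3}{3}\right]_{-2}^{2} = \left(2 - \frac{2^3}{3}\right) - \left((-2) - \frac{(-2)^3}{3}\right) = \left(2 - \frac{8}{3}\right) - \left(-2 + \frac{8}{3}\right) = -\frac{4}{3},$$
and this is most definitely not giving us the area we seek!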
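The four-step procedure for general integrands can be sketched as a short Python function: split the interval at the points where the curve crosses the $x$-axis, evaluate the definite integral on each piece, and add up the magnitudes. The sketch below applies it to Example 5.34. It assumes that the caller supplies every crossing point (so that the integrand does not change sign within a piece) together with an antiderivative; the function names are illustrative.

```python
def F(x):
    """An antiderivative of f(x) = 1 - x**2 (constant of integration omitted)."""
    return x - x**3 / 3


def area_between_curve_and_axis(antiderivative, crossing_points, a, b):
    """Split [a, b] at the given x-axis crossings and add up |integral| per piece."""
    cuts = [a] + sorted(p for p in crossing_points if a < p < b) + [b]
    pieces = zip(cuts[:-1], cuts[1:])
    return sum(abs(antiderivative(hi) - antiderivative(lo)) for lo, hi in pieces)


total_area = area_between_curve_and_axis(F, [-1, 1], -2, 2)
signed_integral = F(2) - F(-2)

print(total_area, signed_integral)  # area of the region versus the plain definite integral
```

The two printed numbers differ because the plain definite integral lets the pieces below the axis cancel against the piece above it, whereas the area counts every piece positively.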
5.3.2 Definite integrals and the other rules of integration
We have seen how to use the basic rules of integration when dealing with definite integrals and so we now look at how we can use the other two rules of integration, namely integration by substitution and integration by parts, in this context.
Integration by substitution
When evaluating a definite integral using integration by substitution we follow the same procedure as before but now, we also change the limits of integration so that they are values of $g$ rather than values of $x$. That is, if we are making the substitution $g = g(x)$ and we have a definite integral with limits $x = a$ and $x = b$, after the substitution, the limits will be $g = g(a)$ and $g = g(b)$ respectively. This is best illustrated by an example.
Figure 5.7: Negative integrands and their relation to area (continued). For Example 5.34, the region between the parabola $y = 1 - x^2$, the $x$-axis and the vertical lines $x = -2$ and $x = 2$.
Example 5.35
Find
$$\int_0^1 x \, e^{x^2 + 1} \, dx.$$
We saw in Example 5.10 that, taking $g = x^2 + 1$, we have
$$\frac{dg}{dx} = 2x \implies dg = 2x \, dx \implies x \, dx = \frac{1}{2} \, dg.$$
In this case, as we have a definite integral, we also change the limits of integration, i.e.
lower limit: $x = 0$ gives $g = g(0) = 0^2 + 1 = 1$, and
upper limit: $x = 1$ gives $g = g(1) = 1^2 + 1 = 2$.
Hence, the substitution gives
$$\int_0^1 x \, e^{x^2 + 1} \, dx = \int_1^2 e^g \, \frac{1}{2} \, dg = \frac{1}{2} \int_1^2 e^g \, dg = \frac{1}{2} \Big[e^g\Big]_1^2 = \frac{1}{2}\left(e^2 - e^1\right) = \frac{e}{2}(e - 1),$$
as the answer. Alternatively, using our indefinite integral from Example 5.10, we saw that integration by substitution gave us
$$\int x \, e^{x^2 + 1} \, dx = \frac{1}{2} e^{x^2 + 1} + c,$$
and so this means that, if we suppress the constant of integration, we get
$$\int_0^1 x \, e^{x^2 + 1} \, dx = \left[\frac{1}{2} e^{x^2 + 1}\right]_0^1 = \frac{1}{2}\left(e^{1^2 + 1} - e^{0^2 + 1}\right) = \frac{1}{2}\left(e^2 - e\right) = \frac{e}{2}(e - 1),$$
as before.
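The effect of changing the limits along with the variable in Example 5.35 can be checked numerically. The Python sketch below approximates the original integral in $x$ (limits 0 and 1) and the transformed integral in $g$ (limits 1 and 2) and compares both with the exact answer. The midpoint-rule helper is an illustrative assumption and not a method from the guide.

```python
import math


def midpoint_rule(func, lower, upper, n=10_000):
    """Approximate a definite integral with n equal-width midpoint rectangles."""
    width = (upper - lower) / n
    return width * sum(func(lower + (i + 0.5) * width) for i in range(n))


# Original integral in x, with limits x = 0 and x = 1.
in_x = midpoint_rule(lambda x: x * math.exp(x**2 + 1), 0.0, 1.0)

# After the substitution g = x**2 + 1: integrand (1/2) e^g, limits g = 1 and g = 2.
in_g = midpoint_rule(lambda g: 0.5 * math.exp(g), 1.0, 2.0)

exact = 0.5 * math.e * (math.e - 1)  # the answer (e/2)(e - 1) from Example 5.35

print(in_x, in_g, exact)  # all three values should agree closely
```

Keeping the old limits 0 and 1 on the integral in $g$ would give a different number, which is why the limits have to be converted as part of the substitution.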
For a harder example, let's see what happens when we have to make a substitution that works because of our double-angle formulae from Section 2.1.4.
Example 5.36
Use the substitution $x = \sin^2\theta$ to find
$$\int_0^1 \sqrt{x} \sqrt{1 - x} \, dx.$$
Differentiating both sides of $x = \sin^2\theta$ with respect to $\theta$, we have
$$\frac{dx}{d\theta} = 2\sin\theta\cos\theta \implies dx = 2\sin\theta\cos\theta \, d\theta,$$
and changing the limits of integration we get
lower limit: $x = 0$ gives $\sin^2\theta = 0$ and so $\theta = 0$, and
upper limit: $x = 1$ gives $\sin^2\theta = 1$ and so $\theta = \pi/2$.
Hence, the substitution gives us
$$\int_0^1 \sqrt{x} \sqrt{1 - x} \, dx = \int_0^{\pi/2} (\sin\theta)(\cos\theta)(2\sin\theta\cos\theta) \, d\theta = \int_0^{\pi/2} 2\sin^2\theta\cos^2\theta \, d\theta,$$
where we have used the trigonometric identity $\cos^2\theta = 1 - \sin^2\theta$ from (2.2) to get $\cos\theta$ from the $\sqrt{1 - x}$ in the integrand. Then, using the double-angle formula $\sin(2\theta) = 2\sin\theta\cos\theta$ from (2.6), we see that this gives us
$$\int_0^1 \sqrt{x} \sqrt{1 - x} \, dx = \frac{1}{2} \int_0^{\pi/2} \sin^2(2\theta) \, d\theta,$$
which we evaluate using a variation on the method given in Example 5.26, i.e. we note that $\cos(4\theta) = 1 - 2\sin^2(2\theta)$ from Activity 2.18, so that
$$\int_0^1 \sqrt{x} \sqrt{1 - x} \, dx = \frac{1}{4} \int_0^{\pi/2} [1 - \cos(4\theta)] \, d\theta = \frac{1}{4}\left[\theta - \frac{1}{4}\sin(4\theta)\right]_0^{\pi/2} = \frac{\pi}{8},$$
as $\sin(4\theta) = 0$ when $\theta = 0$ or $\theta = \pi/2$.
Lastly, let's see another application of the $t = \tan\theta$ substitution that we saw in Example 5.28.
Example 5.37
Use the substitution $t = \tan\theta$ to find
$$\int_0^{\pi/2} \frac{d\theta}{4 - 2\cos^2\theta}.$$
Using what we saw in Example 5.28, we see that
$$d\theta = \frac{dt}{1 + t^2} \quad \text{and} \quad \cos^2\theta = \frac{1}{1 + t^2},$$
and so, in particular, the denominator of our integrand can be written as
$$4 - 2\cos^2\theta = 4 - \frac{2}{1 + t^2} = \frac{2 + 4t^2}{1 + t^2}.$$
Also, changing the limits of integration, we get
lower limit: $\theta = 0$ gives $t = \tan 0 = 0$, and
upper limit: $\theta = \pi/2$ gives $t = \tan(\pi/2) = \infty$,
which means that the substitution gives
$$\int_0^{\pi/2} \frac{d\theta}{4 - 2\cos^2\theta} = \int_0^{\infty} \frac{1 + t^2}{2 + 4t^2} \, \frac{dt}{1 + t^2} = \frac{1}{4} \int_0^{\infty} \frac{dt}{\frac{1}{2} + t^2} = \frac{1}{4} \Big[\sqrt{2} \tan^{-1}(\sqrt{2}\, t)\Big]_0^{\infty} = \frac{\sqrt{2}}{8} \pi,$$
as $\tan^{-1} \infty = \pi/2$ and we have used the result we saw in Example 5.17.
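The answer to Example 5.37 can be checked numerically in two ways: by approximating the original integral in $\theta$ directly, and by evaluating the antiderivative in $t$ at a very large value of $t$ in place of $\infty$. The Python sketch below does both. The midpoint-rule helper and the cut-off value `large_t` are illustrative assumptions.

```python
import math


def midpoint_rule(func, lower, upper, n=10_000):
    """Approximate a definite integral with n equal-width midpoint rectangles."""
    width = (upper - lower) / n
    return width * sum(func(lower + (i + 0.5) * width) for i in range(n))


def integrand(theta):
    """Integrand of Example 5.37 in the original variable theta."""
    return 1.0 / (4.0 - 2.0 * math.cos(theta) ** 2)


numeric = midpoint_rule(integrand, 0.0, math.pi / 2)
exact = math.sqrt(2) * math.pi / 8

# Antiderivative in t, (sqrt(2)/4) * arctan(sqrt(2) t), at a large t standing in
# for the upper limit; its value at t = 0 is zero.
large_t = 1e9
via_t = (math.sqrt(2) / 4) * math.atan(math.sqrt(2) * large_t)

print(numeric, via_t, exact)  # all three values should agree closely
```

Using a large finite value of $t$ mirrors the statement that $\tan^{-1} t$ approaches $\pi/2$ as $t$ grows without bound.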
Of course, in this example, when we write things like “$\tan(\pi/2) = \infty$” or “$\tan^{-1}(\infty) = \pi/2$”, what we really mean is “$\tan\theta \to \infty$ as $\theta \to \pi/2$” and “$\tan^{-1} t \to \pi/2$ as $t \to \infty$”. This shorthand will be fine for this course, but in 176 Further Calculus, we will see how to do things like this properly.
Integration by parts
When evaluating a definite integral using integration by parts we use
$$\int_a^b f(x) g'(x) \, dx = \Big[f(x) g(x)\Big]_a^b - \int_a^b f'(x) g(x) \, dx,$$
i.e. we have to evaluate the $f(x)g(x)$ term using the limits of integration as well as evaluating the new [easier] definite integral.
Example 5.38
Find
$$\int_0^1 x \, e^x \, dx.$$
We saw in Example 5.18 that, to apply integration by parts to this integral, we choose $f(x) = x$ and $g'(x) = e^x$, so that differentiating $f(x)$ and integrating $g'(x)$ we get
$$f'(x) = 1 \quad \text{and} \quad g(x) = e^x,$$
where we have suppressed the arbitrary constant from the integration. Applying the rule in the case of a definite integral then gives,
$$\int_0^1 x \, e^x \, dx = \Big[(x)(e^x)\Big]_0^1 - \int_0^1 (1)(e^x) \, dx = \Big[x \, e^x\Big]_0^1 - \int_0^1 e^x \, dx,$$
which leads to
$$\int_0^1 x \, e^x \, dx = \big[(1)(e^1) - (0)(e^0)\big] - \Big[e^x\Big]_0^1 = (e - 0) - (e^1 - e^0) = 1,$$
as the answer.
Alternatively, using our indefinite integral from Example 5.18, we saw that integration by parts gave us
$$\int x \, e^x \, dx = (x - 1) e^x + c,$$
and so this means that, if we suppress the constant of integration, we get
$$\int_0^1 x \, e^x \, dx = \Big[(x - 1) e^x\Big]_0^1 = (1 - 1) e^1 - (0 - 1) e^0 = 0 - (-e^0) = 1,$$
as before.
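The definite-integral form of integration by parts in Example 5.38 can be mirrored step by step in Python: compute the boundary term $[f(x)g(x)]_0^1$, subtract the remaining integral of $f'(x)g(x)$, and compare with a direct approximation of the original integral. The midpoint-rule helper and the function names are illustrative assumptions.

```python
import math


def midpoint_rule(func, lower, upper, n=10_000):
    """Approximate a definite integral with n equal-width midpoint rectangles."""
    width = (upper - lower) / n
    return width * sum(func(lower + (i + 0.5) * width) for i in range(n))


# Choices from Example 5.38: f(x) = x and g'(x) = e^x, so f'(x) = 1 and g(x) = e^x.
def f(x):
    return x


def f_prime(x):
    return 1.0


def g(x):
    return math.exp(x)


a, b = 0.0, 1.0
boundary_term = f(b) * g(b) - f(a) * g(a)  # [f(x) g(x)] evaluated between a and b
remaining_integral = midpoint_rule(lambda x: f_prime(x) * g(x), a, b)
by_parts = boundary_term - remaining_integral

# g'(x) = e^x happens to equal g(x) here, so f(x) * g(x) is the original integrand.
direct = midpoint_rule(lambda x: f(x) * g(x), a, b)

print(by_parts, direct)  # both should be close to 1
```

The boundary term uses the same limits as the integral, which is the only change from the indefinite version of the rule.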
5.4 Applications of integrals
Integrals can be used in economics and we now introduce two ways in which they can arise in that subject. The first is what happens when we want to find a cost function but we only know the marginal cost; and the second introduces the idea of consumer and producer surpluses.
5.4.1 Marginal functions revisited
Suppose that the cost of producing a quantity, $q$, of goods is given by the cost function, $C(q)$. In Section 3.3.3, we met the idea of the marginal cost, $MC(q)$, of producing $q$ units which was given by
$$MC(q) = \frac{dC}{dq},$$
and this was useful since the approximation
$$\Delta C \simeq MC(q) \, \Delta q,$$
allowed us to estimate the change in costs, $\Delta C$, due to an increase in production of $\Delta q$, i.e. where the quantity produced is increased from $q$ to $q + \Delta q$. We now consider the problem of finding the cost function, $C(q)$, when we are given the marginal cost function, $MC(q)$. Indeed, as the marginal cost function is the derivative of the cost function, we can see that $C(q)$ is an antiderivative of $MC(q)$, and so,
$$C(q) = \int MC(q) \, dq. \tag{5.2}$$
However, this presents us with a problem as finding the indefinite integral on the right-hand side of (5.2) will yield all the antiderivatives of $MC(q)$ (i.e. a function $C(q)$ that contains an arbitrary constant) whereas we want to find the particular antiderivative that is actually the cost function (i.e. we want to find a particular value of this constant). So, the question is: Which value of the arbitrary constant will give us the cost function? In order to answer this question, we need to be given more information, say the fixed costs associated with this production, so that we can find the right value for this constant. Let's consider an example.
Example 5.39
A company's marginal cost function is given by
$$MC(q) = 2q + 100 e^q,$$
and its fixed costs are 10,000. What is the cost function, $C(q)$, for this company?
Using (5.2) above, we see that the cost function is given by the integral of the marginal cost, i.e.
$$C(q) = \int (2q + 100 e^q) \, dq = q^2 + 100 e^q + c,$$
where $c$ is an arbitrary constant. This tells us, depending on the value of $c$, all of the possible cost functions for this company. But, which one should we take? Obviously, perhaps, we want the one which also gives us fixed costs of 10,000, i.e. we want
$$C(0) = 10{,}000 \implies 10{,}000 = 0^2 + 100 e^0 + c \implies 10{,}000 = 100 + c \implies c = 9{,}900,$$
as the fixed costs are the cost of producing nothing. Thus, the cost function for this company is given by
$$C(q) = q^2 + 100 e^q + 9{,}900,$$
as this function agrees with the question on both the marginal and the fixed costs of production.
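The reasoning in Example 5.39 (integrate the marginal cost, then use the fixed costs to pin down the constant) can be sketched in Python. The snippet solves $C(0) = 10{,}000$ for $c$ and then checks, by a central-difference estimate at the arbitrary sample quantity $q = 2$, that the resulting cost function differentiates back to the marginal cost. All function and variable names are illustrative assumptions.

```python
import math

FIXED_COSTS = 10_000  # given in Example 5.39


def marginal_cost(q):
    """MC(q) = 2q + 100 e^q."""
    return 2 * q + 100 * math.exp(q)


def general_antiderivative(q, c):
    """The family q^2 + 100 e^q + c of antiderivatives of MC(q)."""
    return q**2 + 100 * math.exp(q) + c


# Fixed costs are the cost of producing nothing, so solve C(0) = FIXED_COSTS for c.
c = FIXED_COSTS - general_antiderivative(0, 0)


def cost(q):
    """The particular antiderivative that matches the fixed costs."""
    return general_antiderivative(q, c)


# Check that the derivative of cost(q) matches MC(q) at a sample quantity.
h = 1e-6
slope = (cost(2 + h) - cost(2 - h)) / (2 * h)

print(c, cost(0), slope, marginal_cost(2))
```

Any other value of `c` would give the same marginal cost but the wrong value of `cost(0)`, which is why the extra piece of information is needed.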
5.4.2 Consumer and producer surpluses
Suppose that a market has linear supply and demand functions as illustrated in Figure 5.8. As we know from Section 2.1.5, the equilibrium price, $p^*$, and the equilibrium quantity, $q^*$, occur at the point where the graphs of these functions intersect. Indeed, at equilibrium, as the consumers buy $q^*$ units of the good at a price of $p^*$ per unit, they pay an amount $p^* q^*$ to the suppliers and we can think of this as the area of the hatched region in Figure 5.9(b). However, if the consumers are willing to buy $q^*$ units of the good, it can be argued⁷ that the consumers would be willing to pay an amount given by
$$\int_0^{q^*} p_D(q) \, dq,$$
which is the area of the hatched region in Figure 5.9(a). The difference between the area that represents what they would pay and the area that represents what they actually pay, i.e. the area of the hatched region in Figure 5.9(d), is called the consumer surplus. Indeed, this consumer surplus, $CS$, can be found using the formula
$$CS = \int_0^{q^*} p_D(q) \, dq - p^* q^*,$$
and this is the amount that the consumers save by paying what they actually paid instead of what they would have paid.
⁷ See, for example, Section 25.1 of Anthony and Biggs (1996).
Figure 5.8: Linear supply and demand functions for a market. Note that the equilibrium price, $p^*$, and the equilibrium quantity, $q^*$, occur at the point where the graphs of these functions intersect.
Similarly, if the suppliers are willing to supply $q^*$ units of the good, it can be argued that they need to be paid an amount given by
$$\int_0^{q^*} p_S(q) \, dq,$$
which is the area of the hatched region in Figure 5.9(c). The difference between the area that represents what they are actually paid and the area that represents what they need to be paid, i.e. the area of the hatched region in Figure 5.9(e), is called the producer surplus. Indeed, this producer surplus, $PS$, can be found using the formula
$$PS = p^* q^* - \int_0^{q^*} p_S(q) \, dq,$$
and this is the amount that the suppliers gain by being paid what they actually receive instead of what they need to receive. Let's look at a simple example.
Example 5.40
A market has an inverse demand function given by
$$p_D(q) = 70 - \frac{1}{3} q,$$
and an inverse supply function given by
$$p_S(q) = 20 + \frac{1}{2} q.$$
Find the equilibrium price and quantity. What are the consumer and producer surpluses for this market?
The equilibrium quantity, $q^*$, makes the prices obtained from the inverse demand and supply functions equal, i.e.
$$70 - \frac{1}{3} q^* = 20 + \frac{1}{2} q^* \implies 50 = \frac{5}{6} q^* \implies q^* = 60,$$
and this means that the equilibrium price, $p^*$, is given by
$$p^* = 70 - \frac{1}{3}(60) = 70 - 20 = 50,$$
if we use the inverse demand function. Hence, to find the consumer surplus, $CS$, we have
$$CS = \int_0^{q^*} p_D(q) \, dq - p^* q^*,$$
and so we need to find
$$\int_0^{60} \left(70 - \frac{1}{3} q\right) dq = \left[70q - \frac{q^2}{6}\right]_0^{60} = 70(60) - \frac{1}{6}(60)^2 - 0 = 4{,}200 - 600 = 3{,}600,$$
which means that
$$CS = 3{,}600 - (50)(60) = 3{,}600 - 3{,}000 = 600,$$
is the consumer surplus. And, to find the producer surplus, $PS$, we have
$$PS = p^* q^* - \int_0^{q^*} p_S(q) \, dq,$$
and so we need to find
$$\int_0^{60} \left(20 + \frac{1}{2} q\right) dq = \left[20q + \frac{q^2}{4}\right]_0^{60} = 20(60) + \frac{1}{4}(60)^2 - 0 = 1{,}200 + 900 = 2{,}100,$$
which means that
$$PS = (50)(60) - 2{,}100 = 3{,}000 - 2{,}100 = 900,$$
is the producer surplus.
Figure 5.9: What people pay or need to be paid. (a) What the consumers would pay for a quantity $q^*$. (b) What the consumers pay for a quantity $q^*$ if the market is at equilibrium. (c) What the suppliers need to be paid for a quantity $q^*$. (d) What the consumers save if they pay for a quantity $q^*$ in a market that is at equilibrium, this is the consumer surplus, i.e. area (a) − area (b). (e) What the producers gain if they sell a quantity $q^*$ in a market that is at equilibrium, this is the producer surplus, i.e. area (b) − area (c).
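The arithmetic of Example 5.40 can be reproduced exactly in Python using rational arithmetic, which avoids any rounding of the thirds and sixths involved. The sketch below finds the equilibrium, evaluates the two definite integrals through hand-derived antiderivatives and forms the surpluses. The function and variable names are illustrative assumptions.

```python
from fractions import Fraction


def inverse_demand(q):
    """p_D(q) = 70 - q/3."""
    return 70 - Fraction(1, 3) * q


def inverse_supply(q):
    """p_S(q) = 20 + q/2."""
    return 20 + Fraction(1, 2) * q


def demand_antiderivative(q):
    """An antiderivative of p_D: 70q - q^2/6."""
    return 70 * q - Fraction(1, 6) * q**2


def supply_antiderivative(q):
    """An antiderivative of p_S: 20q + q^2/4."""
    return 20 * q + Fraction(1, 4) * q**2


# Equilibrium: 70 - q/3 = 20 + q/2, i.e. 50 = (5/6) q.
q_star = Fraction(50) / Fraction(5, 6)
p_star = inverse_demand(q_star)

willing_to_pay = demand_antiderivative(q_star) - demand_antiderivative(0)
need_to_be_paid = supply_antiderivative(q_star) - supply_antiderivative(0)

consumer_surplus = willing_to_pay - p_star * q_star
producer_surplus = p_star * q_star - need_to_be_paid

print(q_star, p_star, consumer_surplus, producer_surplus)
```

Only the two antiderivatives would need to change to handle demand or supply functions that are not linear, since the surplus formulae themselves stay the same.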
However, as both the demand and supply functions are linear in this example, there is an easier way to find the consumer and producer surpluses as the next Activity shows.
Activity 5.21
Sketch the inverse demand and supply functions in the previous example and shade in the regions which represent the consumer and producer surplus. What are the areas of these regions?
Of course, the demand and supply functions that we are given may not be linear and, in such cases, we would have to use integration to find the consumer and producer surpluses.
Activity 5.22
The demand for a commodity is given by the equation
$$p(q + 1) = 231.$$
If the equilibrium quantity is 10, find the equilibrium price and hence determine the consumer surplus.
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you should be able to:
find integrals using standard integrals and the rules of integration;
find integrals by simplifying the integrand using partial fractions and trigonometric identities;
use integrals to find areas;
solve problems from economics-based subjects that involve integrals.
Solutions to activities
Solution to activity 5.1
Given the linear combination rule, i.e.
$$\int [k f(x) + l g(x)] \, dx = k \int f(x) \, dx + l \int g(x) \, dx,$$
we can derive the constant multiple rule by setting $l = 0$ so that
$$\int k f(x) \, dx = \int [k f(x) + 0 g(x)] \, dx = k \int f(x) \, dx + 0 \int g(x) \, dx = k \int f(x) \, dx,$$
the sum rule by setting $k = 1$ and $l = 1$ so that
$$\int [f(x) + g(x)] \, dx = \int [1 f(x) + 1 g(x)] \, dx = 1 \int f(x) \, dx + 1 \int g(x) \, dx = \int f(x) \, dx + \int g(x) \, dx,$$
and the difference rule by setting $k = 1$ and $l = -1$ so that
$$\int [f(x) - g(x)] \, dx = \int [1 f(x) + (-1) g(x)] \, dx = 1 \int f(x) \, dx + (-1) \int g(x) \, dx = \int f(x) \, dx - \int g(x) \, dx.$$
Solution to activity 5.2
Suppose that $F(x)$ and $G(x)$ are antiderivatives of $f(x)$ and $g(x)$ respectively, i.e.
$$\frac{dF}{dx} = f(x) \quad \text{and} \quad \frac{dG}{dx} = g(x).$$
This means that
$$k \int f(x) \, dx + l \int g(x) \, dx = k F(x) + l G(x) + c,$$
where $c$ is an arbitrary constant. But, by the linear combination rule for differentiation, we also have
$$\frac{d}{dx}\big[k F(x) + l G(x) + c\big] = k \frac{dF}{dx} + l \frac{dG}{dx} + 0 = k f(x) + l g(x),$$
which means that $k F(x) + l G(x) + c$ is also an antiderivative of $k f(x) + l g(x)$, i.e.
$$\int [k f(x) + l g(x)] \, dx = k F(x) + l G(x) + c.$$
Consequently, we have
$$\int [k f(x) + l g(x)] \, dx = k \int f(x) \, dx + l \int g(x) \, dx,$$
as they have the same antiderivatives.
Solution to activity 5.3
For (a), we use the constant multiple rule to see that
$$\int -3\cos x \, dx = -3 \int \cos x \, dx = -3\sin x + c,$$
where $c$ is an arbitrary constant.
For (b), we use the sum rule to see that
$$\int (e^x + \cos x) \, dx = \int e^x \, dx + \int \cos x \, dx = e^x + \sin x + c,$$
where $c$ is an arbitrary constant.
For (c), we use the linear combination rule to see that
$$\int \left(3\sin x - \frac{3}{x}\right) dx = 3 \int \sin x \, dx - 3 \int \frac{1}{x} \, dx = 3(-\cos x) - 3\ln|x| + c = -3\cos x - 3\ln|x| + c,$$
where $c$ is an arbitrary constant.
Solution to activity 5.4
For both of these integrals we use the substitution $g = 4x + 7$ so that we have
$$\frac{dg}{dx} = 4 \implies dg = 4 \, dx \implies dx = \frac{1}{4} \, dg.$$
Hence making this substitution in the first integral we get
$$\int \frac{1}{4x + 7} \, dx = \int \frac{1}{g} \, \frac{1}{4} \, dg = \frac{1}{4} \int \frac{1}{g} \, dg = \frac{1}{4}\ln|g| + c = \frac{1}{4}\ln|4x + 7| + c,$$
where $c$ is an arbitrary constant whereas, in the second integral, we get
$$\int e^{4x + 7} \, dx = \int e^g \, \frac{1}{4} \, dg = \frac{1}{4} \int e^g \, dg = \frac{1}{4} e^g + c = \frac{1}{4} e^{4x + 7} + c,$$
where $c$ is an arbitrary constant.
Solution to activity 5.5
Using the standard integrals as a source of antiderivatives, we see that, if $n \ne -1$,
$$\int (ax + b)^n \, dx = \frac{1}{a} \cdot \frac{(ax + b)^{n+1}}{n + 1} + c,$$
whereas, if $n = -1$, we have
$$\int (ax + b)^{-1} \, dx = \int \frac{1}{ax + b} \, dx = \frac{1}{a}\ln|ax + b| + c,$$
where $c$ is an arbitrary constant. We also have
$$\int e^{ax + b} \, dx = \frac{1}{a} e^{ax + b} + c, \quad \int \sin(ax + b) \, dx = -\frac{1}{a}\cos(ax + b) + c, \quad \text{and} \quad \int \cos(ax + b) \, dx = \frac{1}{a}\sin(ax + b) + c,$$
where $c$ is an arbitrary constant.
Of course, if $a = 0$, then the dependence on $x$ in the integrand disappears and so we are just integrating a constant, i.e. we have
$$\int b^n \, dx = x b^n + c,$$
for any $n$, as well as
$$\int e^b \, dx = x e^b + c, \quad \int \sin b \, dx = x \sin b + c \quad \text{and} \quad \int \cos b \, dx = x \cos b + c,$$
where $c$ is an arbitrary constant.
Solution to activity 5.6
Using what we saw in Activity 5.5 we see that the integrals from Activity 5.4 are, simply,
$$\int \frac{1}{4x + 7} \, dx = \frac{1}{4}\ln|4x + 7| + c \quad \text{and} \quad \int e^{4x + 7} \, dx = \frac{1}{4} e^{4x + 7} + c,$$
where $c$ is an arbitrary constant. This is, of course, exactly what we found in Activity 5.4.
Solution to activity 5.7
Taking $g = 3x^2 + 7$ we have $g'(x) = 6x$ and so $dg = 6x \, dx$, i.e. $x \, dx = \frac{1}{6} \, dg$. Hence, in the first integral, this substitution gives
$$\int \frac{x}{3x^2 + 7} \, dx = \int \frac{1}{g} \, \frac{1}{6} \, dg = \frac{1}{6} \int \frac{1}{g} \, dg = \frac{1}{6}\ln|g| + c = \frac{1}{6}\ln|3x^2 + 7| + c,$$
where $c$ is an arbitrary constant whereas, in the second integral, this substitution gives
$$\int x \, e^{3x^2 + 7} \, dx = \int e^g \, \frac{1}{6} \, dg = \frac{1}{6} \int e^g \, dg = \frac{1}{6} e^g + c = \frac{1}{6} e^{3x^2 + 7} + c,$$
where $c$ is an arbitrary constant. In both cases, note that the extra '$x$' in the integrand was actually needed for the substitution $g = 3x^2 + 7$ to work.
Solution to activity 5.8
Here the composition is $\sin(x^2)$ and so we take $g = x^2$. As such, we have
$$\frac{dg}{dx} = 2x \implies x \, dx = \frac{1}{2} \, dg,$$
which is a constant multiple of the other part of the product in the integrand, i.e. this substitution will work. Thus, the substitution gives
$$\int x \sin(x^2) \, dx = \int \sin(g) \, \frac{1}{2} \, dg = \frac{1}{2} \int \sin(g) \, dg = -\frac{1}{2}\cos(g) + c = -\frac{1}{2}\cos(x^2) + c,$$
where $c$ is an arbitrary constant. Here, of course, the extra '$x$' in the integrand was needed for the substitution $g = x^2$ to work.
Solution to activity 5.9
Here the composition is $\cos^2 x$ and so we take $g = \cos x$. As such, we have
$$\frac{dg}{dx} = -\sin x,$$
which, up to a minus, is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that
$$dg = -\sin x\,dx,$$
and so the substitution gives
$$\int \sin x\cos^2 x\,dx = \int g^2\,(-dg) = -\int g^2\,dg = -\frac{g^3}{3} + c = -\frac{1}{3}\cos^3 x + c.$$
Here, of course, the extra 'sin x' in the integrand was needed for the substitution $g = \cos x$ to work.

Solution to activity 5.10
In Activity 2.4, we saw that
$$\cot x = \frac{\cos x}{\sin x},$$
which means that the composition is $(\sin x)^{-1}$ and so we take $g = \sin x$. As such, we have
$$\frac{dg}{dx} = \cos x,$$
which is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that
$$dg = \cos x\,dx,$$
and so the substitution gives
$$\int \cot x\,dx = \int \frac{\cos x}{\sin x}\,dx = \int \frac{dg}{g} = \ln|g| + c = \ln|\sin x| + c.$$
Here, of course, the extra 'cos x' in the integrand was needed for the substitution $g = \sin x$ to work.

Solution to activity 5.11
We note that the quadratic expression in the denominator can be written as
$$x^2 + 2x + 2 = (x+1)^2 + 1,$$
if we complete the square. As such, we have
$$\int \frac{dx}{x^2+2x+2} = \int \frac{dx}{(x+1)^2+1} = \tan^{-1}(x+1) + c,$$
using the result we derived in Example 5.17. (A useful exercise at this point is to try to get this answer by actually making the substitution $x + 1 = \tan\theta$ as we did in that example.)
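The three antiderivatives found in Activities 5.9 to 5.11 can be checked by comparing F(b) − F(a) with a numerical estimate of the corresponding definite integral. This is only a sketch: `simpson` is a hypothetical helper implementing the composite Simpson's rule, and the limits of integration are illustrative assumptions chosen so that each integrand is well behaved.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# (integrand, antiderivative from the solutions, lower limit, upper limit)
checks = [
    (lambda x: math.sin(x) * math.cos(x) ** 2,
     lambda x: -math.cos(x) ** 3 / 3, 0.0, math.pi / 2),
    (lambda x: math.cos(x) / math.sin(x),
     lambda x: math.log(abs(math.sin(x))), math.pi / 6, math.pi / 2),
    (lambda x: 1 / (x * x + 2 * x + 2),
     lambda x: math.atan(x + 1), 0.0, 1.0),
]

# Gap between the numerical integral and F(b) - F(a) for each check.
errors = [abs(simpson(f, a, b) - (F(b) - F(a))) for f, F, a, b in checks]
print(errors)
```

The arbitrary constant c plays no role here because it cancels in F(b) − F(a).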
Solution to activity 5.12
Using the change of base formula for logarithms from Section 2.1.4, i.e.
$$\log_a(x) = \frac{\ln x}{\ln a},$$
we have
$$\int \log_a x\,dx = \frac{1}{\ln a}\int \ln x\,dx = \frac{1}{\ln a}\bigl[x\ln(x) - x\bigr] + c = x\log_a(x) - \frac{x}{\ln a} + c,$$
where c is an arbitrary constant.

Solution to activity 5.13
To find $\int x\ln x\,dx$ the other way, i.e. by choosing
$$f(x) = x \qquad\text{and}\qquad g'(x) = \ln x,$$
we differentiate $f(x)$ and integrate $g'(x)$ using the result in Example 5.20 to get
$$f'(x) = 1 \qquad\text{and}\qquad g(x) = x\ln x - x,$$
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives
$$\begin{aligned}
\int x\ln x\,dx &= x(x\ln x - x) - \int (1)(x\ln x - x)\,dx \\
&= x^2\ln x - x^2 - \left[\int x\ln x\,dx - \frac{x^2}{2}\right] + c \\
&= x^2\ln x - \frac{x^2}{2} - \int x\ln x\,dx + c,
\end{aligned}$$
so that, taking the integral on the right-hand-side over to the left-hand-side, we have
$$2\int x\ln x\,dx = x^2\ln x - \frac{x^2}{2} + c \implies \int x\ln x\,dx = \frac{x^2}{2}\ln x - \frac{x^2}{4} + c,$$
where c is an arbitrary constant. Notice that this is the same as the answer we found in Example 5.19 but it is slightly trickier to get and we need to know the answer to Example 5.20.

Solution to activity 5.14
Unlike what we saw in Example 5.21, it would actually make more sense to find
$$\int (x^2+1)^2x^2\,dx,$$
by multiplying out the brackets and integrating term-by-term rather than integrating it by parts. Doing this, we get
$$\int (x^2+1)^2x^2\,dx = \int (x^4+2x^2+1)x^2\,dx = \int (x^6+2x^4+x^2)\,dx = \frac{x^7}{7} + \frac{2}{5}x^5 + \frac{x^3}{3} + c,$$
where c is an arbitrary constant. Indeed, to verify that this is the same answer as the one we saw in the example, it is easiest to take the earlier answer and note that
$$\begin{aligned}
(x^2+1)^2\frac{x^3}{3} - \frac{4}{3}\left(\frac{x^7}{7} + \frac{x^5}{5}\right) + c &= (x^4+2x^2+1)\frac{x^3}{3} - \frac{4}{3}\left(\frac{x^7}{7} + \frac{x^5}{5}\right) + c \\
&= \frac{x^7}{3} + \frac{2}{3}x^5 + \frac{x^3}{3} - \frac{4}{21}x^7 - \frac{4}{15}x^5 + c \\
&= \frac{x^7}{7} + \frac{2}{5}x^5 + \frac{x^3}{3} + c,
\end{aligned}$$
which is what we got above.

Solution to activity 5.15
To find this integral we also use the other double-angle formula from Activity 2.18, namely
$$\cos(2x) = 2\cos^2 x - 1 \implies \cos^2 x = \frac{1}{2}\bigl[1 + \cos(2x)\bigr],$$
as this allows us to write the problematic integrand $\cos^2 x$ in terms of the function $\cos(2x)$ which is far easier to integrate. This means that we have
$$\int \cos^2 x\,dx = \frac{1}{2}\int \bigl[1 + \cos(2x)\bigr]\,dx = \frac{1}{2}\left[x + \frac{1}{2}\sin(2x)\right] + c,$$
where c is an arbitrary constant.

Solution to activity 5.16
Using the first step, we can see that
$$\int_a^b f(x)\,dx = \Bigl[F(x) + c\Bigr]_a^b,$$
as $F(x) + c$ is also an antiderivative of $f(x)$ if c is a constant. Then, using the second step we get⁸
$$\Bigl[F(x) + c\Bigr]_a^b = \bigl[F(b) + c\bigr] - \bigl[F(a) + c\bigr] = F(b) - F(a),$$
which is exactly what we wanted. That is, including a constant of integration does not affect the value of a definite integral and so we can omit it.

⁸ In what follows, bear in mind that a constant such as c, when evaluated at either x = a or x = b, is just c.

Solution to activity 5.17
For definite integrals, it should be easy to see that we have the
constant multiple rule: If k is a constant and $f(x)$ is a function, then
$$\int_a^b kf(x)\,dx = k\int_a^b f(x)\,dx.$$
sum rule: If $f(x)$ and $g(x)$ are functions, then
$$\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$$
difference rule: If $f(x)$ and $g(x)$ are functions, then
$$\int_a^b [f(x) - g(x)]\,dx = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx.$$
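The three rules for definite integrals can be illustrated numerically. In the sketch below, `midpoint` is a hypothetical helper implementing the midpoint rule, and the functions f, g, the interval and the constant k are illustrative assumptions rather than anything taken from the guide. Because the approximation is itself a sum, the rules hold for it up to rounding error.

```python
import math

def midpoint(f, a, b, n=2000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Illustrative choices (assumptions, not taken from the guide).
f, g = math.sin, math.exp
a, b, k = 0.0, 1.0, 3.0

# Gap between the two sides of each rule; these should be at rounding level.
sum_gap = abs(midpoint(lambda x: f(x) + g(x), a, b)
              - (midpoint(f, a, b) + midpoint(g, a, b)))
diff_gap = abs(midpoint(lambda x: f(x) - g(x), a, b)
               - (midpoint(f, a, b) - midpoint(g, a, b)))
mult_gap = abs(midpoint(lambda x: k * f(x), a, b) - k * midpoint(f, a, b))

# Compare with the exact value of the integral of sin x + e^x over [0, 1].
exact_sum = (1 - math.cos(1)) + (math.e - 1)
approx_gap = abs(midpoint(lambda x: f(x) + g(x), a, b) - exact_sum)
print(sum_gap, diff_gap, mult_gap, approx_gap)
```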
Solution to activity 5.18
Suppose that $F(x)$ and $G(x)$ are antiderivatives of $f(x)$ and $g(x)$ respectively, i.e.
$$\frac{dF}{dx} = f(x) \qquad\text{and}\qquad \frac{dG}{dx} = g(x).$$
This means that
$$k\int_a^b f(x)\,dx + l\int_a^b g(x)\,dx = k\Bigl[F(x)\Bigr]_a^b + l\Bigl[G(x)\Bigr]_a^b = k\bigl[F(b) - F(a)\bigr] + l\bigl[G(b) - G(a)\bigr].$$
But, by the linear combination rule for differentiation, we also have
$$\frac{d}{dx}\bigl[kF(x) + lG(x)\bigr] = k\frac{dF}{dx} + l\frac{dG}{dx} = kf(x) + lg(x),$$
which means that $kF(x) + lG(x)$ is also an antiderivative of $kf(x) + lg(x)$, i.e.
$$\int_a^b [kf(x) + lg(x)]\,dx = \Bigl[kF(x) + lG(x)\Bigr]_a^b = \bigl[kF(b) + lG(b)\bigr] - \bigl[kF(a) + lG(a)\bigr].$$
Consequently, we have
$$\int_a^b [kf(x) + lg(x)]\,dx = k\int_a^b f(x)\,dx + l\int_a^b g(x)\,dx,$$
as they have the same values.

Solution to activity 5.19
As the region in Example 5.31 is symmetric about the y-axis it should be clear that we have an area given by
$$\int_{-1}^{1} (4 - x^2)\,dx = \int_{-1}^{0} (4 - x^2)\,dx + \int_{0}^{1} (4 - x^2)\,dx,$$
where the values of the two integrals on the right-hand-side, i.e. the areas they represent, are equal. As such, we can write
$$\int_{-1}^{1} (4 - x^2)\,dx = 2\int_{0}^{1} (4 - x^2)\,dx,$$
if we decide to find the second of these integrals. Then, looking at the integral on the right-hand-side, we get
$$\int_0^1 (4 - x^2)\,dx = \left[4x - \frac{x^3}{3}\right]_0^1 = \left[4(1) - \frac{(1)^3}{3}\right] - \left[4(0) - \frac{(0)^3}{3}\right] = \frac{11}{3},$$
so that, multiplying this by two, we get an area of 22/3 as before.

Solution to activity 5.20
Looking at the triangles in Figure 5.5(b), we use 'half times base times height' to see that the area of the triangle on the left is
$$\frac{1}{2}\times 2\times 4 = 4,$$
and the area of the triangle on the right is also given by
$$\frac{1}{2}\times 2\times 4 = 4.$$
As such, the total area is eight as we found in Example 5.33.

Solution to activity 5.21
A sketch of the inverse supply and demand functions from Example 5.40 is given in Figure 5.10 and the shaded regions are the consumer and producer surpluses as indicated. Notice that we have also labelled the equilibrium price and quantity, which we found in the example, on the sketch. Indeed, from this sketch it should be clear that:
- The consumer surplus, CS, is the area of a triangle of base 60 and height 20, i.e. we can use 'half times base times height' to see that
$$CS = \frac{1}{2}\times 60\times 20 = 600,$$
and this agrees with what we found in the example.
- The producer surplus, PS, is the area of a triangle of base 60 and height 30, i.e. we can use 'half times base times height' to see that
$$PS = \frac{1}{2}\times 60\times 30 = 900,$$
and this agrees with what we found in the example.

[Figure 5.10 (sketch not reproduced): the curves S and D drawn with q on the horizontal axis and p on the vertical axis, the regions CS and PS shaded, and the values 20, 50 and 70 marked on the p-axis and 60 and 210 on the q-axis.]
Figure 5.10: A sketch of the consumer and producer surpluses for Activity 5.21.

Solution to activity 5.22
As the demand equation is $p(q + 1) = 231$, we see that the inverse demand function is
$$p^D(q) = \frac{231}{q+1},$$
and an equilibrium quantity, $q^*$, of 10 then gives us an equilibrium price of $p^* = p^D(q^*) = 21$. This means that, using
$$CS = \int_0^{q^*} p^D(q)\,dq - p^*q^*,$$
we need to find
$$\int_0^{10} \frac{231}{q+1}\,dq = \Bigl[231\ln|q+1|\Bigr]_0^{10} = 231\ln 11 - 231\ln 1 = 231\ln 11,$$
as $\ln 1 = 0$, and this gives us
$$CS = 231\ln 11 - (21)(10) = 231\ln 11 - 210,$$
for the consumer surplus.
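The consumer surplus for the demand equation p(q + 1) = 231 can also be estimated numerically and compared with 231 ln 11 − 210. This is a minimal sketch: `simpson` and `inverse_demand` are hypothetical helper names, and the number of subintervals is an arbitrary choice.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def inverse_demand(q):
    """Inverse demand function from p(q + 1) = 231."""
    return 231 / (q + 1)

q_star = 10
p_star = inverse_demand(q_star)  # equilibrium price

# CS = (area under the inverse demand curve up to q*) - p* q*
cs_numeric = simpson(inverse_demand, 0, q_star) - p_star * q_star
cs_exact = 231 * math.log(11) - 210
print(p_star, cs_numeric, cs_exact)
```

Numerically, the consumer surplus comes out at roughly 343.9.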
Exercises

Exercise 5.1
Find the following indefinite integrals.
(a) $\int \sin^3 x\cos x\,dx$, (b) $\int \sin^3 x\,dx$, (c) $\int (x+2)\ln x\,dx$.

Exercise 5.2
Find $\displaystyle\int \frac{\sqrt{1 + \sqrt{e^{-x}}}}{\sqrt{e^{x}}}\,dx$.

Exercise 5.3
Find $\displaystyle\int \frac{x^2}{x^2-1}\,dx$.

Exercise 5.4
Use the substitution $t = \tan\frac{x}{2}$ to evaluate $\displaystyle\int_0^{\pi/2} \frac{dx}{2 + \cos x}$.

Exercise 5.5
Find the area of the region between the curve $y = x^3$, the x-axis and the vertical lines x = −1 and x = 2.
Solutions to exercises

Solution to exercise 5.1
For (a), we have to find
$$\int \sin^3 x\cos x\,dx,$$
and we notice that the integrand involves the composition $\sin^3 x$. This suggests that we should make the substitution $g = \sin x$ and, as this gives us
$$\frac{dg}{dx} = \cos x \implies dg = \cos x\,dx,$$
which is the other part of the product in the integrand, we can be sure that this will work. So, using this substitution we get
$$\int \sin^3 x\cos x\,dx = \int g^3\,dg = \frac{g^4}{4} + c = \frac{1}{4}\sin^4 x + c,$$
where c is an arbitrary constant.
For (b), we have to find
$$\int \sin^3 x\,dx,$$
and we note that, using the trigonometric identity $\sin^2 x = 1 - \cos^2 x$ from (2.2), this can be written as
$$\int \sin^3 x\,dx = \int (1 - \cos^2 x)\sin x\,dx = \int \sin x\,dx - \int \cos^2 x\sin x\,dx.$$
Of course, the first of these integrals on the right-hand-side is trivial and the other was found in Activity 5.9. So, using this, we find that
$$\int \sin^3 x\,dx = \int \sin x\,dx - \int \cos^2 x\sin x\,dx = -\cos x - \left(-\frac{1}{3}\cos^3 x\right) + c = -\cos x + \frac{1}{3}\cos^3 x + c,$$
where c is an arbitrary constant.
For (c), we have to find
$$\int (x+2)\ln x\,dx,$$
and we note that the integrand is a product. This suggests that we should use integration by parts with
$$f(x) = \ln x \qquad\text{and}\qquad g'(x) = x + 2,$$
like we did in Example 5.19. So, differentiating $f(x)$ and integrating $g'(x)$ we get
$$f'(x) = \frac{1}{x} \qquad\text{and}\qquad g(x) = \frac{x^2}{2} + 2x,$$
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives
$$\int (x+2)\ln x\,dx = (\ln x)\left(\frac{x^2}{2} + 2x\right) - \int \left(\frac{x^2}{2} + 2x\right)\frac{1}{x}\,dx = \frac{x}{2}(x+4)\ln x - \int \left(\frac{x}{2} + 2\right)dx,$$
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
$$\int (x+2)\ln x\,dx = \frac{x}{2}(x+4)\ln x - \int \left(\frac{x}{2} + 2\right)dx = \frac{x}{2}(x+4)\ln x - \left(\frac{x^2}{4} + 2x\right) + c,$$
where c is an arbitrary constant.

Solution to exercise 5.2
It makes sense to start by rewriting the integral so that we have
$$\int \frac{\sqrt{1 + \sqrt{e^{-x}}}}{\sqrt{e^{x}}}\,dx = \int \sqrt{1 + e^{-x/2}}\;e^{-x/2}\,dx,$$
since, in this form, we can see that we have the composition $\sqrt{1 + e^{-x/2}}$ in the integrand. This suggests that we should make the substitution $g = 1 + e^{-x/2}$ and, as this gives us
$$dg = -\frac{1}{2}e^{-x/2}\,dx \implies -2\,dg = e^{-x/2}\,dx,$$
which is the other part of the product in the integrand, we can be sure that this will work. So, using this substitution we get
$$\int \frac{\sqrt{1 + \sqrt{e^{-x}}}}{\sqrt{e^{x}}}\,dx = \int g^{1/2}(-2\,dg) = -2\int g^{1/2}\,dg = -2\,\frac{g^{3/2}}{3/2} + c = -\frac{4}{3}\left(1 + e^{-x/2}\right)^{3/2} + c,$$
where c is an arbitrary constant.

Solution to exercise 5.3
The integral
$$\int \frac{x^2}{x^2-1}\,dx,$$
has an integrand which is the quotient of two polynomials. But, as these have the same degree, we cannot use the method of partial fractions on it as it stands. Instead, we start by rewriting the integrand as
$$\frac{x^2}{x^2-1} = \frac{x^2 - 1 + 1}{x^2-1} = 1 + \frac{1}{x^2-1},$$
so that now, we can use the method of partial fractions on
$$\frac{1}{x^2-1},$$
as the degree of its numerator is less than the degree of its denominator. That is, since $x^2 - 1 = (x-1)(x+1)$, we have distinct linear factors and so we can write
$$\frac{1}{x^2-1} = \frac{1}{(x-1)(x+1)} = \frac{A_1}{x-1} + \frac{A_2}{x+1},$$
for some constants $A_1$ and $A_2$. To find these constants, we cross-multiply on the right-hand-side to see that
$$\frac{1}{(x-1)(x+1)} = \frac{A_1(x+1) + A_2(x-1)}{(x-1)(x+1)},$$
and so, comparing the numerators, we need
$$1 = A_1(x+1) + A_2(x-1).$$
Indeed, setting x = 1 on both sides, we see that $1 = 2A_1$ whereas setting x = −1 on both sides, we see that $1 = -2A_2$. Thus, we have
$$\frac{1}{x^2-1} = \frac{1/2}{x-1} + \frac{-1/2}{x+1},$$
using the values of $A_1$ and $A_2$ that we have found. Consequently, putting this all together, we find that
$$\int \frac{x^2}{x^2-1}\,dx = \int \left[1 + \frac{1/2}{x-1} + \frac{-1/2}{x+1}\right]dx = x + \frac{1}{2}\ln|x-1| - \frac{1}{2}\ln|x+1| + c,$$
where c is an arbitrary constant.

Solution to exercise 5.4
We are asked to evaluate the definite integral
$$\int_0^{\pi/2} \frac{dx}{2 + \cos x},$$
using the substitution $t = \tan(x/2)$. This substitution, like the substitution $t = \tan\theta$ that we saw in Example 5.28, is very useful and so we start by seeing how it can be applied.
Firstly, we note that we can easily write $\sin(x/2)$ and $\cos(x/2)$ in terms of t by using a right-angled triangle like the one in Figure 5.1 as this immediately tells us that
$$\sin\frac{x}{2} = \frac{t}{\sqrt{1+t^2}} \qquad\text{and}\qquad \cos\frac{x}{2} = \frac{1}{\sqrt{1+t^2}}.$$
So, using the double-angle formula $\cos(2x) = \cos^2 x - \sin^2 x$ from (2.6) with x replaced by x/2, we see that the denominator of our integrand can be written as
$$2 + \cos x = 2 + \cos^2\frac{x}{2} - \sin^2\frac{x}{2} = 2 + \frac{1}{1+t^2} - \frac{t^2}{1+t^2} = \frac{3+t^2}{1+t^2}.$$
Secondly, differentiating both sides of $t = \tan(x/2)$ with respect to x, we get
$$\frac{dt}{dx} = \frac{1}{2}\sec^2\frac{x}{2},$$
and so, since $\sec(x/2)$ is the reciprocal of $\cos(x/2)$ as we saw in Section 2.1.2, we have
$$dx = \frac{2\,dt}{1+t^2},$$
in terms of t.
Thirdly, as this is a definite integral, we also have to change the limits of integration, i.e.
lower limit: x = 0 gives $t = \tan 0 = 0$, and
upper limit: x = π/2 gives $t = \tan(\pi/4) = 1$.
Thus, returning to the integral, we have
$$\int_0^{\pi/2} \frac{dx}{2 + \cos x} = \int_0^1 \frac{1+t^2}{3+t^2}\cdot\frac{2\,dt}{1+t^2} = 2\int_0^1 \frac{dt}{3+t^2},$$
and, using the result we found in Example 5.17, this gives us
$$\int_0^{\pi/2} \frac{dx}{2 + \cos x} = 2\left[\frac{1}{\sqrt{3}}\tan^{-1}\frac{t}{\sqrt{3}}\right]_0^1 = \frac{2}{\sqrt{3}}\left[\tan^{-1}\frac{1}{\sqrt{3}} - \tan^{-1}0\right] = \frac{2}{\sqrt{3}}\left[\frac{\pi}{6} - 0\right],$$
and so $\dfrac{\pi}{3\sqrt{3}}$ is the answer.

Solution to exercise 5.5
To find the area of the region between the curve $y = x^3$, the x-axis and the vertical lines x = −1 and x = 2, we note that the curve will be similar to what we saw in Figure 2.2(c) and so the region we are looking at is the one illustrated in Figure 5.11.

Figure 5.11: The hatching indicates the region of interest in Exercise 5.5.

In particular, we see that the curve crosses the x-axis when x = 0 and that the function is non-positive when −1 ≤ x ≤ 0 and non-negative when 0 ≤ x ≤ 2. As such, we split the total region into two sub-regions to see that:
- Between x = −1 and x = 0 we evaluate the definite integral,
$$\int_{-1}^{0} x^3\,dx = \left[\frac{x^4}{4}\right]_{-1}^{0} = \frac{0^4}{4} - \frac{(-1)^4}{4} = -\frac{1}{4}.$$
Thus, the area is $\frac{1}{4}$ here as we have a non-positive integrand.
- Between x = 0 and x = 2 we evaluate the definite integral,
$$\int_{0}^{2} x^3\,dx = \left[\frac{x^4}{4}\right]_{0}^{2} = \frac{2^4}{4} - \frac{0^4}{4} = \frac{16}{4} = 4.$$
Thus, the area is 4 here as we have a non-negative integrand.
Consequently, the total area of this region is $\dfrac{1}{4} + 4 = \dfrac{17}{4}$.
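The answers to Exercises 5.4 and 5.5 can be sanity-checked numerically. The sketch below assumes a hypothetical helper `simpson` (composite Simpson's rule); it also shows why the region in Exercise 5.5 has to be split at x = 0, since integrating straight through gives the signed value 15/4 rather than the area 17/4.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Exercise 5.4: compare with pi / (3 * sqrt(3)).
integral_54 = simpson(lambda x: 1 / (2 + math.cos(x)), 0.0, math.pi / 2)
exact_54 = math.pi / (3 * math.sqrt(3))

# Exercise 5.5: the area needs the negative part counted positively.
def cube(x):
    return x ** 3

left = simpson(cube, -1.0, 0.0)    # signed value, negative
right = simpson(cube, 0.0, 2.0)    # signed value, positive
area = -left + right               # total area of the region
signed = simpson(cube, -1.0, 2.0)  # what we get without splitting
print(integral_54, exact_54, area, signed)
```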
Chapter 6
Functions of several variables

Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 3.1–3.9.
Anthony and Biggs (1996) Chapters 11 and 12.

Further reading
Simon and Blume (1994) parts of 13.1–13.2, parts of 14.1–14.6 and 14.8, parts of 15.1–15.2.
Adams and Essex (2010) parts of Chapter 12.

Aims and objectives
The objectives of this chapter are as follows.
- To understand that functions of two variables represent surfaces and see how to visualise these surfaces using sections and contours.
- To introduce partial derivatives and use them in various contexts.
- To introduce tangent planes, gradient vectors, directional derivatives and Taylor series for functions of two variables.
Specific learning outcomes can be found near the end of this chapter.

6.1 Introduction

In Section 2.1, we saw that a function $f : \mathbb{R} \to \mathbb{R}$ was a 'rule' which takes an input, $x \in \mathbb{R}$, and gives us a unique output, $f(x) \in \mathbb{R}$. We now turn our attention to functions of two variables, i.e. functions where the input consists of a pair of numbers, $(x, y) \in \mathbb{R}^2$, and whose output is a unique number $f(x, y) \in \mathbb{R}$.¹ In particular, we will mainly be concerned with functions of two variables where the variables are independent, i.e. the value of x can be chosen independently of the value of y and vice versa.

¹ The theory we consider extends to the general case where the input consists of n numbers $(x_1, x_2, \ldots, x_n)$. This extension to functions of n variables (with n ≥ 3) should be obvious and so we do not spend much time on it here. However, although we will mainly be dealing with the two-variable case, we will occasionally consider functions of more than two variables.

As we shall see, functions of two variables often occur in economics and other fields where we might wish to apply mathematical techniques. Two important examples of such functions from economics are:
- The production function of a firm, $q(k, l)$, gives the amount it produces when using k units of capital and l units of labour.
- The utility function of a consumer, $u(x_1, x_2)$, describes how much 'utility' a consumer derives from a bundle $(x_1, x_2)$ of two goods. As such it enables us to compare the preferences of the consumer when he is confronted with different combinations of these two goods.
These applications will be discussed later because, before we consider what we may want to use them for, we want to know how we can 'visualise' what is going on when we have a function of two variables.
6.2 Surfaces
Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function of the two independent variables x and y. We can think of any input (a, b) as a point in the (x, y)-plane and the output will be the corresponding value of f, i.e. f(a, b), which we can take to be the number c. That is, generally speaking, each point (x, y) in the (x, y)-plane will have an output given by the corresponding value of f, i.e. f(x, y), which we can take to be the value of another variable z. As such, to visualise a function of two variables we need three axes, two to represent the inputs, i.e. x and y, and one to represent the output, i.e. z.
Drawing these as in Figure 6.1, we take the (x, y)-plane of the inputs to correspond to points where z = 0, i.e. the input (a, b) is represented on our axes by the point (a, b, 0), and then the output c = f(a, b) is represented on our axes by the point (a, b, c) which is a vertical distance c above the point (a, b, 0) in the (x, y)-plane. If we do this for all possible inputs $(x, y) \in \mathbb{R}^2$ we obtain a surface in three-dimensional space whose equation is given by z = f(x, y). For instance, the surfaces obtained from three different functions of two variables, namely
$$f(x, y) = x^2 + y^2, \qquad g(x, y) = x^2 - y^2 \qquad\text{and}\qquad h(x, y) = -x^2 - y^2,$$
are illustrated in Figures 6.2(a), (b) and (c) respectively. Of course, it would be difficult for us to sketch such surfaces by hand and, indeed, it is hard enough to even contemplate how and why they look like they do without a computer. But, as we shall soon see, it is possible to get some feel for what these surfaces look like by thinking about how we can represent them in a two-dimensional way. However, before we do that, let’s take a moment to look at some far simpler surfaces than the ones in Figure 6.2, namely those that can arise from linear functions of two variables, as these turn out to be planes.
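Although these surfaces are hard to sketch by hand, individual points on them are easy to compute. The following minimal sketch evaluates the three functions above on a small grid of inputs; the grid itself is an arbitrary illustrative choice. It also confirms, on that grid, that h(x, y) = −f(x, y), which is why the surface z = h(x, y) is the reflection of z = f(x, y) in the (x, y)-plane.

```python
def f(x, y):
    return x ** 2 + y ** 2

def g(x, y):
    return x ** 2 - y ** 2

def h(x, y):
    return -x ** 2 - y ** 2

# An arbitrary grid of inputs (a, b) in the (x, y)-plane.
grid = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]

# Each input (a, b) gives the point (a, b, c) on the surface z = f(x, y).
points_on_f = [(a, b, f(a, b)) for a, b in grid]

# h is the negative of f at every grid point.
reflection_holds = all(h(a, b) == -f(a, b) for a, b in grid)
print(points_on_f[:3], reflection_holds)
```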
Figure 6.1: Representing the point (a, b, c) using the x, y and z-axes in $\mathbb{R}^3$.

6.2.1 Planes
The simplest kind of two-variable function is one which is linear in x and y, i.e. where z = f(x, y) = ax + by, for some constants a and b. Such functions represent planes and, generally speaking, any surface which has an equation of the form ax + by + cz = d, where at least one of the constants a, b and c is non-zero will represent a plane. For what follows, the important kinds of plane are, basically, those that fall into the following categories:
- The (x, y), (y, z) and (x, z)-planes which have equations z = 0, x = 0 and y = 0 respectively. (These are the planes in the middle of the three planes illustrated in Figures 6.3(a), (b) and (c) respectively.)
- Planes parallel to the (x, y), (y, z) and (x, z)-planes which, for some constant c, will have equations z = c, x = c and y = c respectively. (These are the other planes illustrated in Figures 6.3(a), (b) and (c) respectively.)
Planes which don't fall into either of the above categories, i.e. those with equations of the form ax + by + cz = d, for some constants a, b, c and d (where at least two of the constants a, b and c are non-zero) will not overly concern us here even though you will come across them in Section 2.11 of 173 Algebra.
Figure 6.2: Visualising the surfaces (a) $z = f(x, y) = x^2 + y^2$, (b) $z = g(x, y) = x^2 - y^2$ and (c) $z = h(x, y) = -x^2 - y^2$ in three dimensions. In particular, observe how (c) is the reflection of (a) in the (x, y)-plane as h(x, y) = −f(x, y).

Figure 6.3: Planes parallel to the (x, y), (y, z) and (x, z)-planes: (a) from bottom, z = −10, 0, 10; (b) from left, x = −10, 0, 10; and (c) from right, y = −10, 0, 10. (Note, in particular, how the axes are labelled in these pictures.)

6.2.2 Contours and sections
Although curve sketching (which is sketching the graph of a function of one variable) is important in this course, you will not be asked to sketch surfaces (such as the ones illustrated above in Figure 6.2) for functions of two variables. However, there are useful ways of visualising such surfaces which do not involve sketching them in three dimensions. One of these is to use planes, such as the ones we saw in Figure 6.3, to 'carve up' a three-dimensional illustration of a surface into two-dimensional representations in terms of contours and sections. In particular, these ideas may be familiar to you from your experiences with maps (for contours) and other technical diagrams (for sections).

Horizontal planes and the contours of a surface
One way of visualising a surface is to look at its contours, which are the curves of intersection that arise when we look at the points of intersection of a surface with planes that are parallel to the (x, y)-plane. To find the contours, we take a plane parallel to the (x, y)-plane, say the plane z = c, and find the curve of intersection between it and the surface z = f(x, y), i.e. the curve with equation c = f(x, y). This curve is the z = c contour, i.e. the set of points (x, y) which give z = c when we put them into the equation z = f(x, y).

Example 6.1
Find the z = 2 contour of the surface z = x − y + 4. Repeat for z = 4 and z = 6.
To find the z = 2 contour of the surface z = x − y + 4 we need to find the curve of intersection, which in this case, is given by 2 = x − y + 4. Rearranging this gives the equation y = x + 2 which is the equation of a straight line. Similarly, we find that:
- For z = 4, the curve of intersection is given by 4 = x − y + 4 which gives us y = x.
- For z = 6, the curve of intersection is given by 6 = x − y + 4 which gives us y = x − 2.
Thus, we see from these equations that these two contours are straight lines as well. The surface and its contours are illustrated in Figure 6.4.
Figure 6.4: For Example 6.1. (a) The surface z = x − y + 4 and, from the bottom, the planes z = 2, 4, 6. (b) The curves of intersection of the surface and the planes in (a) with their corresponding values of z. (c) The contours: Each line represents a contour (i.e. the points with coordinates (x, y) that map to a particular value of z). In this case, the further to the right the line is, the larger the corresponding value of z is, as we have z = 2, 4, 6 as we move from left to right. Notice that, here, the contours are parallel straight lines (i.e. they have the same gradient but different y-intercepts).
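The rearrangement used in Example 6.1 works for any height c: setting c = x − y + 4 gives y = x + (4 − c). As a small illustrative sketch (the function name `contour_line` is an assumption, not notation from the guide), we can compute the gradient and y-intercept of each contour and confirm that points on the line really do map to the height c.

```python
def surface(x, y):
    """The surface z = x - y + 4 from Example 6.1."""
    return x - y + 4

def contour_line(c):
    """Return (gradient, y_intercept) of the z = c contour, written as y = x + (4 - c)."""
    return 1, 4 - c

lines = {c: contour_line(c) for c in (2, 4, 6)}

# Every sampled point on each line should lie at height c on the surface.
on_contour = all(
    surface(x, gradient * x + intercept) == c
    for c, (gradient, intercept) in lines.items()
    for x in range(-5, 6)
)
print(lines, on_contour)
```

All three contours have gradient 1, which is why they are parallel, and the y-intercept falls as c rises.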
Activity 6.1 Find the equations of the z = −10, z = 0 and z = 10 contours of the surface z = 4x + 2y − 2 and sketch these in the (x, y)-plane clearly labelling the value of z which is associated with each contour.
Example 6.2
Find the z = 16 contour of the surface $z = x^2 + y^2$. What are the z = c contours of the surface $z = x^2 + y^2$ when (i) c > 0, (ii) c = 0 and (iii) c < 0?
To find the z = 16 contour of the surface $z = x^2 + y^2$ we need to find the curve of intersection which, in this case, is simply $x^2 + y^2 = 16$. This is the equation of a circle, centred on the origin, with a radius of four.
To find the z = c contours in the three cases indicated we just need to find out what the curve $x^2 + y^2 = c$ looks like in the three cases. So, we have:
- If c > 0, the contour is a circle, centred on the origin, with a radius of $\sqrt{c}$.
- If c = 0, the contour is the point (0, 0) as this is the only solution to the equation $x^2 + y^2 = 0$.
- If c < 0, there are no contours as we know that $x^2 + y^2 \geq 0$ for all values of x and y.
In particular, notice that z = 0 is the smallest value of z that arises from a point on this surface. The surface and three of its contours for c > 0 are illustrated in Figure 6.5.
Figure 6.5: For Example 6.2. (a) The surface $z = x^2 + y^2$, which we saw in Figure 6.2(a), and the planes z = 4, 16, 25. (b) The curves of intersection of the surface and the planes in (a) with their corresponding values of z. (c) The contours: Each circle represents a contour (i.e. the points with coordinates (x, y) that map to a particular value of z). In this case, the larger the radius of the contour, the larger the corresponding value of z as we have z = 4, 16, 25. Notice that, here, the contours are concentric circles (i.e. they have the same centre but different radii).
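The three cases in Example 6.2 translate directly into a small function. This is only a sketch: the name `contour` and the way the result is encoded (a label plus a radius) are illustrative assumptions.

```python
import math

def contour(c):
    """Describe the z = c contour of the surface z = x**2 + y**2."""
    if c > 0:
        return ("circle", math.sqrt(c))  # centred on the origin, radius sqrt(c)
    if c == 0:
        return ("point", 0.0)            # only the origin (0, 0)
    return ("empty", None)               # x**2 + y**2 is never negative

# The contours drawn in Figure 6.5 have z = 4, 16 and 25.
radii = [contour(c)[1] for c in (4, 16, 25)]

# Points on the radius-4 circle should all give z = 16.
r = contour(16)[1]
heights = [
    (r * math.cos(t)) ** 2 + (r * math.sin(t)) ** 2
    for t in (0.0, 1.0, 2.5, 4.0)
]
print(radii, heights)
```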
Activity 6.2
Find the z = −25 contour of the surface $z = -x^2 - y^2$. What are the z = c contours of this surface when (i) c > 0, (ii) c = 0 and (iii) c < 0?

Vertical planes and the sections of a surface
Another way of visualising a surface is to look at its sections, which are the curves of intersection that arise when we look at the points of intersection of a surface with planes that are perpendicular to the (x, y)-plane. To find the sections, we take a plane perpendicular to the (x, y)-plane and find the curve of intersection between it and the surface z = f(x, y). In particular, in this course, we shall only need to consider sections that arise from planes that are parallel to the (x, z)-plane (i.e. y = c for some constant c) or parallel to the (y, z)-plane (i.e. x = c for some constant c).
As such, the easiest sections to sketch are the ones we get when we consider the (x, z) and (y, z)-planes which are both perpendicular to the (x, y)-plane. In particular, we find that the section which we get from the:
- (x, z)-plane, which has the equation y = 0, is the curve of intersection between it and the surface z = f(x, y), i.e. the curve with equation z = f(x, 0).
- (y, z)-plane, which has the equation x = 0, is the curve of intersection between it and the surface z = f(x, y), i.e. the curve with equation z = f(0, y).
Let's look at what these sections look like in the case of the two surfaces we considered above when we were looking for contours.

Example 6.3
Find the (x, z) and (y, z)-sections of the surface z = x − y + 4.
To find these sections of the surface z = x − y + 4 we need to find the curves of intersection, which in this case, are given by:
- For the (x, z)-section, we have y = 0 and so the curve of intersection is given by z = x + 4 and this is a straight line in the (x, z)-plane.
- For the (y, z)-section, we have x = 0 and so the curve of intersection is given by z = −y + 4 and this is a straight line in the (y, z)-plane.
The surface and these sections are illustrated in Figure 6.6.
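Taking a section amounts to fixing one of the two inputs and treating what remains as a function of one variable. A minimal sketch of this idea, using the hypothetical helper names `section_y` and `section_x`, is:

```python
def section_y(f, c):
    """The y = c section of z = f(x, y): the one-variable function x -> f(x, c)."""
    return lambda x: f(x, c)

def section_x(f, c):
    """The x = c section of z = f(x, y): the one-variable function y -> f(c, y)."""
    return lambda y: f(c, y)

def plane(x, y):
    """The surface z = x - y + 4 from Example 6.3."""
    return x - y + 4

xz_section = section_y(plane, 0)  # the (x, z)-section, z = x + 4
yz_section = section_x(plane, 0)  # the (y, z)-section, z = -y + 4

xz_values = [xz_section(x) for x in (-1, 0, 1)]
yz_values = [yz_section(y) for y in (-1, 0, 1)]
print(xz_values, yz_values)
```

The same two helpers apply unchanged to any other function of two variables.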
Activity 6.3
Find the (x, z) and (y, z)-sections of the surface z = 4x + 2y − 2 and sketch these in the appropriate planes.

Example 6.4
Find the (x, z) and (y, z)-sections of the surface $z = x^2 + y^2$.
To find these sections of the surface $z = x^2 + y^2$ we need to find the curves of intersection, which in this case, are given by:
- For the (x, z)-section, we have y = 0 and so the curve of intersection is given by $z = x^2$ and this is a parabola in the (x, z)-plane.
- For the (y, z)-section, we have x = 0 and so the curve of intersection is given by $z = y^2$ and this is a parabola in the (y, z)-plane.
The surface and these sections are illustrated in Figure 6.7.
Figure 6.6: For Example 6.3. (a) The surface z = x − y + 4 and the planes x = 0 (which goes diagonally from bottom left to top right) and y = 0 (which goes diagonally from top left to bottom right). (b) The (x, z)-section is the line z = x + 4. (c) The (y, z)-section is the line z = −y + 4.
Figure 6.7: For Example 6.4. (a) The surface $z = x^2 + y^2$ and the planes x = 0 (which goes diagonally from bottom left to top right) and y = 0 (which goes diagonally from top left to bottom right). (b) The (x, z)-section is the parabola $z = x^2$. (c) The (y, z)-section is the parabola $z = y^2$.
Activity 6.4
Find the (x, z) and (y, z)-sections of the surface $z = -x^2 - y^2$ and sketch these in the appropriate planes.

More generally, we may want to look at the sections we get when we consider planes that are parallel to the (x, z) and (y, z)-planes which we considered above. In particular, we find that the sections we get from the planes that are parallel to the:
- (x, z)-plane, which have equations of the form y = c where c is a constant, are the curves of intersection between them and the surface z = f(x, y), i.e. the curves with equation z = f(x, c).
- (y, z)-plane, which have equations of the form x = c where c is a constant, are the curves of intersection between them and the surface z = f(x, y), i.e. the curves with equation z = f(c, y).
Let's see what these sections look like in the case of the two surfaces we considered above.

Example 6.5
Find the y = 0, 2, 4 sections of the surface z = x − y + 4.
To find these sections of the surface z = x − y + 4 we need to find the curves of intersection, which in this case, are given by:
- For the y = 0 section, we have y = 0 and so the curve of intersection is given by z = x + 4 and this is a straight line in the (x, z)-plane. Of course, this is just the (x, z)-section we found in Example 6.3!
- For the y = 2 section, we have y = 2 and so the curve of intersection is given by z = x − 2 + 4 = x + 2 and this is a straight line.
- For the y = 4 section, we have y = 4 and so the curve of intersection is given by z = x − 4 + 4 = x and this is a straight line.
Observe that only the first of these sections 'lives' in the (x, z)-plane, but we can sketch the other two in this plane to get a feel for how the surface is changing when we look at the sections y = c for different values of c. The surface and these sections, when drawn in the (x, z)-plane, are illustrated in Figure 6.8.
Figure 6.8: For Example 6.5. (a) The surface z = x − y + 4 and the planes y = 0, y = 2 and y = 4 as we move from right to left. (b) The y = 0, y = 2 and y = 4 sections (as we move from top to bottom) all drawn in the (x, z)-plane. Note that the y = 0 section is the (x, z)-section and, of the three sections illustrated, this is the only one that really ‘lives’ in the (x, z)-plane. Also notice that, as the value of c increases when we look at the plane y = c, the value of the z-intercept decreases when we look at the section.
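The idea of a section can also be explored numerically. The following is a minimal Python sketch, not part of the original guide, in which the helper name `section_y` is our own (hypothetical) choice: it fixes y = c in the surface of Example 6.5 and returns the resulting function of x alone.

```python
def f(x, y):
    """The surface z = x - y + 4 from Example 6.5."""
    return x - y + 4


def section_y(surface, c):
    """Return the y = c section of z = surface(x, y) as a function of x only."""
    return lambda x: surface(x, c)


# For each section, record the z-intercept (the value at x = 0).
intercepts = {}
for c in (0, 2, 4):
    g = section_y(f, c)
    slope = g(1) - g(0)  # these sections are straight lines, so this is the gradient
    intercepts[c] = g(0)
    print(f"y = {c} section: gradient {slope}, z-intercept {g(0)}")
```

Each section has gradient 1, and the z-intercept falls from 4 to 2 to 0 as c increases, which matches the observation made in the caption of Figure 6.8.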
Activity 6.5 Find the x = 0, 2, 4 sections of the surface z = x − y + 4 and sketch them in the (y, z)-plane. Of these three sections, which one have we found before and what did we call it? Of these three sections, which is the only one that really ‘lives’ in the (y, z)-plane?
Activity 6.6 Consider the surface z = 4x + 2y − 2. Find the y = −2, 0, 2 sections of this surface and sketch them in the (x, z)-plane. Find the x = −2, 0, 2 sections of this surface and sketch them in the (y, z)-plane.
Example 6.6 Find the x = 0, 1, 2 sections of the surface $z = x^2 + y^2$.
To find these sections of the surface $z = x^2 + y^2$ we need to find the curves of intersection which, in this case, are given by:
For the x = 0 section, we have x = 0 and so the curve of intersection is given by $z = y^2$ and this is a parabola in the (y, z)-plane. Of course, this is just the (y, z)-section we found in Example 6.4!
For the x = 1 section, we have x = 1 and so the curve of intersection is given by $z = 1 + y^2$ and this is a parabola.
For the x = 2 section, we have x = 2 and so the curve of intersection is given by $z = 4 + y^2$ and this is a parabola.
Observe that only the first of these sections ‘lives’ in the (y, z)-plane, but we can sketch the other two in this plane to get a feel for how the surface is changing when we look at the sections x = c for different values of c. The surface and these sections, when drawn in the (y, z)-plane, are illustrated in Figure 6.9.
Activity 6.7 Find the y = 0, 1, 2 sections of the surface $z = x^2 + y^2$ and sketch them in the (x, z)-plane. Of these three sections, which one have we found before and what did we call it? Of these three sections, which is the only one that really ‘lives’ in the (x, z)-plane?
Activity 6.8 Consider the surface $z = -x^2 - y^2$. Find the y = 0, 1, 2 sections of this surface and sketch them in the (x, z)-plane. Find the x = 0, 1, 2 sections of this surface and sketch them in the (y, z)-plane.
Figure 6.9: For Example 6.6. (a) The surface $z = x^2 + y^2$ and the planes x = 0, x = 1 and x = 2 as we move from left to right. (b) The x = 0, x = 1 and x = 2 sections all drawn in the (y, z)-plane. Note that the x = 0 section is the (y, z)-section and, of the three sections illustrated, this is the only one that really ‘lives’ in the (y, z)-plane. Notice that, as the value of c increases when we look at the plane x = c, the value of the z-intercept increases when we look at the section.
6.3 Partial differentiation
In Chapter 3, we saw how to differentiate functions of one variable. Unsurprisingly, perhaps, we can also differentiate functions of two variables using partial differentiation to yield partial derivatives.² In some ways, this will be similar to what we saw when we differentiated functions of one variable to get their derivatives, but as we now have two variables to deal with, things get a little trickier.
6.3.1 Sections and partial derivatives
Consider $f(x, y)$, a function of two independent variables. For a fixed value of $y$, say $y = y_0$, we can look at the function $g(x) = f(x, y_0)$ which is now a function of $x$ only. Clearly, the rate of change of $g(x)$ with respect to $x$ is just the derivative of this function with respect to $x$. But what happens when we want to calculate the rate of change of $f(x, y)$ with respect to $x$ for any fixed value of $y$? To do this we avoid specifying a particular value of $y$ by just assuming that $y$ is a constant and differentiating with respect to $x$. So, given a function $f(x, y)$, we denote the operation of differentiating $f$ with respect to $x$ whilst holding $y$ constant by
\[ \frac{\partial f}{\partial x} \quad \text{or, more compactly,} \quad f_x(x, y), \tag{6.1} \]
and call this the partial derivative of $f(x, y)$ with respect to $x$.³ In a similar manner, we can define the partial derivative of $f(x, y)$ with respect to $y$, denoted by
\[ \frac{\partial f}{\partial y} \quad \text{or, more compactly,} \quad f_y(x, y), \tag{6.2} \]
which is what we obtain from differentiating $f(x, y)$ with respect to $y$ whilst holding $x$ constant.
Clearly, the partial derivative of $f(x, y)$ with respect to $x$, i.e. the result of differentiating $f(x, y)$ with respect to $x$ whilst holding $y$ constant, is going to be another function of $x$ and $y$. This function of $x$ and $y$ is what is denoted by the symbols in (6.1). But what does this partial derivative mean? In effect, what we have done when we consider the function $f(x, y)$ for some fixed value of $y$, say $y_0$, is to look at the section of the surface $z = f(x, y)$ we get when $y = y_0$, i.e. the section given by the equation $z = f(x, y_0)$ which lies in the plane $y = y_0$ that is parallel to the (x, z)-plane. Then, when we differentiate $f(x, y_0)$ with respect to $x$, we are finding the gradient of this section, i.e. it tells us how $z = f(x, y_0)$ is varying with $x$. Consequently, this partial derivative is telling us something about the gradient of the surface when we are at the point $(x, y_0)$ and we are ‘looking’ in the x-direction. This will become clear when we look at tangent planes in Section 6.4.1.
² Most of the material in these notes can be generalised to functions with more than two variables. But, in this course, almost without exception, we will be considering functions of two variables.
³ Note that we use the ‘curly-d’, i.e. ‘∂’, for partial derivatives rather than the normal ‘straight-d’, i.e. ‘d’, which one encounters in the notation $dg/dx$ for the derivative of a function $g(x)$ of one variable. We shall see why it is important to keep these two notions of differentiation separate later. Similarly, we use $f_x(x, y)$ as shorthand for the partial derivative of $f(x, y)$ with respect to $x$ rather than the $g'(x)$ which one encounters as the shorthand for the derivative of a function $g(x)$ of one variable.
Activity 6.9 Describe what the partial derivative of $f(x, y)$ with respect to $y$ evaluated at the point $(x_0, y)$ tells us about the gradient of the surface at the point $(x_0, y)$.
6.3.2 Finding partial derivatives
Calculating the partial derivatives of $f(x, y)$ is only slightly more difficult than finding the derivative of a function of one variable. Recalling that the partial derivative of a function $f(x, y)$ with respect to $x$, i.e. $f_x(x, y)$, is just the derivative of $f(x, y)$ with respect to $x$ whilst holding $y$ constant, to calculate $f_x(x, y)$ we just treat any occurrence of $y$ in $f(x, y)$ as if it were a constant and differentiate $f(x, y)$ with respect to $x$. And, in a similar way, we can find the partial derivative of a function $f(x, y)$ with respect to $y$, i.e. $f_y(x, y)$. Let’s look at an example.
Example 6.7 Given that $f(x, y) = x^2 y + 5xy^3 + y^2$, find $f_x(x, y)$ and $f_y(x, y)$.
Let’s do this ‘slowly’ so that we get the idea. To find $f_x(x, y)$, we treat $y$ as if it were a constant and let’s say that this constant is $c$. So, we have a function of one variable given by
\[ g(x) = f(x, c) = cx^2 + 5c^3 x + c^2, \]
and differentiating this with respect to $x$ gives
\[ \frac{dg}{dx} = 2cx + 5c^3. \]
But, $c$ is the constant we’re using to represent $y$ and so replacing all the ‘c’s with ‘y’s we have
\[ \frac{\partial f}{\partial x} = 2xy + 5y^3, \]
which is the partial derivative of $f(x, y)$ with respect to $x$.
Similarly, to find $f_y(x, y)$, we treat $x$ as if it were a constant and (again) let’s say that this constant is $c$. So, we have a function of one variable given by
\[ g(y) = f(c, y) = c^2 y + 5cy^3 + y^2, \]
and differentiating this with respect to $y$ gives
\[ \frac{dg}{dy} = c^2 + 15cy^2 + 2y. \]
But, $c$ is the constant we’re using to represent $x$ and so replacing all the ‘c’s with ‘x’s we have
\[ \frac{\partial f}{\partial y} = x^2 + 15xy^2 + 2y, \]
which is the partial derivative of $f(x, y)$ with respect to $y$.
Obviously, there is no need to go through all this detail whenever we calculate a partial derivative: all you have to do is remember what you are keeping constant and then differentiate whatever is left. Let’s look at another example.
Example 6.8
Given that $f(x, y) = 3x^3 + 7xy^{-1} + 2y^9$, find $f_x(x, y)$ and $f_y(x, y)$.
Let’s do this ‘quickly’. To find $f_x(x, y)$, we treat $y$ as a constant and differentiate with respect to $x$ to get
\[ \frac{\partial f}{\partial x} = 9x^2 + 7y^{-1}. \]
Similarly, to find $f_y(x, y)$, we treat $x$ as a constant and differentiate with respect to $y$ to get
\[ \frac{\partial f}{\partial y} = -7xy^{-2} + 18y^8. \]
And, we’re done!
Activity 6.10
Given that
\[ f(x, y) = 2x + x^3 y - \frac{x}{y} + \frac{y^3}{2}, \]
find $f_x(x, y)$ and $f_y(x, y)$.
So far, we have calculated the partial derivatives of very simple functions of $x$ and $y$. But, sometimes, we will need to use the chain, product and quotient rules when calculating partial derivatives. Let’s look at an example to see how this is done.
Example 6.9
Given that
\[ f(x, y) = x e^{x + y^2}, \]
find $f_x(x, y)$ and $f_y(x, y)$.
We first note that we can write this function as
\[ f(x, y) = (x e^x) e^{y^2}, \]
and so, to find $f_x(x, y)$, we treat $e^{y^2}$ as a constant and we differentiate the function $x e^x$ using the product rule to get $x e^x + 1 \cdot e^x$. This gives us
\[ \frac{\partial f}{\partial x} = e^{y^2} (x e^x + e^x) = (x + 1) e^{x + y^2}. \]
To find $f_y(x, y)$, we treat $x e^x$ as a constant and we differentiate the function $e^{y^2}$ using the chain rule to get $2y e^{y^2}$. This gives us
\[ \frac{\partial f}{\partial y} = x e^x (2y e^{y^2}) = 2xy e^{x + y^2}. \]
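If we want to check partial derivatives like these without redoing the algebra, one option is to compare them with numerical estimates. The following Python sketch is not part of the original guide: the helper names `partial_x` and `partial_y`, the step size `h` and the test point are our own assumptions. It estimates each partial derivative with a central difference, i.e. by nudging one variable slightly whilst holding the other constant, and compares the estimate with the formulae found in Examples 6.7, 6.8 and 6.9.

```python
import math


def partial_x(f, x, y, h=1e-6):
    """Estimate f_x at (x, y): vary x a little while holding y constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def partial_y(f, x, y, h=1e-6):
    """Estimate f_y at (x, y): vary y a little while holding x constant."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


# Each entry: (f, the f_x found in the text, the f_y found in the text).
examples = {
    "6.7": (
        lambda x, y: x**2 * y + 5 * x * y**3 + y**2,
        lambda x, y: 2 * x * y + 5 * y**3,
        lambda x, y: x**2 + 15 * x * y**2 + 2 * y,
    ),
    "6.8": (
        lambda x, y: 3 * x**3 + 7 * x / y + 2 * y**9,
        lambda x, y: 9 * x**2 + 7 / y,
        lambda x, y: -7 * x / y**2 + 18 * y**8,
    ),
    "6.9": (
        lambda x, y: x * math.exp(x + y**2),
        lambda x, y: (x + 1) * math.exp(x + y**2),
        lambda x, y: 2 * x * y * math.exp(x + y**2),
    ),
}

x0, y0 = 1.5, 0.5  # an arbitrary test point (y0 must be non-zero for Example 6.8)
agrees = {}
for name, (f, fx, fy) in examples.items():
    agrees[name] = (
        math.isclose(partial_x(f, x0, y0), fx(x0, y0), rel_tol=1e-6)
        and math.isclose(partial_y(f, x0, y0), fy(x0, y0), rel_tol=1e-6)
    )
    print(f"Example {name}: formulae agree with the estimates? {agrees[name]}")
```

Agreement at one point does not prove a formula, of course, but a disagreement is a reliable sign that something has gone wrong in the differentiation.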
Activity 6.11 Given that $f(x, y) = \sqrt{x^2 + y^2}$, find $f_x(x, y)$ and $f_y(x, y)$.
6.3.3 The chain rule
Sometimes a function of one variable is defined with reference to a function of two variables. For instance, suppose that the production level, $q$, of a firm depends on the amounts $k$ of capital and $l$ of labour used through the function $q(k, l)$. Suppose also that both $k$ and $l$ change over time in some known way so that we have formulas for $k(t)$ and $l(t)$ where $t$ is a parameter measuring time.⁴ How, we might ask, can we find the rate of change of production with time?
Example 6.10 Given that we have the production function $q(k, l) = kl$ where $k$ and $l$ are functions of time, $t$, given by
\[ k(t) = 3 + 2t \quad \text{and} \quad l(t) = 10 - 3t, \]
find the rate of change of production with time.
In this case, we can calculate the production as a function of time by explicitly finding $Q(t) = q(k(t), l(t))$ which, in this case, is
\[ Q(t) = k(t) l(t) = (3 + 2t)(10 - 3t) = 30 + 11t - 6t^2. \]
And, in particular, we can now differentiate this to find the rate of change of production with time, i.e. we have
\[ \frac{dQ}{dt} = 11 - 12t, \]
in this case.
More generally, suppose we are given a function $f$ of two variables $x$ and $y$, both of which are themselves functions of $t$. We can think of this as defining a composite function $F(t) = f(x(t), y(t))$. In the case of a single variable we have a rule, i.e. the chain rule, which enables us to work out the derivative of a composite function. Amazingly, perhaps, there is a similar rule for composite functions of two variables such as the one we have here which is also known as the chain rule. It states that
\[ \frac{dF}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}. \tag{6.3} \]
⁴ Notice that, since $k$ and $l$ both depend on $t$, we can only pick certain pairs of values, $(k, l)$. That is, in this case, the variables $k$ and $l$ are not independent.
Sometimes, in this context, we call $F'(t)$ the total derivative of $F(t)$ with respect to $t$ (in order to distinguish it from the partial derivatives of $f$ with respect to $x$ and $y$).
To see why the chain rule works, consider that if we change $t$ by a small amount, $\Delta t$, the corresponding change in $F(t)$ is given by
\[ \Delta F \simeq \frac{dF}{dt} \Delta t, \]
but here, there are two ways in which $F(t) = f(x(t), y(t))$ can change with $t$.
Firstly, $F$ can change with $t$ because $f$ changes with $x$ and $x$ changes with $t$; let’s denote this change in $F$ by $\Delta_x F$. In this case, we have
\[ \Delta_x F \simeq \frac{\partial f}{\partial x} \Delta x, \]
as we are holding $y$ constant to see how $F$ changes with $x$ and this means that
\[ \Delta_x F \simeq \frac{\partial f}{\partial x} \frac{dx}{dt} \Delta t, \]
as the change in $x$, $\Delta x$, is related to a change in $t$ by $\Delta x \simeq x'(t) \Delta t$.
Secondly, $F$ can change with $t$ because $f$ changes with $y$ and $y$ changes with $t$; let’s denote this change in $F$ by $\Delta_y F$. In this case, we have
\[ \Delta_y F \simeq \frac{\partial f}{\partial y} \Delta y, \]
as we are holding $x$ constant to see how $F$ changes with $y$ and this means that
\[ \Delta_y F \simeq \frac{\partial f}{\partial y} \frac{dy}{dt} \Delta t, \]
as the change in $y$, $\Delta y$, is related to a change in $t$ by $\Delta y \simeq y'(t) \Delta t$.
Thus, as the total change in $F$ due to these two changes is given by
\[ \Delta F = \Delta_x F + \Delta_y F \simeq \frac{\partial f}{\partial x} \frac{dx}{dt} \Delta t + \frac{\partial f}{\partial y} \frac{dy}{dt} \Delta t, \]
we can now equate our two expressions for $\Delta F$ and divide through by $\Delta t$ to get the chain rule which we saw above in (6.3). Let’s see how we could have used it to answer the question we saw in Example 6.10.
Example 6.11 Consider the functions in Example 6.10. Use the chain rule to find the rate of change of production with time.
Here $q(k, l) = kl$, $k(t) = 3 + 2t$ and $l(t) = 10 - 3t$. In this case, if we again let $Q(t) = q(k(t), l(t))$, the chain rule states that
\[ \frac{dQ}{dt} = \frac{\partial q}{\partial k} \frac{dk}{dt} + \frac{\partial q}{\partial l} \frac{dl}{dt}. \]
As such, using this, we can see that
\[ \frac{dQ}{dt} = (l)(2) + (k)(-3) = 2(10 - 3t) - 3(3 + 2t) = 11 - 12t, \]
which agrees with our earlier answer.
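The agreement between the two methods can also be checked mechanically. The following short Python sketch is not part of the original guide and its function names (`capital`, `labour`, `dQ_chain`, `dQ_direct`) are our own choices: it builds the chain rule (6.3) term by term for the production function of Examples 6.10 and 6.11 and compares the result with the derivative of $Q(t) = 30 + 11t - 6t^2$ found directly.

```python
def capital(t):
    """k(t) = 3 + 2t from Example 6.10."""
    return 3 + 2 * t


def labour(t):
    """l(t) = 10 - 3t from Example 6.10."""
    return 10 - 3 * t


def dQ_chain(t):
    """dQ/dt assembled from the chain rule (6.3) for q(k, l) = kl."""
    dq_dk = labour(t)   # partial derivative of kl with respect to k, holding l constant
    dq_dl = capital(t)  # partial derivative of kl with respect to l, holding k constant
    dk_dt = 2           # derivative of 3 + 2t
    dl_dt = -3          # derivative of 10 - 3t
    return dq_dk * dk_dt + dq_dl * dl_dt


def dQ_direct(t):
    """dQ/dt obtained by differentiating Q(t) = 30 + 11t - 6t^2 directly."""
    return 11 - 12 * t


for t in range(4):
    print(t, dQ_chain(t), dQ_direct(t))
```

The two columns printed for each value of t are identical, as the algebra in Example 6.11 says they should be.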
Activity 6.12 Suppose that $f(x, y) = x^2 y$ and that $x(t) = 2 + 3t$ and $y(t) = t^2 + 1$. If $F(t) = f(x(t), y(t))$, use the chain rule to find the total derivative of $F$ with respect to $t$ and check your answer by explicitly finding $F(t)$ and differentiating it with respect to $t$.
We now consider one of the many useful applications of the chain rule.
The derivative of an implicit function
An equation $g(x, y) = c$ where $c$ is a constant can, in some cases, be rearranged (or solved) to give $y$ as an explicit function of $x$. Once we have done this, we can then differentiate our expression for $y$ with respect to $x$ to find its derivative, $y'(x)$.
Example 6.12 Suppose that $y$ is a function of $x$ defined by the equation $x^2 - y = 7$. Find $y$ as an explicit function of $x$ and hence find $y'(x)$.
As we have $x^2 - y = 7$ we can easily rearrange this to get $y = x^2 - 7$, i.e. we have $y(x) = x^2 - 7$, if we want $y$ as an explicit function of $x$. In particular, this means that
\[ \frac{dy}{dx} = 2x, \]
in this case.
In general, we say that an equation $g(x, y) = c$ defines $y$ implicitly as a function of $x$ if there is a function $y(x)$ which satisfies the equation for a range of values of $x$. But, in general, it may be difficult or impossible to solve the equation $g(x, y) = c$ to find an explicit formula for $y(x)$ as we did in Example 6.12. However, we can [often] still find the derivative $y'(x)$, even if we don’t have an explicit expression for $y$ in terms of $x$.
To see how we can do this, consider that if we knew the function, $y(x)$, that satisfied the equation $g(x, y) = c$, we could find a new function, $G(x)$, of $x$ only which would be given by $G(x) = g(x, y(x))$. Then, using the chain rule, we would have
\[ \frac{dG}{dx} = \frac{\partial g}{\partial x} \frac{dx}{dx} + \frac{\partial g}{\partial y} \frac{dy}{dx}. \]
But, $G(x) = c$ where $c$ is a constant and so we also have
\[ \frac{dG}{dx} = 0 \quad \text{as well as} \quad \frac{dx}{dx} = 1, \]
which means that we are left with
\[ 0 = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial y} \frac{dy}{dx}. \]
Rearranging this then gives us
\[ \frac{dy}{dx} = -\frac{\partial g / \partial x}{\partial g / \partial y}, \]
as long as $g_y(x, y) \neq 0$. That is, $y'(x)$ can easily be found by using the partial derivatives of $g$. (But, don’t forget the minus sign!)
Example 6.13 In Example 6.12, $y$ was a function of $x$ defined implicitly by the equation $x^2 - y = 7$. Find $y'(x)$ using the result above.
As we have the equation $x^2 - y = 7$ we can write this as $g(x, y) = c$ with $g(x, y) = x^2 - y$ and $c = 7$. Using the above result we can then see that
\[ \frac{\partial g}{\partial x} = 2x \quad \text{and} \quad \frac{\partial g}{\partial y} = -1, \]
which means that
\[ \frac{dy}{dx} = -\frac{\partial g / \partial x}{\partial g / \partial y} = -\frac{2x}{-1} = 2x, \]
as before.
Example 6.14
Suppose that $y$ is a function of $x$ defined implicitly by the equation
\[ x^2 y^3 - 6x^3 y^2 + 2xy = 1. \]
Verify that the point $(x, y) = (1/2, 2)$ satisfies this equation and find the value of the derivative, $y'(x)$, at this point.
The point $(x, y) = (1/2, 2)$ satisfies the equation since, putting $x = 1/2$ and $y = 2$ into the left-hand side, we get
\[ \left(\frac{1}{2}\right)^2 (2)^3 - 6 \left(\frac{1}{2}\right)^3 (2)^2 + 2 \left(\frac{1}{2}\right) (2) = 2 - 3 + 2 = 1, \]
which is what we have on the right-hand side of the equation.
We then see that the equation defining $y$ implicitly as a function of $x$ is of the form $g(x, y) = 1$ where $g(x, y) = x^2 y^3 - 6x^3 y^2 + 2xy$. So, according to the formula given above, we have
\[ \frac{dy}{dx} = -\frac{\partial g / \partial x}{\partial g / \partial y}, \]
and so, since
\[ \frac{\partial g}{\partial x} = 2xy^3 - 18x^2 y^2 + 2y \quad \text{and} \quad \frac{\partial g}{\partial y} = 3x^2 y^2 - 12x^3 y + 2x, \]
we have
\[ \frac{dy}{dx} = -\frac{2xy^3 - 18x^2 y^2 + 2y}{3x^2 y^2 - 12x^3 y + 2x}, \]
as long as $3x^2 y^2 - 12x^3 y + 2x \neq 0$. Thus, given the point $(1/2, 2)$, we can substitute these values into our expression for $y'(x)$ to see that the value of the derivative at this point is 6.
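The substitution in Example 6.14 is a natural place for arithmetic slips, so it can be worth checking with exact fractions. The following Python sketch is not part of the original guide and the name `implicit_slope` is our own: it evaluates $g$, its two partial derivatives and the formula $dy/dx = -g_x/g_y$ at the point $(1/2, 2)$ using the standard `fractions` module.

```python
from fractions import Fraction


def g(x, y):
    """Left-hand side of the equation in Example 6.14."""
    return x**2 * y**3 - 6 * x**3 * y**2 + 2 * x * y


def g_x(x, y):
    """Partial derivative of g with respect to x, as found in Example 6.14."""
    return 2 * x * y**3 - 18 * x**2 * y**2 + 2 * y


def g_y(x, y):
    """Partial derivative of g with respect to y, as found in Example 6.14."""
    return 3 * x**2 * y**2 - 12 * x**3 * y + 2 * x


def implicit_slope(x, y):
    """dy/dx = -g_x / g_y, which is only valid where g_y is non-zero."""
    denominator = g_y(x, y)
    if denominator == 0:
        raise ZeroDivisionError("g_y vanishes here, so the formula does not apply")
    return -g_x(x, y) / denominator


x0, y0 = Fraction(1, 2), Fraction(2)
print(g(x0, y0))               # 1, so the point lies on the curve
print(g_x(x0, y0), g_y(x0, y0))  # -6 and 1
print(implicit_slope(x0, y0))  # 6
```

Here $g_x = 8 - 18 + 4 = -6$ and $g_y = 3 - 3 + 1 = 1$ at the point, so the derivative is $-(-6)/1 = 6$, in agreement with the example.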
Activity 6.13 Suppose that $y$ is a function of $x$ defined implicitly by the equation $x^2 + 2xy = 6 - 3y^3$. Verify that the point $(x, y) = (1, 1)$ satisfies this equation and find the value of the derivative, $y'(x)$, at this point.
Extensions of the chain rule
What we have seen above can be extended. Suppose, for instance, that $g$ is a function of two variables $x$ and $y$, both of which are themselves functions of two variables $k$ and $l$. We can think of this as defining a composite function $G(k, l) = g(x(k, l), y(k, l))$ and an extension of the chain rule then assures us that
\[ \frac{\partial G}{\partial k} = \frac{\partial g}{\partial x} \frac{\partial x}{\partial k} + \frac{\partial g}{\partial y} \frac{\partial y}{\partial k} \quad \text{and} \quad \frac{\partial G}{\partial l} = \frac{\partial g}{\partial x} \frac{\partial x}{\partial l} + \frac{\partial g}{\partial y} \frac{\partial y}{\partial l}. \]
To see why the first of these formulae works, consider that if we change $k$ by a small amount, $\Delta k$, whilst holding $l$ constant, the corresponding change in $G(k, l)$ is given by
\[ \Delta G \simeq \frac{\partial G}{\partial k} \Delta k, \]
but here, there are two ways in which $G(k, l) = g(x(k, l), y(k, l))$ can change with $k$.
Firstly, $G$ can change with $k$ because $g$ changes with $x$ and $x$ changes with $k$; let’s denote this change in $G$ by $\Delta_x G$. In this case, we have
\[ \Delta_x G \simeq \frac{\partial g}{\partial x} \Delta x, \]
as we are holding $y$ constant to see how $G$ changes with $x$ and this means that
\[ \Delta_x G \simeq \frac{\partial g}{\partial x} \frac{\partial x}{\partial k} \Delta k, \]
as the change in $x$, $\Delta x$, is related to a change in $k$ with $l$ held constant by $\Delta x \simeq x_k(k, l) \Delta k$.
Secondly, $G$ can change with $k$ because $g$ changes with $y$ and $y$ changes with $k$; let’s denote this change in $G$ by $\Delta_y G$. In this case, we have
\[ \Delta_y G \simeq \frac{\partial g}{\partial y} \Delta y, \]
as we are holding $x$ constant to see how $G$ changes with $y$ and this means that
\[ \Delta_y G \simeq \frac{\partial g}{\partial y} \frac{\partial y}{\partial k} \Delta k, \]
as the change in $y$, $\Delta y$, is related to a change in $k$ with $l$ held constant by $\Delta y \simeq y_k(k, l) \Delta k$.
Thus, as the total change in $G$ due to these two changes is given by
\[ \Delta G = \Delta_x G + \Delta_y G \simeq \frac{\partial g}{\partial x} \frac{\partial x}{\partial k} \Delta k + \frac{\partial g}{\partial y} \frac{\partial y}{\partial k} \Delta k, \]
we can now equate our two expressions for $\Delta G$ and divide through by $\Delta k$ to get the chain rule for $G_k(k, l)$ which we saw above.
Activity 6.14 Use a similar argument to the one above to explain why the chain rule formula for $G_l(k, l)$ works.
And, in a similar manner, if we suppose that $g(x, y, z) = c$ defines $z$ implicitly as a function of $x$ and $y$, we can use this form of the chain rule to derive the formulae
\[ \frac{\partial z}{\partial x} = -\frac{\partial g / \partial x}{\partial g / \partial z} \quad \text{and} \quad \frac{\partial z}{\partial y} = -\frac{\partial g / \partial y}{\partial g / \partial z}, \]
which will allow us to calculate the partial derivatives of $z$ with respect to $x$ and $y$.
Indeed, to see why the first of these formulae works, we consider that if we knew the function, $z(x, y)$, that satisfied the equation $g(x, y, z) = c$, we could find a new function, $G(x, y)$, of $x$ and $y$ only which is given by $G(x, y) = g(x, y, z(x, y))$. Then using the chain rule, we have
\[ \frac{\partial G}{\partial x} = \frac{\partial g}{\partial x} \frac{dx}{dx} + \frac{\partial g}{\partial z} \frac{\partial z}{\partial x}. \]
But, $G(x, y) = c$ where $c$ is a constant and so we also have
\[ \frac{\partial G}{\partial x} = 0 \quad \text{as well as} \quad \frac{dx}{dx} = 1, \]
which means that we are left with
\[ 0 = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial z} \frac{\partial z}{\partial x}. \]
Rearranging this then gives us
\[ \frac{\partial z}{\partial x} = -\frac{\partial g / \partial x}{\partial g / \partial z}, \]
as long as $g_z(x, y, z) \neq 0$. That is, $z_x(x, y)$ can easily be found by using the partial derivatives of $g$. (But, don’t forget the minus sign!)
Activity 6.15 Use a similar argument to the one above to explain why the formula for $z_y(x, y)$ works.
Activity 6.16 Suppose that $q$ is a function of $k$ and $l$ defined implicitly by the equation
\[ q^3 k + k^3 l + q k^2 l = 3. \]
Find the partial derivatives $q_k(k, l)$ and $q_l(k, l)$. What are the values of these partial derivatives at the point where $k = 1$ and $l = 1$? [Hint: The identity $q^3 + q - 2 = (q - 1)(q^2 + q + 2)$ will be useful.]
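To see the formulae for $z_x$ and $z_y$ at work without giving away the answer to Activity 6.16, we can try them on a surface where $z$ can also be found explicitly. The following Python sketch is not part of the original guide: the surface $x^2 + y^2 + z^2 = 14$, the point $(1, 2, 3)$ and all of the function names are our own hypothetical choices. It compares $-g_x/g_z$ and $-g_y/g_z$ with central-difference estimates obtained from the explicit formula $z = \sqrt{14 - x^2 - y^2}$.

```python
import math

C = 14.0  # hypothetical constant: we take g(x, y, z) = x^2 + y^2 + z^2 = 14


def g_x(x, y, z):
    return 2 * x


def g_y(x, y, z):
    return 2 * y


def g_z(x, y, z):
    return 2 * z


def z_explicit(x, y):
    """Upper branch of the surface; available only because this g is so simple."""
    return math.sqrt(C - x**2 - y**2)


def z_x_formula(x, y, z):
    """z_x = -g_x / g_z, valid where g_z is non-zero."""
    return -g_x(x, y, z) / g_z(x, y, z)


def z_y_formula(x, y, z):
    """z_y = -g_y / g_z, valid where g_z is non-zero."""
    return -g_y(x, y, z) / g_z(x, y, z)


x0, y0 = 1.0, 2.0
z0 = z_explicit(x0, y0)  # the point (1, 2, 3) lies on the surface
h = 1e-6
z_x_numeric = (z_explicit(x0 + h, y0) - z_explicit(x0 - h, y0)) / (2 * h)
z_y_numeric = (z_explicit(x0, y0 + h) - z_explicit(x0, y0 - h)) / (2 * h)

print(z_x_formula(x0, y0, z0), z_x_numeric)
print(z_y_formula(x0, y0, z0), z_y_numeric)
```

At this point the formulae give $z_x = -1/3$ and $z_y = -2/3$, and the numerical estimates agree to several decimal places. The value of the implicit formulae is that they still work when, unlike here, no explicit expression for $z$ is available.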
6.3.4 An application: Homogeneous functions
Homogeneous functions are important in economics since they allow us to capture the idea of ‘returns to scale’. In this section we will see what it means for a function to be homogeneous and consider an important theorem about homogeneous functions. The former will enable us to give an economic interpretation of homogeneous production functions in terms of ‘returns to scale’ and the latter will enable us to consider the economic significance of the marginal products that can be derived from such production functions.
Homogeneity and returns to scale
We say that a function, $f(x, y)$, is homogeneous of degree $r$ if
\[ f(\lambda x, \lambda y) = \lambda^r f(x, y), \]
for any $\lambda \in \mathbb{R}$. Let’s start by looking at some examples of homogeneous functions.
Example 6.15 Show that the function $f(x, y) = x^{1/2} y^{1/2}$ is homogeneous of degree one.
Replacing $x$ and $y$ in $f(x, y)$ with $\lambda x$ and $\lambda y$ we get
\[ f(\lambda x, \lambda y) = (\lambda x)^{1/2} (\lambda y)^{1/2} = (\lambda^{1/2} x^{1/2})(\lambda^{1/2} y^{1/2}) = \lambda^1 x^{1/2} y^{1/2} = \lambda^1 f(x, y). \]
Comparing this with the definition of homogeneity, i.e. $f(\lambda x, \lambda y) = \lambda^r f(x, y)$, we see that $r = 1$ and so this function is homogeneous of degree one.
Example 6.16 Show that the function $f(x, y) = \sqrt{x} + \sqrt{y}$ is homogeneous. What is its degree of homogeneity?
Replacing $x$ and $y$ in $f(x, y)$ with $\lambda x$ and $\lambda y$ we get
\[ f(\lambda x, \lambda y) = \sqrt{\lambda x} + \sqrt{\lambda y} = \sqrt{\lambda} \sqrt{x} + \sqrt{\lambda} \sqrt{y} = \sqrt{\lambda} (\sqrt{x} + \sqrt{y}) = \sqrt{\lambda} f(x, y). \]
As $\sqrt{\lambda} = \lambda^{1/2}$, comparing this with the definition of homogeneity, i.e. $f(\lambda x, \lambda y) = \lambda^r f(x, y)$, we see that $r = 1/2$ and so this function is homogeneous of degree one half.
Example 6.17 Show that the function $f(x, y) = x + y^2$ is not homogeneous.
Replacing $x$ and $y$ in $f(x, y)$ with $\lambda x$ and $\lambda y$ we get
\[ f(\lambda x, \lambda y) = (\lambda x) + (\lambda y)^2 = \lambda x + \lambda^2 y^2. \]
Comparing this with the definition of homogeneity, i.e. $f(\lambda x, \lambda y) = \lambda^r f(x, y)$, we see that there is no way of writing $\lambda x + \lambda^2 y^2$ in the form $\lambda^r (x + y^2)$ for any $r$ and so this function is not homogeneous. In particular, this means that not all functions are homogeneous.
Economically, we can think of homogeneous functions as telling us about how outputs change if we ‘scale up’ our inputs. To see why, consider what happens if we scale up our inputs by a factor of $\lambda$, i.e. if we increase our bundle of inputs, $(x, y)$, by a factor of $\lambda > 1$ we get the new bundle of inputs $(\lambda x, \lambda y)$. Now if our outputs are determined by a homogeneous function, $f(x, y)$, of degree $r$ we can see that the output from our new bundle, $(\lambda x, \lambda y)$, is given by
\[ f(\lambda x, \lambda y) = \lambda^r f(x, y), \]
i.e. we will get $\lambda^r$ times as much as we did from our old bundle, $(x, y)$. That is, scaling inputs by $\lambda$ leads to a scaling of output by $\lambda^r$ if our output is determined by a function which is homogeneous of degree $r$.
In particular, given a function which is homogeneous of degree one, we can see that scaling our inputs by $\lambda > 1$ (i.e. going from the bundle of inputs $(x, y)$ to the bundle of inputs $(\lambda x, \lambda y)$) will scale our output by $\lambda$ (i.e. going from an output of $f(x, y)$ to an output of $\lambda f(x, y)$). That is, we get constant returns to scale: a proportional increase in inputs leads to the same proportional increase in output. Clearly, given functions of degree $r > 0$, this idea can be extended to cover functions with degrees $r \neq 1$ as follows:
If $r > 1$, we get increasing returns to scale as $\lambda > 1$ implies that $\lambda^r > \lambda$.⁵
If $r = 1$, we get constant returns to scale as we saw above.
If $r < 1$, we get decreasing returns to scale as $\lambda > 1$ implies that $\lambda^r < \lambda$.⁶
⁵ That is, a proportional increase in inputs leads to a larger proportional increase in output.
⁶ That is, a proportional increase in inputs leads to a smaller proportional increase in output.
To see how this works, consider the following example.
Example 6.18 A firm invests an amount of capital, $k$, and labour, $l$, in its production process and this yields a production level of $q(k, l)$. What will be the effect on the level of production of quadrupling the amount of capital and labour invested if the production function is homogeneous of degree (a) 1/2, (b) 1 and (c) 3/2?
Quadrupling the amount of capital and labour invested means increasing the investment bundle from $(k, l)$ to $(4k, 4l)$. So, if the production function is homogeneous of degree $r$, the production level will go from $q(k, l)$ to $q(4k, 4l) = 4^r q(k, l)$, i.e. the production level will change by a factor of $4^r$. In particular, this means that if the production function is homogeneous of degree
(a) 1/2, the change will be by a factor of $4^{1/2} = 2$ (i.e. quadrupling inputs doubles production),
(b) 1, the change will be by a factor of $4^1 = 4$ (i.e. quadrupling inputs quadruples production),
(c) 3/2, the change will be by a factor of $4^{3/2} = 8$ (i.e. quadrupling inputs octuples production),
yielding decreasing, constant and increasing returns to scale respectively.
We now turn to a useful result about homogeneous functions.
Euler’s theorem and marginal products
Euler’s theorem states that if $f(x, y)$ is a homogeneous function of degree $r$, then
\[ x \frac{\partial f}{\partial x} + y \frac{\partial f}{\partial y} = r f(x, y). \]
This follows from a simple application of the chain rule since, using the definition of a function that is homogeneous of degree $r$, we have
\[ f(\lambda x, \lambda y) = \lambda^r f(x, y), \]
for any $\lambda \in \mathbb{R}$. As such, differentiating both sides with respect to $\lambda$ and using the chain rule from (6.3) on the left-hand side, we have
\[ \frac{\partial f}{\partial u} \frac{du}{d\lambda} + \frac{\partial f}{\partial v} \frac{dv}{d\lambda} = r \lambda^{r-1} f(x, y), \]
if we think of $f(\lambda x, \lambda y)$ as $f(u, v)$ with $u = \lambda x$ and $v = \lambda y$. This then gives us
\[ x \frac{\partial f}{\partial u} + y \frac{\partial f}{\partial v} = r \lambda^{r-1} f(x, y), \]
and, if we now set $\lambda = 1$, we get the desired result as we have $u = x$, $v = y$ and $\lambda^{r-1} = 1$.
In this course, a question may involve verifying that Euler’s theorem holds for some given homogeneous function. As an example, let’s verify that it is true for the two homogeneous functions we considered in Examples 6.15 and 6.16.
Example 6.19 In Example 6.15, we saw that the function $f(x, y) = x^{1/2} y^{1/2}$ is homogeneous of degree one. Verify that Euler’s theorem holds for this function.
In this case we can see that
\[ \frac{\partial f}{\partial x} = \frac{1}{2} x^{-1/2} y^{1/2} \quad \text{and} \quad \frac{\partial f}{\partial y} = \frac{1}{2} x^{1/2} y^{-1/2}. \]
As such, we can see that the left-hand-side of Euler’s theorem gives us
\[ x \frac{\partial f}{\partial x} + y \frac{\partial f}{\partial y} = x \left( \frac{1}{2} x^{-1/2} y^{1/2} \right) + y \left( \frac{1}{2} x^{1/2} y^{-1/2} \right) = \frac{1}{2} x^{1/2} y^{1/2} + \frac{1}{2} x^{1/2} y^{1/2} = x^{1/2} y^{1/2} = f(x, y), \]
and since the degree of homogeneity of this function is one, we have $f(x, y)$ on the right-hand-side of Euler’s theorem. Thus, as these two expressions are the same, Euler’s theorem holds.
Example 6.20 In Example 6.16, we saw that the function $f(x, y) = \sqrt{x} + \sqrt{y}$ is homogeneous of degree 1/2. Verify that Euler’s theorem holds for this function.
In this case we can see that $f(x, y)$ can be written as $f(x, y) = x^{1/2} + y^{1/2}$ and so,
\[ \frac{\partial f}{\partial x} = \frac{1}{2} x^{-1/2} \quad \text{and} \quad \frac{\partial f}{\partial y} = \frac{1}{2} y^{-1/2}. \]
As such, we can see that the left-hand-side of Euler’s theorem gives us
\[ x \frac{\partial f}{\partial x} + y \frac{\partial f}{\partial y} = x \left( \frac{1}{2} x^{-1/2} \right) + y \left( \frac{1}{2} y^{-1/2} \right) = \frac{1}{2} x^{1/2} + \frac{1}{2} y^{1/2} = \frac{1}{2} \left( x^{1/2} + y^{1/2} \right) = \frac{1}{2} f(x, y), \]
and since the degree of homogeneity of this function is a half, we have $\frac{1}{2} f(x, y)$ on the right-hand-side of Euler’s theorem. Thus, as these two expressions are the same, Euler’s theorem holds.
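Both the scaling property and Euler’s theorem can be tested numerically for the functions of Examples 6.15 and 6.16. The following Python sketch is not part of the original guide: the helper names `scaling_ratio` and `euler_gap`, the step size and the test bundle $(2, 3)$ are our own assumptions, and the partial derivatives are estimated by central differences rather than computed exactly.

```python
import math


def scaling_ratio(f, lam, x, y):
    """f(lam*x, lam*y) / f(x, y); equals lam**r if f is homogeneous of degree r."""
    return f(lam * x, lam * y) / f(x, y)


def euler_gap(f, r, x, y, h=1e-6):
    """x*f_x + y*f_y - r*f(x, y), with the partial derivatives estimated numerically.

    This should be (close to) zero if f is homogeneous of degree r.
    """
    f_x = (f(x + h, y) - f(x - h, y)) / (2 * h)
    f_y = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return x * f_x + y * f_y - r * f(x, y)


def f1(x, y):
    """Example 6.15: homogeneous of degree one."""
    return math.sqrt(x) * math.sqrt(y)


def f2(x, y):
    """Example 6.16: homogeneous of degree one half."""
    return math.sqrt(x) + math.sqrt(y)


x0, y0 = 2.0, 3.0  # an arbitrary bundle of positive inputs
print(scaling_ratio(f1, 4, x0, y0))  # close to 4**1 = 4
print(scaling_ratio(f2, 4, x0, y0))  # close to 4**0.5 = 2
print(euler_gap(f1, 1, x0, y0))      # close to 0
print(euler_gap(f2, 0.5, x0, y0))    # close to 0
```

Quadrupling the inputs quadruples the output of the first function but only doubles the output of the second, which is the pattern described in Example 6.18 for degrees 1 and 1/2.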
We now turn to the economic significance of Euler’s theorem. Consider a firm that invests an amount of capital, $k$, and labour, $l$, in its production process and this yields a production level of $q(k, l)$. Further, assume that this production function is homogeneous of degree one, i.e. that we have constant returns to scale. Euler’s theorem then asserts that
\[ k \frac{\partial q}{\partial k} + l \frac{\partial q}{\partial l} = q. \]
Now, $q_l$ gives us the marginal product of labour, i.e. it measures the change in production if we change the amount of labour. In particular, if we invest one more unit of labour, say by employing one more worker, $q_l$ tells us the resulting change in production.⁷ As such, it makes sense to say that this extra worker is responsible for this change in production and so, if we assume that we reward workers by giving them goods equal to the quantity they produce, it makes sense to reward this worker with a quantity of goods given by $q_l$. Thus, if all workers produce the same amount, i.e. $q_l$, and there are $l$ (i.e. the amount of labour invested) workers, it makes sense that they should each be rewarded with a quantity of goods equal to $q_l$. As such, the quantity $l q_l$ represents the total quantity of goods that should be given as rewards to the workers (i.e. the labour). A similar argument applies to the quantity $k q_k$, i.e. this should be the total quantity of goods that should be given as rewards to the providers of the capital. Consequently, Euler’s theorem tells us that these rewards should add up to the total quantity of goods produced, i.e. all the goods being produced should be distributed amongst the suppliers of capital and the providers of labour.
⁷ But, strictly, this is only approximate since if $\Delta q$ is the change in production and $\Delta l$ is the change in labour, the relationship
\[ \frac{\partial q}{\partial l} \simeq \frac{\Delta q}{\Delta l} \quad \text{or} \quad \Delta q \simeq \frac{\partial q}{\partial l} \Delta l, \]
is only an approximation. As such, taking on one more worker (i.e. changing the amount of labour by one) gives $\Delta l = 1$ and hence the change in production, $\Delta q$, is given [approximately] by $\Delta q = q_l$. However, the argument given in these notes can be made precise if we consider the change in production due to an ‘arbitrarily small’ change in the amount of labour instead of, say, the intuitively more obvious change of ‘one worker’.
In summary, this says:
In a firm with constant returns to scale, if we reward each factor of production (e.g. capital and labour) at a level equal to its marginal product, then the total reward to the factors of production will be the amount produced.
6.3.5 Second-order partial derivatives
If we have a function $f(x, y)$, we can use partial differentiation to find the new functions $f_x(x, y)$ and $f_y(x, y)$. These new functions are called the first-order partial derivatives of $f$. However, it is also possible to partially differentiate these new functions with respect to $x$ and $y$ to get the second-order partial derivatives of $f$. Obviously, for a function of two variables, there are four second-order partial derivatives, i.e. those that are ‘unmixed’:
\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x} \left( \frac{\partial f}{\partial x} \right) \quad \text{and} \quad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y} \left( \frac{\partial f}{\partial y} \right), \]
and those that are ‘mixed’:
\[ \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y} \left( \frac{\partial f}{\partial x} \right) \quad \text{and} \quad \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x} \left( \frac{\partial f}{\partial y} \right). \]
Or, alternatively, in our more compact notation we have
\[ f_{xx} = (f_x)_x, \quad f_{yy} = (f_y)_y, \quad f_{xy} = (f_x)_y \quad \text{and} \quad f_{yx} = (f_y)_x, \]
respectively. In this course, we will find that the order of partial differentiation in the mixed second-order partial derivatives is unimportant since we will always have $f_{xy} = f_{yx}$. In particular, this fact can serve as a useful ‘check’ when we are working out second-order partial derivatives.
Example 6.21
In Example 6.7, we found the first-order partial derivatives of $f(x, y) = x^2 y + 5xy^3 + y^2$ were given by
$$f_x(x, y) = 2xy + 5y^3 \quad\text{and}\quad f_y(x, y) = x^2 + 15xy^2 + 2y.$$
Find the second-order partial derivatives of $f$.

Partially differentiating $f_x(x, y) = 2xy + 5y^3$ with respect to $x$ and $y$ respectively, we can see that
$$f_{xx}(x, y) = 2y \quad\text{and}\quad f_{xy}(x, y) = 2x + 15y^2,$$
whereas, partially differentiating $f_y(x, y) = x^2 + 15xy^2 + 2y$ with respect to $x$ and $y$ respectively, we can see that
$$f_{yx}(x, y) = 2x + 15y^2 \quad\text{and}\quad f_{yy}(x, y) = 30xy + 2.$$
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.
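The ‘check’ that $f_{xy} = f_{yx}$ can also be carried out numerically. The following Python sketch is an illustration only (the helper name `second_partials`, the step size and the test point $(1, 2)$ are assumptions, not part of these notes): it estimates the four second-order partial derivatives of the function from Example 6.21 by repeated central differences and prints them next to the formulae found above.

```python
def f(x, y):
    """The function from Example 6.21."""
    return x**2 * y + 5 * x * y**3 + y**2


def second_partials(func, x, y, h=1e-3):
    """Central-difference estimates of f_xx, f_xy, f_yx and f_yy at (x, y)."""
    def fx(u, v):
        return (func(u + h, v) - func(u - h, v)) / (2 * h)

    def fy(u, v):
        return (func(u, v + h) - func(u, v - h)) / (2 * h)

    fxx = (fx(x + h, y) - fx(x - h, y)) / (2 * h)
    fxy = (fx(x, y + h) - fx(x, y - h)) / (2 * h)  # differentiate f_x with respect to y
    fyx = (fy(x + h, y) - fy(x - h, y)) / (2 * h)  # differentiate f_y with respect to x
    fyy = (fy(x, y + h) - fy(x, y - h)) / (2 * h)
    return fxx, fxy, fyx, fyy


if __name__ == "__main__":
    x, y = 1.0, 2.0
    print(second_partials(f, x, y))
    # Values given by the formulae in Example 6.21 at the same point.
    print(2 * y, 2 * x + 15 * y**2, 2 * x + 15 * y**2, 30 * x * y + 2)
```

The estimates are only approximate, so they should be compared with the exact formulae up to a small tolerance rather than for exact equality.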
6.3. Partial differentiation
Example 6.22
In Example 6.8, we found the first-order partial derivatives of $f(x, y) = 3x^3 + 7xy^{-1} + 2y^9$ were given by
$$f_x(x, y) = 9x^2 + 7y^{-1} \quad\text{and}\quad f_y(x, y) = -7xy^{-2} + 18y^8.$$
Find the second-order partial derivatives of $f$.

Partially differentiating $f_x(x, y) = 9x^2 + 7y^{-1}$ with respect to $x$ and $y$ respectively, we can see that
$$f_{xx}(x, y) = 18x \quad\text{and}\quad f_{xy}(x, y) = -7y^{-2},$$
whereas, partially differentiating $f_y(x, y) = -7xy^{-2} + 18y^8$ with respect to $x$ and $y$ respectively, we can see that
$$f_{yx}(x, y) = -7y^{-2} \quad\text{and}\quad f_{yy}(x, y) = 14xy^{-3} + 144y^7.$$
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.

Activity 6.17 Find the second-order partial derivatives of the function in Activity 6.10.
Activity 6.18 Find the first and second-order partial derivatives of $f(x, y) = x^{3/4} y^{1/4}$.

And, of course, when finding second-order partial derivatives we may need to use the chain, product and quotient rules.

Example 6.23
In Example 6.9, we found the first-order partial derivatives of $f(x, y) = x\,e^{x+y^2}$ were given by
$$f_x(x, y) = (x + 1)\,e^{x+y^2} \quad\text{and}\quad f_y(x, y) = 2xy\,e^{x+y^2}.$$
Find the second-order partial derivatives of $f$.

To find the second-order derivatives that arise from $f_x(x, y)$, we first note that we can write it as
$$\frac{\partial f}{\partial x} = [(x + 1)\,e^x]\,e^{y^2}.$$
So, to find $f_{xx}(x, y)$, we treat $e^{y^2}$ as a constant and we differentiate the function $(x + 1)\,e^x$ using the product rule to get $(x + 1)\,e^x + 1 \cdot e^x$. This gives us
$$\frac{\partial^2 f}{\partial x^2} = e^{y^2}\,[(x + 1)\,e^x + e^x] = (x + 2)\,e^{x+y^2}.$$
To find $f_{xy}(x, y)$, we treat $(x + 1)\,e^x$ as a constant and we differentiate the function $e^{y^2}$ using the chain rule to get $2y\,e^{y^2}$. This gives us
$$\frac{\partial^2 f}{\partial y\,\partial x} = (x + 1)\,e^x\,(2y\,e^{y^2}) = 2(x + 1)y\,e^{x+y^2}.$$
To find the second-order derivatives that arise from $f_y(x, y)$, we first note that we can write it as
$$\frac{\partial f}{\partial y} = 2(x\,e^x)(y\,e^{y^2}).$$
So, to find $f_{yx}(x, y)$, we treat $2y\,e^{y^2}$ as a constant and we differentiate the function $x\,e^x$ using the product rule to get $x\,e^x + 1 \cdot e^x$. This gives us
$$\frac{\partial^2 f}{\partial x\,\partial y} = 2y\,e^{y^2}\,(x\,e^x + e^x) = 2(x + 1)y\,e^{x+y^2}.$$
To find $f_{yy}(x, y)$, we treat $2x\,e^x$ as a constant and we differentiate the function $y\,e^{y^2}$ using the chain and product rules to get $y(2y\,e^{y^2}) + e^{y^2}$. This gives us
$$\frac{\partial^2 f}{\partial y^2} = 2x\,e^x\,(2y^2\,e^{y^2} + e^{y^2}) = 2x(2y^2 + 1)\,e^{x+y^2}.$$
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.

Activity 6.19 Find the second-order partial derivatives of the function in Activity 6.11.
Of course, we could go on and discuss higher-order partial derivatives, but we won’t as they will not be used in this course.
6.4 Using partial derivatives
We now look at some of the useful things that partial derivatives tell us about functions of two variables. Before you start this section, you should note that this material makes use of some ideas from Chapter 2 of 173 Algebra, namely the dot product of two vectors (see Section 2.8), displacement and direction vectors (see Section 2.9), the equation of a plane (see Section 2.11), and the equation of a hyperplane (see Section 2.12). Make sure that you understand these before you proceed.
6.4.1 Tangent planes
Suppose that we have a surface whose equation is given by $z = f(x, y)$. If $c = f(a, b)$, then the point $(a, b, c)$ is on this surface and, if we look at the sections given by $x = a$ and $y = b$, which are parallel to the $(y, z)$-plane and $(x, z)$-plane respectively, we can find tangent lines in these planes by using the partial derivatives as these tell us how $z$ is changing with $y$ and $x$ respectively at this point. In particular,

if $x = a$, the section is given by $z = f(a, y)$ and the tangent line is given by
$$z = c + f_y(a, b)(y - b),$$
and this lives in the plane $x = a$ which is parallel to the $(y, z)$-plane,

whereas if $y = b$, the section is given by $z = f(x, b)$ and the tangent line is given by
$$z = c + f_x(a, b)(x - a),$$
and this lives in the plane $y = b$ which is parallel to the $(x, z)$-plane.

Example 6.24 Show that the point $(1, 1, 2)$ lies on the surface whose equation is $z = x^2 + y^2$. What are the equations of the tangent lines to the $x = 1$ and $y = 1$ sections at this point?

The point $(1, 1, 2)$ lies on the surface $z = x^2 + y^2$ as $2 = 1^2 + 1^2$. Here we have $z = f(x, y)$ with $f(x, y) = x^2 + y^2$ and so, looking at the:

$x = 1$ section, we have
$$f_y(x, y) = 2y \implies f_y(1, 1) = 2,$$
and so the tangent line, which lives in the plane $x = 1$, has an equation given by $z = 2 + 2(y - 1) = 2y$, as we should expect since this section has an equation given by $z = 1 + y^2$. This section and the tangent line are illustrated in Figure 6.10(a).

$y = 1$ section, we have
$$f_x(x, y) = 2x \implies f_x(1, 1) = 2,$$
and so the tangent line, which lives in the plane $y = 1$, has an equation given by $z = 2 + 2(x - 1) = 2x$, as we should expect since this section has an equation given by $z = 1 + x^2$. This section and the tangent line are illustrated in Figure 6.10(b).

In particular, note that these tangent lines ‘live’ in the planes that define the relevant sections.

Indeed, as we can find two tangent lines that tell us about how the surface $z = f(x, y)$ is changing in the $x$ and $y$-directions at the point $(a, b, c)$ by considering the $y = b$ and $x = a$ sections respectively, we can use these two lines to define the tangent plane to the surface at this point. The question is: How do we find the equation of this tangent plane?
Figure 6.10: Tangent lines to the (a) $x = 1$ and (b) $y = 1$ sections of the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$ as discussed in Example 6.24.
Let’s assume that both of the partial derivatives, $f_x(x, y)$ and $f_y(x, y)$, are defined at the point $(a, b, c)$. We know, from Section 2.11 of 173 Algebra, that the vector equation of a plane through this point is given by
$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0,$$
where the vector $(u, v, w)$ is the normal vector to the plane. Indeed, working out this dot product, we find that
$$u(x - a) + v(y - b) + w(z - c) = 0,$$
is the Cartesian equation of the plane. But, what are $u$, $v$ and $w$? Well, if we assume that we have $w \neq 0$, i.e. the plane we are considering is not vertical, then we can write this as
$$z = c - \frac{u}{w}(x - a) - \frac{v}{w}(y - b),$$
and, to be a tangent plane, we require that the two tangent lines we found above lie in the plane. In particular, we find that when $x = a$, we must have
$$z = c - \frac{v}{w}(y - b) \quad\text{giving us}\quad z = c + f_y(a, b)(y - b),$$
and when $y = b$ we must have
$$z = c - \frac{u}{w}(x - a) \quad\text{giving us}\quad z = c + f_x(a, b)(x - a),$$
which means that we have
$$-\frac{v}{w} = f_y(a, b) \quad\text{and}\quad -\frac{u}{w} = f_x(a, b).$$
This means that the Cartesian equation of the tangent plane is given by
$$z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b), \tag{6.4}$$
and writing this as
$$f_x(a, b)(x - a) + f_y(a, b)(y - b) - (z - c) = 0,$$
we find that the vector equation of the tangent plane is
$$\begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0. \tag{6.5}$$
In particular, we see that
$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix},$$
is a normal vector to this tangent plane.
Example 6.25 Following on from Example 6.24, find the Cartesian and vector equations of the tangent plane to the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$. Verify that the tangent lines to the $x = 1$ and $y = 1$ sections at this point (found in Example 6.24) lie in this tangent plane.

Using what we found in Example 6.24 and (6.4), it should be clear that the Cartesian equation of the tangent plane to the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$ is given by
$$z - 2 = 2(x - 1) + 2(y - 1) \implies z = 2x + 2y - 2,$$
whereas, using (6.5), its vector equation is
$$\begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 2 \end{pmatrix} = 0.$$
Of course, if you work out the dot product in the latter, you should get the former!

If we now find the $x = 1$ section of this tangent plane we get $z = 2(1) + 2y - 2 = 2y$, which is the tangent line to the $x = 1$ section of the surface and so this must lie in the tangent plane and, similarly, if we find the $y = 1$ section of this tangent plane we get $z = 2x + 2(1) - 2 = 2x$, which is the tangent line to the $y = 1$ section of the surface and so this must lie in the tangent plane too. This is illustrated in Figure 6.11.

We note in passing that, if $f$ is differentiable,⁸ then the tangent plane to $f(x, y)$ at the point $(a, b)$ gives us a linear approximation to $f(x, y)$ at nearby points, i.e.
$$f(x, y) \simeq f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b),$$
which, using vectors, we can write as
$$f(x, y) \simeq f(a, b) + \big(f_x(a, b),\ f_y(a, b)\big) \begin{pmatrix} x - a \\ y - b \end{pmatrix}.$$

⁸ That is, if both of its partial derivatives exist.
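To see this linear approximation at work, here is a small Python sketch for the surface $z = x^2 + y^2$ and its tangent plane at $(1, 1, 2)$ from Example 6.25. The function names and the sample points are assumptions made for illustration; the sketch simply compares the value on the surface with the value on the tangent plane as we move towards $(1, 1)$.

```python
def f(x, y):
    """The surface z = x^2 + y^2 from Examples 6.24 and 6.25."""
    return x**2 + y**2


def tangent_plane(x, y, a=1.0, b=1.0):
    """z-value on the tangent plane to z = f(x, y) at the point (a, b, f(a, b))."""
    fx, fy = 2 * a, 2 * b  # partial derivatives of x^2 + y^2 evaluated at (a, b)
    return f(a, b) + fx * (x - a) + fy * (y - b)


if __name__ == "__main__":
    for d in (0.5, 0.1, 0.01):
        x, y = 1 + d, 1 + d
        # The gap between surface and plane shrinks as (x, y) approaches (1, 1).
        print(d, f(x, y), tangent_plane(x, y), f(x, y) - tangent_plane(x, y))
```

For this surface the gap is exactly $(x - 1)^2 + (y - 1)^2$, so it is small precisely when $(x, y)$ is close to $(1, 1)$.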
Figure 6.11: The tangent plane to the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$ as discussed in Example 6.25. The lines in this tangent plane, which lie in the $x = 1$ and $y = 1$ planes, are the tangent lines to the $x = 1$ and $y = 1$ sections of the surface respectively.
This prompts us to define the derivative of $f(x, y)$ with respect to the vector $\mathbf{x} = (x, y)$ to be the vector
$$\frac{df}{d\mathbf{x}} = \big(f_x(x, y),\ f_y(x, y)\big),$$
so that we can write
$$f(x, y) \simeq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}.$$
This then gives us something which looks like a Taylor series and we will see more of this in Section 6.4.5. But, before we do this, let’s consider another important use of what we have just seen.
6.4.2 Gradient vectors
The tangent plane to the surface $z = f(x, y)$ at the point $(a, b, c)$, where $c = f(a, b)$, has a Cartesian equation given by
$$z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b).$$
Now, if we look at the intersection of the surface and its tangent plane with the horizontal plane $z = c$, we find that the surface gives us the contour $c = f(x, y)$ and the tangent plane gives us the line
$$f_x(a, b)(x - a) + f_y(a, b)(y - b) = 0.$$
Now, this line passes through the point $(a, b)$ and, given that this line is in the tangent plane of the surface at the point $(a, b, c)$, it should be clear that it is the tangent line of this contour at $(a, b)$. In particular, as we can write the equation of this line as
$$\begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \end{pmatrix} = 0, \tag{6.6}$$
in vector form, we can see that the vector
$$\nabla f(a, b) = \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix},$$
is a normal vector to the tangent line and so it is perpendicular to the contour.

Example 6.26 Given that $z = f(x, y)$ where $f(x, y) = x^2 + y^2$, find $\nabla f(1, 1)$. Show that this vector is perpendicular to the tangent line to the $z = 2$ contour of this surface at the point $(1, 1)$ and hence deduce that it is perpendicular to this contour at this point.

Here $f(x, y) = x^2 + y^2$ and so we have
$$\nabla f(x, y) = \begin{pmatrix} f_x(x, y) \\ f_y(x, y) \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \end{pmatrix},$$
and, evaluating this at the point $(x, y) = (1, 1)$, we get
$$\nabla f(1, 1) = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.$$
Then, using (6.6), we see that the Cartesian equation of the tangent line to the $z = 2$ contour at this point⁹ is given by
$$\begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \end{pmatrix} = 0 \implies 2(x - 1) + 2(y - 1) = 0 \implies y = 2 - x.$$
Now, for $x \in \mathbb{R}$, we have points $(x, y)$ on this tangent line given by
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ 2 - x \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix} + x \begin{pmatrix} 1 \\ -1 \end{pmatrix},$$
and so this line lies in the direction given by the vector $(1, -1)^T$. But, of course,
$$\nabla f(1, 1) \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 2 + (-2) = 0,$$
which means that $\nabla f(1, 1)$ is indeed perpendicular to this tangent line and, in particular, it will be perpendicular to the contour at this point too. This is illustrated in Figure 6.12.

In general, given a function $f(x, y)$, we call the vector
$$\nabla f(x, y) = \begin{pmatrix} f_x(x, y) \\ f_y(x, y) \end{pmatrix}, \tag{6.7}$$
the gradient of $f$. Indeed, we have seen that $f_x(a, b)$ and $f_y(a, b)$ allow us to see how rapidly $f$ is changing if we move away from the point $(a, b)$ in the $x$ or $y$-direction respectively. Now, we will look at how $\nabla f(a, b)$ allows us to see how rapidly $f$ is changing if we move away from the point $(a, b)$ in any direction.

⁹ Note that $(x, y) = (1, 1)$ gives $z = f(1, 1) = 2$ and so this point is on the $z = 2$ contour of this surface.
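As a numerical illustration of Example 6.26, the following Python sketch checks that $\nabla f$ is perpendicular to the contour $x^2 + y^2 = 2$. It assumes a parametrisation of the contour by an angle $t$, which is not used in these notes and is introduced here only for illustration; the tangent direction is the derivative of that parametrisation with respect to $t$.

```python
import math


def grad_f(x, y):
    """Gradient of f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)


def contour_point(t, c=2.0):
    """Point on the contour x^2 + y^2 = c, parametrised by the angle t."""
    r = math.sqrt(c)
    return (r * math.cos(t), r * math.sin(t))


def contour_tangent(t, c=2.0):
    """Derivative of contour_point with respect to t: a vector along the contour."""
    r = math.sqrt(c)
    return (-r * math.sin(t), r * math.cos(t))


def dot(u, v):
    """Dot product of two vectors in the plane."""
    return u[0] * v[0] + u[1] * v[1]


if __name__ == "__main__":
    t = math.pi / 4  # this angle gives the point (1, 1) on the z = 2 contour
    x, y = contour_point(t)
    print((x, y), grad_f(x, y), contour_tangent(t))
    # The dot product should be zero up to rounding error.
    print(dot(grad_f(x, y), contour_tangent(t)))
```

At $t = \pi/4$ the tangent vector is a multiple of $(1, -1)^T$, in agreement with the direction of the tangent line $y = 2 - x$ found above.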
Figure 6.12: The $z = 2$ contour of the surface $z = x^2 + y^2$ and its tangent line, $y = 2 - x$, at the point $(1, 1)$ as discussed in Example 6.26. Observe how the tangent line to the contour at this point is perpendicular to the vector $\nabla f(1, 1)$. (The $x$ and $y$-intercepts of the contour have been omitted for clarity.)
6.4.3 Directional derivatives
Given the function $f(x, y)$, we want to find its derivative, $f_{\hat{\mathbf{u}}}(a, b)$, in the direction of the unit vector $\hat{\mathbf{u}} = (u_1, u_2)^T$.¹⁰ Of course, if $\hat{\mathbf{u}}$ is a unit vector in the $x$-direction, i.e.
$$\hat{\mathbf{u}} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{we should get}\quad f_{\hat{\mathbf{u}}}(a, b) = f_x(a, b),$$
whereas if $\hat{\mathbf{u}}$ is a unit vector in the $y$-direction, i.e.
$$\hat{\mathbf{u}} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \quad\text{we should get}\quad f_{\hat{\mathbf{u}}}(a, b) = f_y(a, b),$$
but the question is: What if we are not using either of these two directions?

¹⁰ That is, we have a direction $\mathbf{u}$ and we work with a unit vector in that direction, i.e. we use $\hat{\mathbf{u}} = (u_1, u_2)^T$ where $|\hat{\mathbf{u}}| = \sqrt{u_1^2 + u_2^2} = 1$.

Consider the point $(a, b, c)$ on the surface $z = f(x, y)$, where $c = f(a, b)$. At this point, we can find the section in the direction $\hat{\mathbf{u}}$, i.e. the curve of intersection of the surface and a plane that contains the point $(a, b, c)$ and the vector $\hat{\mathbf{u}}$. Then, geometrically, we would want to interpret $f_{\hat{\mathbf{u}}}(a, b)$ as the gradient of the tangent line to this section. Now, as $\hat{\mathbf{u}}$ is a unit vector, this means that we have a vector $\mathbf{v}$ given by
$$\mathbf{v} = \begin{pmatrix} u_1 \\ u_2 \\ f_{\hat{\mathbf{u}}}(a, b) \end{pmatrix},$$
which lies in the plane and points in the direction of the tangent line. As such, this vector is perpendicular to the normal vector to the surface at this point and so we have
$$\begin{pmatrix} u_1 \\ u_2 \\ f_{\hat{\mathbf{u}}}(a, b) \end{pmatrix} \cdot \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix} = 0.$$
That is, working out this dot product, we have
$$u_1 f_x(a, b) + u_2 f_y(a, b) - f_{\hat{\mathbf{u}}}(a, b) = 0,$$
or, rearranging,
$$f_{\hat{\mathbf{u}}}(a, b) = u_1 f_x(a, b) + u_2 f_y(a, b) = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \cdot \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix},$$
if we rewrite this in terms of inner products. Thus, we can see that the derivative of $f$ at the point $(a, b)$ in the direction of the unit vector $\hat{\mathbf{u}}$ is given by
$$f_{\hat{\mathbf{u}}}(a, b) = \hat{\mathbf{u}} \cdot \nabla f(a, b),$$
in terms of the gradient of $f$.

Example 6.27 Given that $z = f(x, y)$ with $f(x, y) = x^2 + y^2$, find the derivative of $f(x, y)$ in the direction $(1, 2)^T$ at the point $(1, 1)$. What is the derivative of $f$ in the direction $\nabla f(1, 1)$?

We saw in Example 6.26 that the gradient of $f$ at the point $(1, 1)$ is given by
$$\nabla f(1, 1) = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.$$
So, taking the direction
$$\mathbf{u} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad\text{we get the unit vector}\quad \hat{\mathbf{u}} = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix},$$
as $|\mathbf{u}|^2 = 1^2 + 2^2 = 5$ and this means that the derivative of $f$ in the direction of this unit vector is given by
$$f_{\hat{\mathbf{u}}}(1, 1) = \hat{\mathbf{u}} \cdot \nabla f(1, 1) = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \frac{1}{\sqrt{5}}(2 + 4) = \frac{6}{\sqrt{5}}.$$
Similarly, if we take the direction to be $\mathbf{v} = \nabla f(1, 1)$, we have
$$\mathbf{v} = \begin{pmatrix} 2 \\ 2 \end{pmatrix} \quad\text{so we get the unit vector}\quad \hat{\mathbf{v}} = \frac{1}{\sqrt{8}} \begin{pmatrix} 2 \\ 2 \end{pmatrix},$$
as $|\mathbf{v}|^2 = 2^2 + 2^2 = 8$ and this means that the derivative of $f$ in the direction of this unit vector is given by
$$f_{\hat{\mathbf{v}}}(1, 1) = \hat{\mathbf{v}} \cdot \nabla f(1, 1) = \frac{1}{\sqrt{8}} \begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \frac{1}{\sqrt{8}}(4 + 4) = \frac{8}{\sqrt{8}}.$$
In particular, observe that the latter is approximately 2.83 (to 2dp) which is larger than the former which is approximately 2.68 (to 2dp).

Indeed, this leads on to a useful observation about the rate at which $f$ is changing in different directions. We know, from Section 2.9 of 173 Algebra, that if $\theta$ is the angle between the vectors $\hat{\mathbf{u}}$ and $\nabla f(a, b)$, we have
$$\hat{\mathbf{u}} \cdot \nabla f(a, b) = |\hat{\mathbf{u}}|\,|\nabla f(a, b)| \cos\theta = |\nabla f(a, b)| \cos\theta,$$
as $|\hat{\mathbf{u}}| = 1$ since $\hat{\mathbf{u}}$ is a unit vector. In particular, we can use the fact that $-1 \leq \cos\theta \leq 1$ to see that
$$-|\nabla f(a, b)| \leq f_{\hat{\mathbf{u}}}(a, b) \leq |\nabla f(a, b)|.$$
That is, if $|\nabla f(a, b)| \neq 0$, we can deduce that:

The maximum rate of change of $f$ at the point $(a, b, c)$ is $|\nabla f(a, b)|$ and this occurs when $\theta = 0$, i.e. when the direction is $\mathbf{u} = \nabla f(a, b)$. This is the direction and rate at which $f$ increases most rapidly.

The minimum rate of change of $f$ at the point $(a, b, c)$ is $-|\nabla f(a, b)|$ and this occurs when $\theta = \pi$, i.e. when the direction is $\mathbf{u} = -\nabla f(a, b)$. This is the direction and rate at which $f$ decreases most rapidly.
Indeed, this allows us to see that, at the point $(a, b)$, $f$ is ‘steepest’ in the direction $\nabla f(a, b)$.¹¹

Example 6.28 Illustrate that the maximum rate of change of $f$ occurs in the direction $\nabla f$ using what we found in Example 6.27.

In Example 6.27, we saw that the rate of change in the direction $\mathbf{v} = \nabla f(1, 1)$ was greater than the rate of change in the direction $\mathbf{u} = (1, 2)^T$ as
$$f_{\hat{\mathbf{v}}}(1, 1) > f_{\hat{\mathbf{u}}}(1, 1),$$
and we can illustrate this using Figure 6.13. In particular, observe that if we want to move to the $z = 4$ contour from the point $(1, 1)$ on the $z = 2$ contour, it is quickest to go in the direction given by $\nabla f(1, 1)$ as, if we were to go in the direction $\mathbf{u} = (1, 2)^T$, we would have to travel further. Consequently, the rate of change of $z = f(x, y)$ is maximised when we go in the direction given by $\nabla f(1, 1)$ and if we go in another direction, say $\mathbf{u} = (1, 2)^T$, it will be smaller.
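The calculations in Examples 6.27 and 6.28 can be sketched in Python as follows. The function names, and the idea of searching over equally spaced angles to locate the steepest direction, are assumptions made for illustration; the search is a crude numerical check and not a method used in these notes.

```python
import math


def grad_f(x, y):
    """Gradient of f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)


def directional_derivative(direction, point):
    """Derivative of f at `point` in the given direction (normalised to a unit vector)."""
    length = math.hypot(direction[0], direction[1])
    u_hat = (direction[0] / length, direction[1] / length)
    g = grad_f(*point)
    return u_hat[0] * g[0] + u_hat[1] * g[1]


def steepest_angle(point, steps=3600):
    """Angle (in radians) of the unit vector that maximises the directional
    derivative, found by a simple search over equally spaced angles."""
    angles = [2 * math.pi * i / steps for i in range(steps)]
    return max(
        angles,
        key=lambda t: directional_derivative((math.cos(t), math.sin(t)), point),
    )


if __name__ == "__main__":
    p = (1.0, 1.0)
    print(directional_derivative((1, 2), p))      # compare with 6 / sqrt(5)
    print(directional_derivative(grad_f(*p), p))  # compare with 8 / sqrt(8)
    print(steepest_angle(p), math.atan2(2, 2))    # angle of steepest ascent vs angle of the gradient
```

The search should return an angle close to that of $\nabla f(1, 1)$, in line with the observation that $f$ is ‘steepest’ in the direction of its gradient.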
6.4.4 Implicitly defined functions of two variables
Suppose that we have a surface whose equation is given by $z = f(x, y)$. We could, of course, write this equation as $f(x, y) - z = 0$ and, in this form, the equation is now $g(x, y, z) = 0$ if we take $g$ to be the function of three variables given by $g(x, y, z) = f(x, y) - z$. Indeed, more generally, we can see that a surface can be given by an equation of the form $g(x, y, z) = c$ where $g$, a function of three variables, is constrained to take the constant value, $c$. Sometimes, in such cases, we will be able to rearrange what we are given to explicitly find the equation of the surface in the form $z = f(x, y)$. But, what if we can’t? That is, what if we can only implicitly define the function $f(x, y)$ through the equation $g(x, y, z) = c$? As we shall see, with minor modifications, we will be able to discuss certain aspects of such a surface using $g$ even if we can’t find $f$.

Figure 6.13: The $z = 2$ and $z = 4$ contours of the surface $z = x^2 + y^2$ and the directions $\nabla f(1, 1)$ and $(1, 2)^T$ at the point $(1, 1)$ as discussed in Example 6.27. Observe how the quickest way to get to the $z = 4$ contour from the point $(1, 1)$ on the $z = 2$ contour is to go in the direction $\nabla f(1, 1)$. (The $x$ and $y$-intercepts of the $z = 2$ contour have been omitted for clarity.)

¹¹ Of course, if $|\nabla f(a, b)| = 0$ we find that $f_{\hat{\mathbf{u}}}(a, b) = 0$ in all directions, $\hat{\mathbf{u}}$! This will be important in Section 7.2.1.

Tangent planes

Technically, a function $g : \mathbb{R}^3 \to \mathbb{R}$ defines a hypersurface in $\mathbb{R}^4$ whose equation is given by $u = g(x, y, z)$. And, although we can’t visualise such hypersurfaces because they ‘live’ in a four-dimensional space, we can easily extend the theory of this chapter to say things about them. For instance, if we have the point $(a, b, c, d)$ where $d = g(a, b, c)$, it should be clear that the Cartesian equation of the tangent hyperplane to the surface at this point is given by
$$u - d = g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c),$$
which is the analogue of what we saw in (6.4).¹² Indeed, rewriting this as
$$g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) - (u - d) = 0,$$
we can see that the vector equation of this tangent hyperplane is
$$\begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \\ u - d \end{pmatrix} = 0,$$
which is the analogue of (6.5) and the vector
$$\begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \\ -1 \end{pmatrix},$$
is therefore one of its normal vectors as we might expect given what we saw before.

¹² We could, of course, re-run the argument given in Section 6.4.1 in this new context but we refrain from doing that here.

Here, however, we are interested in a surface in $\mathbb{R}^3$ whose equation, for some constant $d$, is given by $g(x, y, z) = d$ and this is the $u = d$ contour of the corresponding hypersurface in $\mathbb{R}^4$.¹³ In particular, we want to be able to find the tangent plane to this surface at a point $(a, b, c)$ where $g(a, b, c) = d$. So, setting $u = d$ in the Cartesian equation of the tangent hyperplane above, we get
$$g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) = 0, \tag{6.8}$$
and this is the Cartesian equation of the tangent plane we seek. Let’s see how this works in practice.

Example 6.29 Following on from Example 6.25, find the Cartesian equation of the tangent plane to the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$ by using the function $g(x, y, z) = x^2 + y^2 - z$.

The surface whose equation is $z = x^2 + y^2$ can be represented by the equation $g(x, y, z) = 0$ with $g(x, y, z) = x^2 + y^2 - z$ and, as such, we have
$$g_x(x, y, z) = 2x, \quad g_y(x, y, z) = 2y \quad\text{and}\quad g_z(x, y, z) = -1.$$
Thus, using the Cartesian equation for the tangent plane at the point $(a, b, c)$ on the surface $g(x, y, z) = d$ in (6.8), i.e.
$$g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) = 0,$$
we verify that the point $(1, 1, 2)$ is on the surface as $g(1, 1, 2) = 1^2 + 1^2 - 2 = 0$ and see that
$$2(1)(x - 1) + 2(1)(y - 1) + (-1)(z - 2) = 0 \implies 2x + 2y - z = 2,$$
is the Cartesian equation of the tangent plane to the surface at this point in agreement with what we saw in Example 6.25.

¹³ In much the same way that a contour of a surface in $\mathbb{R}^3$ is a curve in $\mathbb{R}^2$!

But, of course, our real objective here is to see how to find a tangent plane when the function of two variables which gives the surface is only implicitly defined through an equation that involves a function of three variables as in the next example.

Example 6.30 Verify that the point $(1, 0, \pi)$ is on the surface whose equation is $x^3 + zy^3 + \sin z = 1$ and find the tangent plane to the surface at that point.

The point $(1, 0, \pi)$ is on the surface as $1^3 + (\pi)(0^3) + \sin\pi = 1 + 0 + 0 = 1$ and we can write the equation of the surface as $g(x, y, z) = 1$ with $g(x, y, z) = x^3 + zy^3 + \sin z$. As such, we have
$$g_x(x, y, z) = 3x^2, \quad g_y(x, y, z) = 3zy^2 \quad\text{and}\quad g_z(x, y, z) = y^3 + \cos z,$$
and using (6.8), we get
$$3(1^2)(x - 1) + 3(\pi)(0^2)(y - 0) + (0^3 + \cos\pi)(z - \pi) = 0 \implies 3x - z = 3 - \pi,$$
as the Cartesian equation of the tangent plane to the surface at this point.

Gradient vectors

If we now write (6.8) in vector form, we get
$$\begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0, \tag{6.9}$$
and so we can see that the vector
$$\nabla g(a, b, c) = \begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \end{pmatrix},$$
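is a normal vector to the tangent plane and so it is perpendicular to the surface.

Example 6.31 Following on from Example 6.29, find the vector $\nabla g(1, 1, 2)$ where $g(x, y, z) = x^2 + y^2 - z$. Show that this vector is perpendicular to the tangent plane to the surface $g(x, y, z) = 0$ at the point $(1, 1, 2)$ and hence deduce that it is perpendicular to the surface at this point.

Here $g(x, y, z) = x^2 + y^2 - z$ and so we have
$$\nabla g(x, y, z) = \begin{pmatrix} g_x(x, y, z) \\ g_y(x, y, z) \\ g_z(x, y, z) \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ -1 \end{pmatrix},$$
and, evaluating this at the point $(1, 1, 2)$, we get
$$\nabla g(1, 1, 2) = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}.$$
Then, using (6.9), we see that the Cartesian equation of the tangent plane to the surface $g(x, y, z) = 0$ at this point¹⁴ is given by
$$\begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 2 \end{pmatrix} = 0 \implies 2(x - 1) + 2(y - 1) - (z - 2) = 0 \implies 2x + 2y - z = 2.$$
Now, for $x, y \in \mathbb{R}$, we have points $(x, y, z)$ on this tangent plane given by
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ -2 + 2x + 2y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -2 \end{pmatrix} + x \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix},$$
and so this plane lies in the directions given by the vectors $(1, 0, 2)^T$ and $(0, 1, 2)^T$. But, of course,
$$\nabla g(1, 1, 2) \cdot \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = 2 + 0 + (-2) = 0,$$
and
$$\nabla g(1, 1, 2) \cdot \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = 0 + 2 + (-2) = 0,$$
which means that $\nabla g(1, 1, 2)$ is indeed perpendicular to this tangent plane and, in particular, it will be perpendicular to the surface at this point too.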
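The recipe used in Examples 6.30 and 6.31 (evaluate the gradient of $g$ at the point and use it as the normal vector of the tangent plane) can be sketched in Python for the surface $x^3 + zy^3 + \sin z = 1$. The function names and the representation of the plane as a pair `(n, d)`, meaning $\mathbf{n} \cdot (x, y, z)^T = d$, are assumptions made for illustration.

```python
import math


def grad_g(x, y, z):
    """Gradient of g(x, y, z) = x^3 + z*y^3 + sin(z), the function in Example 6.30."""
    return (3 * x**2, 3 * z * y**2, y**3 + math.cos(z))


def tangent_plane(point):
    """Return (n, d) so that the tangent plane at `point` is n . (x, y, z) = d."""
    n = grad_g(*point)
    # Expanding n . ((x, y, z) - point) = 0 gives n . (x, y, z) = n . point.
    d = sum(n_j * p_j for n_j, p_j in zip(n, point))
    return n, d


if __name__ == "__main__":
    p = (1.0, 0.0, math.pi)
    n, d = tangent_plane(p)
    print(n)               # normal vector, i.e. the gradient at p
    print(d, 3 - math.pi)  # right-hand side of the plane, compared with 3 - pi
```

With the point $(1, 0, \pi)$ the normal vector has components $3$, $0$ and $-1$ (up to rounding), which corresponds to the plane $3x - z = 3 - \pi$ found in Example 6.30.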
In general, given a function $g(x, y, z)$, we call the vector
$$\nabla g(x, y, z) = \begin{pmatrix} g_x(x, y, z) \\ g_y(x, y, z) \\ g_z(x, y, z) \end{pmatrix},$$
the gradient of g and, for a function of three variables, this is the analogue of what we saw in (6.7). Of course, we could then extend what we saw in Section 6.4.3, and use this to find the directional derivatives of a function of three variables. This, in turn, would allow us to see how rapidly this function is changing if we move away from a point in a certain direction and, in particular, it would allow us to find the maximum (or minimum) rate of change of such a function and the direction in which it occurs.
6.4.5 Taylor series
We saw in Section 3.4 that a function, $F(t)$, of one variable has a second-order Taylor series given by
$$F(t) = F(a) + (t - a)F'(a) + \frac{(t - a)^2}{2!}F''(a) + \cdots,$$
around $t = a$. Now, we want to derive the corresponding result for a function, $f(x, y)$, of two variables around the point $(a, b)$ and, from what we saw when we considered tangent planes in Section 6.4.1, we should anticipate that the first two terms of this Taylor series will be given by
$$f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
where the vector
$$\frac{df}{d\mathbf{x}} = \big(f_x(x, y),\ f_y(x, y)\big),$$
is the derivative of $f(x, y)$ with respect to $\mathbf{x} = (x, y)$. So, our main concern here is what the next term will look like.

¹⁴ Note that $(x, y, z) = (1, 1, 2)$ gives $g(1, 1, 2) = 1^2 + 1^2 - 2 = 0$ and so this point is on this surface.
If we want to find the Taylor series for a function, $f(x, y)$, around the point $(a, b)$ we need to see what is happening at some nearby point $(x, y)$. Let’s say that, in terms of a new variable $t$, these points are related by the equations
$$x = a + ht \quad\text{and}\quad y = b + kt,$$
for some appropriately small values of the numbers $ht$ and $kt$ since these points are supposed to be close to one another. Indeed, this means that we can define a new function, $F(t)$, of the single variable, $t$, given by
$$F(t) = f(x(t), y(t)) \quad\text{where}\quad x(t) = a + ht \ \text{ and } \ y(t) = b + kt,$$
where the idea is that $F(t)$ and its derivatives will allow us to use the Maclaurin series for $F(t)$, i.e.
$$F(t) = F(0) + tF'(0) + \frac{t^2}{2!}F''(0) + \cdots,$$
to deduce the corresponding Taylor series for $f(x, y)$. In particular, we can see straightaway that
$$F(0) = f(x(0), y(0)) = f(a, b),$$
which is the first of our anticipated terms. Now we need to find the derivatives $F'(t)$ and $F''(t)$ to see what the other two terms are.

To find $F'(t)$, we use the chain rule from Section 6.3.3 to see that
$$F'(t) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = h f_x(x(t), y(t)) + k f_y(x(t), y(t)).$$
In particular, this means that
$$F'(0) = h f_x(x(0), y(0)) + k f_y(x(0), y(0)) = h f_x(a, b) + k f_y(a, b),$$
so we can see that the next term in our Taylor series will be
$$tF'(0) = ht f_x(a, b) + kt f_y(a, b) = (x - a)f_x(a, b) + (y - b)f_y(a, b) = \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
which is the second of our anticipated terms.

To find the remaining term, we need to find $F''(t)$ by differentiating our expression for $F'(t)$ with respect to $t$ using the chain rule. This gives us
$$F''(t) = h\left(\frac{\partial f_x}{\partial x}\frac{dx}{dt} + \frac{\partial f_x}{\partial y}\frac{dy}{dt}\right) + k\left(\frac{\partial f_y}{\partial x}\frac{dx}{dt} + \frac{\partial f_y}{\partial y}\frac{dy}{dt}\right)$$
$$= h\big(h f_{xx}(x(t), y(t)) + k f_{xy}(x(t), y(t))\big) + k\big(h f_{yx}(x(t), y(t)) + k f_{yy}(x(t), y(t))\big)$$
$$\therefore\ F''(t) = h^2 f_{xx}(x(t), y(t)) + hk f_{xy}(x(t), y(t)) + kh f_{yx}(x(t), y(t)) + k^2 f_{yy}(x(t), y(t))$$
and, in particular, this means that
$$F''(0) = h^2 f_{xx}(a, b) + hk f_{xy}(a, b) + kh f_{yx}(a, b) + k^2 f_{yy}(a, b),$$
so we can see that the next term in our Taylor series will be
$$\frac{t^2}{2!}F''(0) = \frac{1}{2!}\Big[(x - a)^2 f_{xx}(a, b) + (x - a)(y - b) f_{xy}(a, b) + (y - b)(x - a) f_{yx}(a, b) + (y - b)^2 f_{yy}(a, b)\Big].$$
Indeed, if we now define the second derivative of $f(x, y)$ with respect to $\mathbf{x} = (x, y)$ to be the matrix
$$\frac{d^2 f}{d\mathbf{x}^2} = \begin{pmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{pmatrix},$$
it is easily verified that we have
$$\frac{t^2}{2!}F''(0) = \frac{1}{2!}\big(x - a,\ y - b\big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
as the next term in our Taylor series.

Consequently, putting this all together we see that the second-order Taylor series for a function, $f(x, y)$, of two variables around the point $(a, b)$ is given by
$$f(x, y) = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!}\big(x - a,\ y - b\big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \cdots,$$
and these terms will be sufficient for our purposes in this course. We will see how this can be used in the next chapter, but for now, we will just use it to find an approximation to a function of two variables around a certain point.

Example 6.32 Find the second-order Taylor series of the function $f(x, y) = e^x \cos y$ around the point $(1, 0)$.
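The first term of our second-order Taylor series is simply $f(1, 0) = e^1 \cos 0 = e$. We also see that
$$\frac{df}{d\mathbf{x}} = \big(f_x(x, y),\ f_y(x, y)\big) = \big(e^x \cos y,\ -e^x \sin y\big),$$
which means that
$$\left.\frac{df}{d\mathbf{x}}\right|_{(1,0)} = \big(e^1 \cos 0,\ -e^1 \sin 0\big) = \big(e,\ 0\big),$$
and so the second term of our second-order Taylor series is
$$\left.\frac{df}{d\mathbf{x}}\right|_{(1,0)} \begin{pmatrix} x - 1 \\ y - 0 \end{pmatrix} = \big(e,\ 0\big) \begin{pmatrix} x - 1 \\ y \end{pmatrix} = e(x - 1).$$
Lastly, we see that
$$\frac{d^2 f}{d\mathbf{x}^2} = \begin{pmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{pmatrix} = \begin{pmatrix} e^x \cos y & -e^x \sin y \\ -e^x \sin y & -e^x \cos y \end{pmatrix},$$
which means that
$$\left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(1,0)} = \begin{pmatrix} e^1 \cos 0 & -e^1 \sin 0 \\ -e^1 \sin 0 & -e^1 \cos 0 \end{pmatrix} = \begin{pmatrix} e & 0 \\ 0 & -e \end{pmatrix},$$
and so the third term of our second-order Taylor series is
$$\frac{1}{2!}\big(x - 1,\ y - 0\big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(1,0)} \begin{pmatrix} x - 1 \\ y - 0 \end{pmatrix} = \frac{1}{2!}\big(x - 1,\ y\big) \begin{pmatrix} e & 0 \\ 0 & -e \end{pmatrix} \begin{pmatrix} x - 1 \\ y \end{pmatrix} = \frac{1}{2!}\big(x - 1,\ y\big) \begin{pmatrix} e(x - 1) \\ -ey \end{pmatrix} = \frac{1}{2!}\Big[e(x - 1)^2 - e y^2\Big].$$
Consequently, putting this all together, we find that
$$f(x, y) \simeq e + e(x - 1) + \frac{1}{2!}\Big[e(x - 1)^2 - e y^2\Big],$$
is the second-order Taylor series of $f(x, y) = e^x \cos y$ around the point $(1, 0)$.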
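To get a feel for how good this approximation is, the following Python sketch compares $e^x \cos y$ with the second-order Taylor series found in Example 6.32 at a few points. The function names and the sample points are assumptions chosen for illustration.

```python
import math


def f(x, y):
    """The function from Example 6.32."""
    return math.exp(x) * math.cos(y)


def taylor2(x, y):
    """Second-order Taylor series of e^x cos(y) around (1, 0), as in Example 6.32."""
    e = math.e
    return e + e * (x - 1) + 0.5 * (e * (x - 1) ** 2 - e * y**2)


if __name__ == "__main__":
    for x, y in ((1.05, 0.1), (1.2, 0.3), (1.5, 0.5)):
        # The error grows as (x, y) moves further away from (1, 0).
        print((x, y), f(x, y), taylor2(x, y), abs(f(x, y) - taylor2(x, y)))
```

As we should expect, the approximation is best close to the point $(1, 0)$ and it deteriorates as we move away from it.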
Activity 6.20 Find an approximation to $e^{1.1} \cos 0.2$ by using the second-order Taylor series that we found in Example 6.32.

Activity 6.21 Find the second-order Taylor series in the previous example by using the Taylor series for $e^x$ about $x = 1$ (see Example 3.31) and the Maclaurin series for $\cos y$ (see Section 3.4.1).
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you should be able to:
visualise a surface by using sections and contours;
find partial derivatives;
use the chain rule to find derivatives of various kinds;
show that a function is homogeneous and verify Euler’s theorem;
solve problems from economics-based subjects that involve partial derivatives;
find tangent planes and gradient vectors;
find directional derivatives and interpret what you have found;
find Taylor series and use these to approximate functions of two variables.
Solutions to activities
Solution to activity 6.1
To find the contours of the surface $z = 4x + 2y - 2$ when we have the given values of $z$, we note that:
For $z = -10$, the curve of intersection is given by $-10 = 4x + 2y - 2$ which gives us $y = -2x - 4$.
For $z = 0$, the curve of intersection is given by $0 = 4x + 2y - 2$ which gives us $y = -2x + 1$.
For $z = 10$, the curve of intersection is given by $10 = 4x + 2y - 2$ which gives us $y = -2x + 6$.
Thus, we see from these equations that all three of the contours are straight lines. The sketch of these contours in the $(x, y)$-plane is illustrated in Figure 6.14.
Figure 6.14: A sketch of the $z = -10$, $z = 0$ and $z = 10$ contours of the surface $z = 4x + 2y - 2$ in the $(x, y)$-plane for Activity 6.1.
Solution to activity 6.2
To find the $z = -25$ contour of the surface $z = -x^2 - y^2$ we need to find the curve of intersection which, in this case, is simply
$$-x^2 - y^2 = -25 \implies x^2 + y^2 = 25.$$
This is the equation of a circle, centred on the origin, with a radius of five. To find the $z = c$ contours in the three cases indicated we just need to find out what the curve
$$-x^2 - y^2 = c \implies x^2 + y^2 = -c,$$
looks like in the three cases. So, we have:
If $c > 0$, there are no contours as we have $-c < 0$ and we know that $x^2 + y^2 \geq 0$ for all values of $x$ and $y$.
If $c = 0$, the contour is the point $(0, 0)$ as this is the only solution to the equation $x^2 + y^2 = 0$.
If $c < 0$, the contour is a circle, centred on the origin, with a radius of $\sqrt{-c}$ as we have $-c > 0$.
In particular, notice that $z = 0$ is the largest value of $z$ that arises from a point on this surface.
Solution to activity 6.3
To find these sections of the surface $z = 4x + 2y - 2$ we need to find the curves of intersection, which in this case, are given by:
For the $(x, z)$-section, we have $y = 0$ and so the curve of intersection is given by $z = 4x - 2$ and this is a straight line in the $(x, z)$-plane.
For the $(y, z)$-section, we have $x = 0$ and so the curve of intersection is given by $z = 2y - 2$ and this is a straight line in the $(y, z)$-plane.
These sections are illustrated in Figure 6.15.
Figure 6.15: A sketch of (a) the $(x, z)$-section, $z = 4x - 2$, and (b) the $(y, z)$-section, $z = 2y - 2$, of the surface $z = 4x + 2y - 2$ for Activity 6.3.
Solution to activity 6.4
To find these sections of the surface $z = -x^2 - y^2$ we need to find the curves of intersection, which in this case, are given by:
For the $(x, z)$-section, we have $y = 0$ and so the curve of intersection is given by $z = -x^2$ and this is a parabola in the $(x, z)$-plane.
For the $(y, z)$-section, we have $x = 0$ and so the curve of intersection is given by $z = -y^2$ and this is a parabola in the $(y, z)$-plane.
These sections are illustrated in Figure 6.16.

Figure 6.16: A sketch of (a) the $(x, z)$-section and (b) the $(y, z)$-section of the surface $z = -x^2 - y^2$ for Activity 6.4.

Solution to activity 6.5

To find these sections of the surface $z = x - y + 4$ we need to find the curves of intersection which, in this case, are given by:
For the $x = 0$ section, we have $x = 0$ and so the curve of intersection is given by $z = -y + 4$ and this is a straight line in the $(y, z)$-plane. Of course, this is just the $(y, z)$-section we found in Example 6.3!
For the $x = 2$ section, we have $x = 2$ and so the curve of intersection is given by $z = 2 - y + 4 = -y + 6$ and this is a straight line.
For the $x = 4$ section, we have $x = 4$ and so the curve of intersection is given by $z = 4 - y + 4 = -y + 8$ and this is a straight line.
Observe that only the first of these sections 'lives' in the $(y, z)$-plane but, as illustrated in Figure 6.17, we can also sketch the other two in this plane to get a feel for how the surface is changing when we look at the sections $x = c$ for different values of $c$.

Figure 6.17: The $x = 0$, $x = 2$ and $x = 4$ sections of the surface $z = x - y + 4$ for Activity 6.5.

Solution to activity 6.6

To find the $y = -2, 0, 2$ sections of the surface $z = 4x + 2y - 2$ we need to find the curves of intersection which, in this case, are given by:
For the $y = -2$ section, we have $y = -2$ and so the curve of intersection is given by $z = 4x - 4 - 2 = 4x - 6$ and this is a straight line.
For the $y = 0$ section, we have $y = 0$ and so the curve of intersection is given by $z = 4x - 2$ and this is a straight line in the $(x, z)$-plane. Of course, this is just the $(x, z)$-section we found in Activity 6.3 and it is the only one that 'lives' in the $(x, z)$-plane!
For the $y = 2$ section, we have $y = 2$ and so the curve of intersection is given by $z = 4x + 4 - 2 = 4x + 2$ and this is a straight line.
These sections are illustrated in Figure 6.18(a).
Similarly, to find the $x = -2, 0, 2$ sections of the surface $z = 4x + 2y - 2$ we need to find the curves of intersection which, in this case, are given by:
For the $x = -2$ section, we have $x = -2$ and so the curve of intersection is given by $z = -8 + 2y - 2 = 2y - 10$ and this is a straight line.
For the $x = 0$ section, we have $x = 0$ and so the curve of intersection is given by $z = 2y - 2$ and this is a straight line in the $(y, z)$-plane. Of course, this is just the $(y, z)$-section we found in Activity 6.3 and it is the only one that 'lives' in the $(y, z)$-plane!
For the $x = 2$ section, we have $x = 2$ and so the curve of intersection is given by $z = 8 + 2y - 2 = 2y + 6$ and this is a straight line.
These sections are illustrated in Figure 6.18(b).

Figure 6.18: A sketch of (a) the $y = -2, 0, 2$ sections and (b) the $x = -2, 0, 2$ sections of the surface $z = 4x + 2y - 2$ for Activity 6.6.

Solution to activity 6.7

To find these sections of the surface $z = x^2 + y^2$ we need to find the curves of intersection which, in this case, are given by:
For the $y = 0$ section, we have $y = 0$ and so the curve of intersection is given by $z = x^2$ and this is a parabola in the $(x, z)$-plane. Of course, this is just the $(x, z)$-section we found in Example 6.4!
For the $y = 1$ section, we have $y = 1$ and so the curve of intersection is given by $z = x^2 + 1$ and this is a parabola.
For the $y = 2$ section, we have $y = 2$ and so the curve of intersection is given by $z = x^2 + 4$ and this is a parabola.
Observe that only the first of these sections 'lives' in the $(x, z)$-plane but, as illustrated in Figure 6.19, we can also sketch the other two in this plane to get a feel for how the surface is changing when we look at the sections $y = c$ for different values of $c$.

Figure 6.19: The $y = 0$, $y = 1$ and $y = 2$ sections of the surface $z = x^2 + y^2$ for Activity 6.7.

Solution to activity 6.8

To find the $y = 0, 1, 2$ sections of the surface $z = -x^2 - y^2$ we need to find the curves of intersection which, in this case, are given by:
For the $y = 0$ section, we have $y = 0$ and so the curve of intersection is given by $z = -x^2$ and this is a parabola in the $(x, z)$-plane. Of course, this is just the $(x, z)$-section we found in Activity 6.4 and it is the only one that 'lives' in the $(x, z)$-plane!
For the $y = 1$ section, we have $y = 1$ and so the curve of intersection is given by $z = -x^2 - 1$ and this is a parabola.
For the $y = 2$ section, we have $y = 2$ and so the curve of intersection is given by $z = -x^2 - 4$ and this is a parabola.
These sections are illustrated in Figure 6.20(a).
Similarly, to find the $x = 0, 1, 2$ sections of the surface $z = -x^2 - y^2$ we need to find the curves of intersection which, in this case, are given by:
For the $x = 0$ section, we have $x = 0$ and so the curve of intersection is given by $z = -y^2$ and this is a parabola in the $(y, z)$-plane. Of course, this is just the $(y, z)$-section we found in Activity 6.4 and it is the only one that 'lives' in the $(y, z)$-plane!
For the $x = 1$ section, we have $x = 1$ and so the curve of intersection is given by $z = -1 - y^2$ and this is a parabola.
For the $x = 2$ section, we have $x = 2$ and so the curve of intersection is given by $z = -4 - y^2$ and this is a parabola.
These sections are illustrated in Figure 6.20(b).

Figure 6.20: A sketch of (a) the $y = 0, 1, 2$ sections and (b) the $x = 0, 1, 2$ sections of the surface $z = -x^2 - y^2$ for Activity 6.8.

Solution to activity 6.9

The partial derivative of $f(x, y)$ with respect to $y$, i.e. the result of differentiating $f(x, y)$ with respect to $y$ whilst holding $x$ constant, is going to be another function of $x$ and $y$. This function of $x$ and $y$ is what is denoted by the symbols in (6.2).
What does this partial derivative mean? In effect, what we have done when we consider the function $f(x, y)$ for some fixed value of $x$, say $x_0$, is to look at the section of the surface $z = f(x, y)$ we get when $x = x_0$, i.e. the section given by the equation $z = f(x_0, y)$ which lies in the plane $x = x_0$, a plane parallel to the $(y, z)$-plane. Then, when we differentiate $f(x_0, y)$ with respect to $y$, we are finding the gradient of this section, i.e. it tells us how $z = f(x_0, y)$ is varying with $y$. Consequently, this partial derivative is telling us something about the gradient of the surface when we are at the point $(x_0, y)$ and we are 'looking' in the $y$-direction.

Solution to activity 6.10

Given the function
$$f(x, y) = 2x + x^3 y - \frac{x}{y} + \frac{y^3}{2} = 2x + x^3 y - xy^{-1} + \frac{y^3}{2},$$
we hold $y$ constant and differentiate with respect to $x$, to see that
$$\frac{\partial f}{\partial x} = 2 + 3x^2 y - y^{-1} = 2 + 3x^2 y - \frac{1}{y},$$
and we hold $x$ constant and differentiate with respect to $y$, to see that
$$\frac{\partial f}{\partial y} = x^3 + xy^{-2} + \frac{3}{2}y^2 = x^3 + \frac{x}{y^2} + \frac{3}{2}y^2.$$
These are the sought after partial derivatives $f_x(x, y)$ and $f_y(x, y)$ respectively.

Solution to activity 6.11

Given the function
$$f(x, y) = \sqrt{x^2 + y^2} = (x^2 + y^2)^{1/2},$$
we hold $y$ constant and differentiate with respect to $x$ using the chain rule to get
$$\frac{\partial f}{\partial x} = \frac{1}{2}(x^2 + y^2)^{-1/2}(2x) = \frac{x}{\sqrt{x^2 + y^2}},$$
and we hold $x$ constant and differentiate with respect to $y$ using the chain rule to get
$$\frac{\partial f}{\partial y} = \frac{1}{2}(x^2 + y^2)^{-1/2}(2y) = \frac{y}{\sqrt{x^2 + y^2}}.$$
These are the sought after partial derivatives $f_x(x, y)$ and $f_y(x, y)$ respectively.

Solution to activity 6.12

Here $f(x, y) = x^2 y$, $x(t) = 2 + 3t$ and $y(t) = t^2 + 1$. In this case, if we again let $F(t) = f(x(t), y(t))$, the chain rule states that
$$\frac{dF}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$
As such, using this, we can see that
$$\frac{dF}{dt} = (2xy)(3) + (x^2)(2t) = 2x(3y + xt),$$
and so, substituting our expressions for $x(t)$ and $y(t)$, we get
$$\frac{dF}{dt} = 2(2 + 3t)[3(t^2 + 1) + (2 + 3t)t] = 2(2 + 3t)(6t^2 + 2t + 3).$$
To check this, we note that
$$F(t) = f(x(t), y(t)) = (2 + 3t)^2(t^2 + 1),$$
which, using the product and chain rules, gives us
$$\frac{dF}{dt} = [2(2 + 3t)(3)](t^2 + 1) + (2 + 3t)^2(2t) = 2(2 + 3t)[3(t^2 + 1) + t(2 + 3t)],$$
and this agrees with our earlier answer.
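
The check in Activity 6.12 can also be done numerically. The following is a minimal Python sketch, not part of the original solution, which compares the chain-rule answer with a finite-difference estimate of $dF/dt$; the helper name `central_difference`, the step size and the sample values of $t$ are illustrative assumptions.

```python
# Numerical check of Activity 6.12 (illustrative sketch only).
# f(x, y) = x^2 y with x(t) = 2 + 3t and y(t) = t^2 + 1.

def F(t):
    """Composite function F(t) = (2 + 3t)^2 (t^2 + 1)."""
    x = 2 + 3 * t
    y = t * t + 1
    return x * x * y

def dF_chain_rule(t):
    """dF/dt as found with the chain rule: 2(2 + 3t)(6t^2 + 2t + 3)."""
    return 2 * (2 + 3 * t) * (6 * t * t + 2 * t + 3)

def central_difference(func, t, h=1e-6):
    """Symmetric finite-difference estimate of func'(t) (assumed helper)."""
    return (func(t + h) - func(t - h)) / (2 * h)

# Arbitrary sample values of t chosen for illustration.
for t in (-1.0, 0.0, 0.5, 2.0):
    exact = dF_chain_rule(t)
    approx = central_difference(F, t)
    print(f"t = {t:5.2f}: chain rule = {exact:10.4f}, finite difference = {approx:10.4f}")
```

The two columns should agree to several decimal places, which is what we expect if the chain rule has been applied correctly.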

Solution to activity 6.13

We have a function, $y(x)$, which is defined implicitly by the equation
$$x^2 + 2xy + 3y^3 = 6,$$
and we notice that, at the point $(x, y) = (1, 1)$, we have
$$(1)^2 + 2(1)(1) + 3(1)^3 = 6,$$
and so this point does indeed satisfy the equation. To find its derivative at this point we note that we have $g(x, y) = c$ where
$$g(x, y) = x^2 + 2xy + 3y^3 \quad\text{and}\quad c = 6,$$
and we use the fact that
$$\frac{dy}{dx} = -\frac{\partial g}{\partial x}\bigg/\frac{\partial g}{\partial y},$$
to get
$$\frac{dy}{dx} = -\frac{2x + 2y}{2x + 9y^2} = -\frac{2(x + y)}{2x + 9y^2},$$
as long as $2x + 9y^2 \neq 0$. And, clearly, at the point $(1, 1)$, this gives us
$$\left.\frac{dy}{dx}\right|_{(1,1)} = -\frac{2(1 + 1)}{2 + 9} = -\frac{4}{11},$$
as the value of the derivative.

Solution to activity 6.14

We have $G(k, l) = g(x(k, l), y(k, l))$ and we want to explain why the chain rule formula for $G_l(k, l)$ works. To do this, consider that if we change $l$ by a small amount, $\Delta l$, whilst holding $k$ constant, the corresponding change in $G(k, l)$ is given by
$$\Delta G \simeq \frac{\partial G}{\partial l}\Delta l,$$
but here, there are two ways in which $G(k, l) = g(x(k, l), y(k, l))$ can change with $l$.
Firstly, $G$ can change with $l$ because $g$ changes with $x$ and $x$ changes with $l$; let's denote this change in $G$ by $\Delta_x G$. In this case, we have
$$\Delta_x G \simeq \frac{\partial g}{\partial x}\Delta x,$$
as we are holding $y$ constant to see how $g$ changes with $x$ and this means that
$$\Delta_x G \simeq \frac{\partial g}{\partial x}\frac{\partial x}{\partial l}\Delta l,$$
as the change in $x$, $\Delta x$, is related to a change in $l$ with $k$ held constant by $\Delta x \simeq x_l(k, l)\Delta l$.
Secondly, $G$ can change with $l$ because $g$ changes with $y$ and $y$ changes with $l$; let's denote this change in $G$ by $\Delta_y G$. In this case, we have
$$\Delta_y G \simeq \frac{\partial g}{\partial y}\Delta y,$$
as we are holding $x$ constant to see how $g$ changes with $y$ and this means that
$$\Delta_y G \simeq \frac{\partial g}{\partial y}\frac{\partial y}{\partial l}\Delta l,$$
as the change in $y$, $\Delta y$, is related to a change in $l$ with $k$ held constant by $\Delta y \simeq y_l(k, l)\Delta l$.
Thus, as the total change in $G$ due to these two changes is given by
$$\Delta G = \Delta_x G + \Delta_y G \simeq \frac{\partial g}{\partial x}\frac{\partial x}{\partial l}\Delta l + \frac{\partial g}{\partial y}\frac{\partial y}{\partial l}\Delta l,$$
we can now equate our two expressions for $\Delta G$ and divide through by $\Delta l$ to get the chain rule for $G_l(k, l)$ which we wanted.
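
To see the small-change argument of Activity 6.14 at work, here is a short Python sketch. The functions $g(x, y) = x^2 y$, $x(k, l) = k + 2l$ and $y(k, l) = kl$ are hypothetical choices made purely for illustration (they do not come from the guide); the sketch compares the actual change in $G$ with the chain-rule estimate $G_l \, \Delta l$ as $\Delta l$ shrinks.

```python
# Illustration of the argument in Activity 6.14 (sketch only).
# Hypothetical example functions, not taken from the guide:
#   g(x, y) = x^2 y,  x(k, l) = k + 2l,  y(k, l) = k l.

def g(x, y):
    return x * x * y

def x_of(k, l):
    return k + 2 * l

def y_of(k, l):
    return k * l

def G(k, l):
    """Composite function G(k, l) = g(x(k, l), y(k, l))."""
    return g(x_of(k, l), y_of(k, l))

def G_l_chain_rule(k, l):
    """G_l = g_x x_l + g_y y_l, with g_x = 2xy, g_y = x^2, x_l = 2, y_l = k."""
    x, y = x_of(k, l), y_of(k, l)
    return (2 * x * y) * 2 + (x * x) * k

k, l = 1.0, 1.0  # arbitrary point chosen for illustration
for dl in (0.1, 0.01, 0.001):
    actual_change = G(k, l + dl) - G(k, l)
    estimate = G_l_chain_rule(k, l) * dl
    print(f"dl = {dl}: actual change = {actual_change:.6f}, chain-rule estimate = {estimate:.6f}")
```

As $\Delta l$ gets smaller, the chain-rule estimate accounts for an ever larger share of the actual change, which is the sense in which the two expressions for $\Delta G$ can be equated.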

Solution to activity 6.15

To see why the formula for $z_y(x, y)$ works, we consider that if we knew the function, $z(x, y)$, that satisfied the equation $g(x, y, z) = c$, we could find a new function, $G(x, y)$, of $x$ and $y$ only which is given by
$$G(x, y) = g(x, y, z(x, y)).$$
Then using the chain rule, we have
$$\frac{\partial G}{\partial y} = \frac{\partial g}{\partial y}\frac{dy}{dy} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial y}.$$
But, $G(x, y) = c$ where $c$ is a constant and so we also have
$$\frac{\partial G}{\partial y} = 0 \quad\text{as well as}\quad \frac{dy}{dy} = 1,$$
which means that we are left with
$$0 = \frac{\partial g}{\partial y} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial y}.$$
Rearranging this then gives us the formula we require, i.e.
$$\frac{\partial z}{\partial y} = -\frac{\partial g/\partial y}{\partial g/\partial z},$$
as long as $g_z(x, y, z) \neq 0$.

Solution to activity 6.16

We have a function, $q(k, l)$, which is defined implicitly by the equation
$$q^3 k + k^3 l + qk^2 l = 3,$$
and we want to find its partial derivatives with respect to $k$ and $l$. To do this, we rewrite the equation as $g(q, k, l) = c$ so that we have, say,
$$g(q, k, l) = q^3 k + k^3 l + qk^2 l \quad\text{and}\quad c = 3,$$
and use the formulas
$$\frac{\partial q}{\partial k} = -\frac{\partial g}{\partial k}\bigg/\frac{\partial g}{\partial q} \quad\text{and}\quad \frac{\partial q}{\partial l} = -\frac{\partial g}{\partial l}\bigg/\frac{\partial g}{\partial q},$$
to see that the partial derivatives are
$$\frac{\partial q}{\partial k} = -\frac{q^3 + 3k^2 l + 2qkl}{3q^2 k + k^2 l} \quad\text{and}\quad \frac{\partial q}{\partial l} = -\frac{k^3 + qk^2}{3q^2 k + k^2 l},$$
provided that $3q^2 k + k^2 l \neq 0$. (Notice that, in particular, we can never have $k = 0$ here as this does not satisfy the equation $q^3 k + k^3 l + qk^2 l = 3$.)
Now, to evaluate these partial derivatives at the point where $(k, l) = (1, 1)$, we need to find the corresponding value of $q$. This can be done by noting that, when we have $k = 1$ and $l = 1$, the equation becomes
$$q^3 + q - 2 = 0,$$
and, using the hint, we see that this equation can be written as
$$(q - 1)(q^2 + q + 2) = 0.$$
Indeed, since
$$q^2 + q + 2 = \left(q + \frac{1}{2}\right)^2 + \frac{7}{4} > 0,$$
for all $q \in \mathbb{R}$, we see that $q = 1$ is the only solution to this equation. Thus, the point we are interested in has coordinates $(k, l, q) = (1, 1, 1)$ and, at this point, we have
$$\frac{\partial q}{\partial k} = -\frac{1 + 3 + 2}{3 + 1} = -\frac{6}{4} = -\frac{3}{2} \quad\text{and}\quad \frac{\partial q}{\partial l} = -\frac{1 + 1}{3 + 1} = -\frac{2}{4} = -\frac{1}{2},$$
as the values of the partial derivatives at this point.

Solution to activity 6.17

In Activity 6.10, we saw that the function
$$f(x, y) = 2x + x^3 y - \frac{x}{y} + \frac{y^3}{2} = 2x + x^3 y - xy^{-1} + \frac{y^3}{2},$$
had partial derivatives given by
$$\frac{\partial f}{\partial x} = 2 + 3x^2 y - y^{-1} \quad\text{and}\quad \frac{\partial f}{\partial y} = x^3 + xy^{-2} + \frac{3}{2}y^2.$$
So, partially differentiating $f_x(x, y)$ with respect to $x$ and $y$ respectively, we get
$$f_{xx}(x, y) = 6xy \quad\text{and}\quad f_{xy}(x, y) = 3x^2 + y^{-2} = 3x^2 + \frac{1}{y^2},$$
whereas, partially differentiating $f_y(x, y)$ with respect to $x$ and $y$ respectively, we get
$$f_{yx}(x, y) = 3x^2 + y^{-2} = 3x^2 + \frac{1}{y^2} \quad\text{and}\quad f_{yy}(x, y) = -2xy^{-3} + 3y = -2\frac{x}{y^3} + 3y.$$
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.

Solution to activity 6.18

Given the function $f(x, y) = x^{3/4} y^{1/4}$, we partially differentiate with respect to $x$ and $y$ respectively to get
$$f_x(x, y) = \frac{3}{4}x^{-1/4} y^{1/4} \quad\text{and}\quad f_y(x, y) = \frac{1}{4}x^{3/4} y^{-3/4},$$
as the first-order partial derivatives. Then, for the second-order partial derivatives, we note that partially differentiating $f_x(x, y)$ with respect to $x$ and $y$ respectively, we get
$$f_{xx}(x, y) = -\frac{3}{16}x^{-5/4} y^{1/4} \quad\text{and}\quad f_{xy}(x, y) = \frac{3}{16}x^{-1/4} y^{-3/4},$$
whereas, partially differentiating $f_y(x, y)$ with respect to $x$ and $y$ respectively, we get
$$f_{yx}(x, y) = \frac{3}{16}x^{-1/4} y^{-3/4} \quad\text{and}\quad f_{yy}(x, y) = -\frac{3}{16}x^{3/4} y^{-7/4}.$$
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.

Solution to activity 6.19

In Activity 6.11, we saw that the function
$$f(x, y) = \sqrt{x^2 + y^2} = (x^2 + y^2)^{1/2},$$
had partial derivatives given by
$$\frac{\partial f}{\partial x} = x(x^2 + y^2)^{-1/2} \quad\text{and}\quad \frac{\partial f}{\partial y} = y(x^2 + y^2)^{-1/2}.$$
So, partially differentiating $f_x(x, y)$ with respect to $x$ using the product and chain rules we get
$$f_{xx}(x, y) = (1)(x^2 + y^2)^{-1/2} + (x)\left[-\frac{1}{2}(x^2 + y^2)^{-3/2}(2x)\right] = \frac{(x^2 + y^2) - x^2}{(x^2 + y^2)^{3/2}} = \frac{y^2}{(x^2 + y^2)^{3/2}},$$
and partially differentiating $f_x(x, y)$ with respect to $y$ using the chain rule we get
$$f_{xy}(x, y) = x\left[-\frac{1}{2}(x^2 + y^2)^{-3/2}(2y)\right] = -\frac{xy}{(x^2 + y^2)^{3/2}}.$$
Similarly, partially differentiating $f_y(x, y)$ with respect to $x$ using the chain rule we get
$$f_{yx}(x, y) = y\left[-\frac{1}{2}(x^2 + y^2)^{-3/2}(2x)\right] = -\frac{xy}{(x^2 + y^2)^{3/2}},$$
and partially differentiating $f_y(x, y)$ with respect to $y$ using the product and chain rules we get
$$f_{yy}(x, y) = (1)(x^2 + y^2)^{-1/2} + (y)\left[-\frac{1}{2}(x^2 + y^2)^{-3/2}(2y)\right] = \frac{(x^2 + y^2) - y^2}{(x^2 + y^2)^{3/2}} = \frac{x^2}{(x^2 + y^2)^{3/2}}.$$
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.
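
As a sanity check on the second-order partial derivatives found in Activity 6.19, the following Python sketch differentiates the first-order partial derivatives numerically and compares the results with the formulae above. The test point $(3, 4)$, the step size and the helper names `d_dx` and `d_dy` are arbitrary illustrative choices, not part of the guide.

```python
import math

# Numerical check of Activity 6.19 for f(x, y) = sqrt(x^2 + y^2) (sketch only).

def fx(x, y):
    """First-order partial derivative f_x = x (x^2 + y^2)^(-1/2)."""
    return x / math.sqrt(x * x + y * y)

def fy(x, y):
    """First-order partial derivative f_y = y (x^2 + y^2)^(-1/2)."""
    return y / math.sqrt(x * x + y * y)

def d_dx(func, x, y, h=1e-5):
    """Central-difference estimate of the partial derivative of func in x."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def d_dy(func, x, y, h=1e-5):
    """Central-difference estimate of the partial derivative of func in y."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

x0, y0 = 3.0, 4.0                   # arbitrary test point away from the origin
r3 = (x0 ** 2 + y0 ** 2) ** 1.5     # (x^2 + y^2)^(3/2)

# Each entry pairs a numerical estimate with the formula from the solution.
checks = {
    "fxx": (d_dx(fx, x0, y0), y0 ** 2 / r3),
    "fxy": (d_dy(fx, x0, y0), -x0 * y0 / r3),
    "fyx": (d_dx(fy, x0, y0), -x0 * y0 / r3),
    "fyy": (d_dy(fy, x0, y0), x0 ** 2 / r3),
}
for name, (numerical, formula) in checks.items():
    print(f"{name}: numerical = {numerical:.6f}, formula = {formula:.6f}")
```

In particular, the rows for `fxy` and `fyx` should show the same value, in line with the observation that $f_{xy} = f_{yx}$.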

Solution to activity 6.20

To find an approximation to $e^{1.1}\cos 0.2$ using the second-order Taylor series in Example 6.32, we have
$$e^{1.1}\cos 0.2 \simeq e + e(1.1 - 1) + \frac{1}{2!}\left[e(1.1 - 1)^2 - e(0.2)^2\right] = 1.085\,e,$$
and, using the value of $e$, we find that $e^{1.1}\cos 0.2 \simeq 2.949$ to 3dp. Indeed, as the point $(1.1, 0.2)$ is close to the point $(1, 0)$ we expect this to be a good approximation. Of course, the exact value of $e^{1.1}\cos 0.2$ is 2.944 to 3dp and so we can see that our approximation agrees with this to 1dp.

Solution to activity 6.21

As we saw in Example 3.31, the second-order Taylor series for $e^x$ around $x = 1$ is
$$e^x \simeq e + (x - 1)e + \frac{(x - 1)^2}{2!}e,$$
and as we saw in Section 3.4.1, the second-order Maclaurin series (i.e. the Taylor series around $y = 0$) of $\cos y$ is
$$\cos y \simeq 1 - \frac{y^2}{2!}.$$
This means that, around the point $(1, 0)$, we would have
$$e^x \cos y \simeq \left[e + (x - 1)e + \frac{(x - 1)^2}{2!}e\right]\left[1 - \frac{y^2}{2!}\right],$$
and, multiplying out the brackets and discarding terms which are more than second-order in $(x - 1)$ and $y$ since these are small around the point $(1, 0)$, we get
$$e^x \cos y \simeq e + (x - 1)e + \frac{(x - 1)^2}{2!}e - e\frac{y^2}{2!},$$
which is the same as what we found in Example 6.32.
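
The arithmetic in Activity 6.20 is easy to reproduce. The sketch below, which is an illustration rather than part of the guide, evaluates the second-order Taylor approximation derived above at $(1.1, 0.2)$ and compares it with the value of $e^{1.1}\cos 0.2$ given by Python's `math` module; the function name `taylor2` is an assumption.

```python
import math

def taylor2(x, y):
    """Second-order Taylor approximation of e^x cos y around (1, 0)."""
    e = math.e
    return e + e * (x - 1) + 0.5 * (e * (x - 1) ** 2 - e * y ** 2)

approx = taylor2(1.1, 0.2)
exact = math.exp(1.1) * math.cos(0.2)

print(round(approx, 3))   # 2.949
print(round(exact, 3))    # 2.944
print(abs(approx - exact))
```

The last line prints the size of the error, which is of the order of the third-order terms that were discarded.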

Exercises

Exercise 6.1

Find the first and second-order partial derivatives of the function
$$f(x, y) = 2xy + x^{2a} y^a,$$
where $a$ is a constant. If this function satisfies the equation
$$x^2\frac{\partial^2 f}{\partial x^2} - 2y^2\frac{\partial^2 f}{\partial y^2} - 18f(x, y) + 36xy = 0,$$
find all possible values of $a$.

Exercise 6.2

For some numbers $\alpha$, $\beta$ and $\gamma$, a function, $f$, takes the form
$$f(x, y) = \frac{x^{2\alpha} + y^\beta}{x^2 + y^\gamma}.$$
If $f$ is homogeneous of degree four, find the values of $\alpha$, $\beta$ and $\gamma$. Having found these values, verify that the function satisfies Euler's theorem.

Exercise 6.3

Suppose that $R(p, q) = e^{q+p}$ and that $p$ is a positive function of $q$ defined implicitly by the equation
$$q^2 p + p^2 q + qp = 3.$$
Given that $r(q) = R(q, p(q))$, use the chain rule to find its derivative, $r'(q)$, when $q = 1$.

Exercise 6.4

A function $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by $f(x, y) = x^2 - 2y^2$, and the point $P$ has coordinates $(1, -1)$.
(a) Find the direction and rate at which $f$ increases most rapidly at $P$.
(b) Find the rate of change of $f$ at $P$ in the direction $(1, 1)^T$.
(c) Verify that the point $P$ is on the curve $x^2 - 2y^2 = -1$, and find the Cartesian equation of the tangent line to this curve at this point.

Exercise 6.5

A function $f : \mathbb{R}^3 \to \mathbb{R}$ is defined by $f(x, y, z) = \ln(xy + z)$.
(a) Find the gradient of $f$ at the point $(a, b, c)$.
(b) Verify that the point $(1, 1, 0)$ is on the surface $\ln(xy + z) = 0$, and find the normal vector and the tangent plane to the surface at this point.
(c) Consider the points, $(x, y, z)$, at which the rate of increase of $f$ in the direction $(x/2, y/2, z)^T$ is equal to two. Show that all of these points lie on the surface with equation $x^2 + y^2 + 4z^2 = 1$.

Solutions to exercises

Solution to exercise 6.1

Given that $f(x, y) = 2xy + x^{2a} y^a$ where $a$ is a constant, its first and second-order partial derivatives are given by
$$\frac{\partial f}{\partial x} = 2y + 2ax^{2a-1} y^a \implies \frac{\partial^2 f}{\partial x^2} = 2a(2a - 1)x^{2a-2} y^a \quad\text{and}\quad \frac{\partial^2 f}{\partial y\,\partial x} = 2 + 2a^2 x^{2a-1} y^{a-1},$$
and
$$\frac{\partial f}{\partial y} = 2x + ax^{2a} y^{a-1} \implies \frac{\partial^2 f}{\partial y^2} = a(a - 1)x^{2a} y^{a-2} \quad\text{and}\quad \frac{\partial^2 f}{\partial x\,\partial y} = 2 + 2a^2 x^{2a-1} y^{a-1}.$$
Observe, in particular, that $f_{xy}(x, y) = f_{yx}(x, y)$ as we should expect in this course. If this function satisfies the equation
$$x^2\frac{\partial^2 f}{\partial x^2} - 2y^2\frac{\partial^2 f}{\partial y^2} - 18f(x, y) + 36xy = 0,$$
we can substitute in the relevant terms to see that we must have
$$x^2\left[2a(2a - 1)x^{2a-2} y^a\right] - 2y^2\left[a(a - 1)x^{2a} y^{a-2}\right] - 18\left[2xy + x^{2a} y^a\right] + 36xy = 0,$$
which can be tidied up to give us
$$2a(2a - 1)x^{2a} y^a - 2a(a - 1)x^{2a} y^a - 36xy - 18x^{2a} y^a + 36xy = 0,$$
and, after further simplification, we get
$$2(a^2 - 9)x^{2a} y^a = 0.$$
Consequently, as this has to hold for all values of $x$ and $y$, we must have $a^2 = 9$, which means that $a = \pm 3$ are the possible values of $a$ if $f$ has to satisfy the given equation.

Solution to exercise 6.2

For the function
$$f(x, y) = \frac{x^{2\alpha} + y^\beta}{x^2 + y^\gamma},$$
to be homogeneous of degree four for some numbers $\alpha$, $\beta$ and $\gamma$, we require that
$$f(\lambda x, \lambda y) = \frac{(\lambda x)^{2\alpha} + (\lambda y)^\beta}{(\lambda x)^2 + (\lambda y)^\gamma},$$
is equal to $\lambda^4 f(x, y)$. But, in order for this to happen, we must find that the:
numerator is homogeneous, i.e. we have $2\alpha = \beta$ so that
$$(\lambda x)^{2\alpha} + (\lambda y)^\beta = (\lambda x)^\beta + (\lambda y)^\beta = \lambda^\beta(x^\beta + y^\beta),$$
giving us a numerator whose degree of homogeneity is $\beta = 2\alpha$.
denominator is homogeneous, i.e. we have $\gamma = 2$ so that
$$(\lambda x)^2 + (\lambda y)^\gamma = (\lambda x)^2 + (\lambda y)^2 = \lambda^2(x^2 + y^2),$$
giving us a denominator whose degree of homogeneity is $\gamma = 2$.
overall degree of homogeneity is four, i.e. we must find that
$$\frac{(\lambda x)^{2\alpha} + (\lambda y)^\beta}{(\lambda x)^2 + (\lambda y)^\gamma} = \frac{\lambda^\beta(x^\beta + y^\beta)}{\lambda^2(x^2 + y^2)} = \lambda^{\beta-2}\,\frac{x^\beta + y^\beta}{x^2 + y^2} = \lambda^{\beta-2} f(x, y),$$
is equal to $\lambda^4 f(x, y)$. That is, we must have $\beta - 2 = 4$ so that $\beta = 6$.
Consequently, we find that $\alpha = 3$ (since $2\alpha = \beta$), $\beta = 6$ and $\gamma = 2$ so that our sought after homogeneous function is
$$f(x, y) = \frac{x^6 + y^6}{x^2 + y^2}.$$
To verify that Euler's theorem holds for this function, we need to show that
$$x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = 4f(x, y).$$
To do this, we use the quotient rule to see that
$$\frac{\partial f}{\partial x} = \frac{(6x^5)(x^2 + y^2) - (x^6 + y^6)(2x)}{(x^2 + y^2)^2} \quad\text{and}\quad \frac{\partial f}{\partial y} = \frac{(6y^5)(x^2 + y^2) - (x^6 + y^6)(2y)}{(x^2 + y^2)^2},$$
which means that we have
$$x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = x\,\frac{(6x^5)(x^2 + y^2) - (x^6 + y^6)(2x)}{(x^2 + y^2)^2} + y\,\frac{(6y^5)(x^2 + y^2) - (x^6 + y^6)(2y)}{(x^2 + y^2)^2}$$
$$= \frac{6x^6(x^2 + y^2) - 2x^2(x^6 + y^6) + 6y^6(x^2 + y^2) - 2y^2(x^6 + y^6)}{(x^2 + y^2)^2}$$
$$= \frac{6(x^6 + y^6)(x^2 + y^2) - 2(x^2 + y^2)(x^6 + y^6)}{(x^2 + y^2)^2}$$
$$= \frac{4(x^6 + y^6)}{x^2 + y^2} = 4f(x, y),$$
as required.

Solution to exercise 6.3

Given that $r(q) = R(q, p(q))$, the chain rule tells us that
$$\frac{dr}{dq} = \frac{\partial R}{\partial q}\frac{dq}{dq} + \frac{\partial R}{\partial p}\frac{dp}{dq} = \frac{\partial R}{\partial q} + \frac{\partial R}{\partial p}\frac{dp}{dq},$$
and so, as $R(q, p) = e^{q+p}$, we have
$$\frac{dr}{dq} = e^{q+p} + e^{q+p}\frac{dp}{dq} = e^{q+p}\left(1 + \frac{dp}{dq}\right).$$
Now we need to calculate $p'(q)$ given that $p = p(q)$ is defined through the equation
$$q^2 p + p^2 q + qp = 3.$$
To do this, we let $G(q, p)$ be the function defined by
$$G(q, p) = q^2 p + p^2 q + qp,$$
so that the given equation is now $G(q, p) = 3$. With this, we then have
$$\frac{dp}{dq} = -\frac{\partial G}{\partial q}\bigg/\frac{\partial G}{\partial p},$$
where
$$\frac{\partial G}{\partial q} = 2qp + p^2 + p \quad\text{and}\quad \frac{\partial G}{\partial p} = q^2 + 2pq + q,$$
which gives us
$$\frac{dp}{dq} = -\frac{2qp + p^2 + p}{q^2 + 2pq + q},$$
provided that $q^2 + 2pq + q \neq 0$. To take stock, so far, we have found that
$$\frac{dr}{dq} = e^{q+p}\left(1 + \frac{dp}{dq}\right) \quad\text{and}\quad \frac{dp}{dq} = -\frac{2qp + p^2 + p}{q^2 + 2pq + q},$$
and we need to evaluate this at the point where $q = 1$. In particular, we now need to find the value of $p$ that corresponds to $q = 1$ if $p = p(q)$ is the positive function of $q$ defined implicitly by the equation $q^2 p + p^2 q + qp = 3$. That is, if we set $q = 1$ in this equation we get
$$p + p^2 + p = 3 \implies p^2 + 2p - 3 = 0 \implies (p + 3)(p - 1) = 0,$$
i.e. the possible values of $p$ are $-3$ and $1$. But, we are told that $p$ is a positive function of $q$ and so we reject $p = -3$ and take the point where $q = 1$ and $p = 1$ to be the one we are interested in. Then, at this point, we find that
$$\frac{dp}{dq} = -\frac{2 + 1 + 1}{1 + 2 + 1} = -1 \implies \frac{dr}{dq} = e^{1+1}(1 + [-1]) = 0,$$
i.e. $r'(q) = 0$ when $q = 1$.

Solution to exercise 6.4

For (a), the function $f(x, y) = x^2 - 2y^2$ has a gradient vector given by
$$\nabla f = \begin{pmatrix} 2x \\ -4y \end{pmatrix},$$
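and so at the point $P$, i.e. $(1, -1)$, we have
$$\nabla f(1, -1) = \begin{pmatrix} 2 \\ 4 \end{pmatrix},$$
and this is the direction in which $f$ is increasing most rapidly at $P$. We then find that
$$|\nabla f(1, -1)| = \sqrt{4 + 16} = \sqrt{20},$$
is the rate of change of $f$ in this direction and so this is the rate at which $f$ increases most rapidly.
For (b), a unit vector in the direction $\mathbf{v} = (1, 1)^T$ is $\hat{\mathbf{v}} = (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})^T$ and so
$$f_{\hat{\mathbf{v}}}(1, -1) = \hat{\mathbf{v}} \cdot \nabla f(1, -1) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \frac{1}{\sqrt{2}}(2 + 4) = \frac{6}{\sqrt{2}} = 3\sqrt{2},$$
is the rate of change of $f$ at $P$ in the direction $\mathbf{v}$.
For (c), the point $P$ is on the curve as $1^2 - 2(-1)^2 = 1 - 2 = -1$. To find the equation of the tangent line to the curve at this point, we use (6.6), to see that
$$\nabla f(1, -1) \cdot \begin{pmatrix} x - 1 \\ y + 1 \end{pmatrix} = 0 \implies \begin{pmatrix} 2 \\ 4 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y + 1 \end{pmatrix} = 0 \implies 2(x - 1) + 4(y + 1) = 0,$$
i.e. $x + 2y = -1$ is the Cartesian equation of the tangent line to the curve at $P$.

Solution to exercise 6.5

For (a), given the function $f(x, y, z) = \ln(xy + z)$ we have
$$\nabla f(x, y, z) = \begin{pmatrix} f_x \\ f_y \\ f_z \end{pmatrix} = \begin{pmatrix} y/(xy + z) \\ x/(xy + z) \\ 1/(xy + z) \end{pmatrix} = \frac{1}{xy + z}\begin{pmatrix} y \\ x \\ 1 \end{pmatrix},$$
and so the gradient vector is
$$\nabla f(a, b, c) = \frac{1}{ab + c}\begin{pmatrix} b \\ a \\ 1 \end{pmatrix},$$
at the point $(a, b, c)$.
For (b), we see that the point $(1, 1, 0)$ is on the surface as $\ln([1][1] + 0) = \ln 1 = 0$ and the normal vector to the surface at this point is
$$\nabla f(1, 1, 0) = \frac{1}{(1)(1) + 0}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
Then, using (6.8), we have
$$\nabla f(1, 1, 0) \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 0 \end{pmatrix} = 0 \implies \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 0 \end{pmatrix} = 0 \implies 1(x - 1) + 1(y - 1) + 1(z - 0) = 0,$$
i.e. $x + y + z = 2$ is the Cartesian equation of the tangent plane to the surface at the point $(1, 1, 0)$.
For (c), we note that at all points, $(x, y, z)$, we have
$$\nabla f(x, y, z) = \frac{1}{xy + z}\begin{pmatrix} y \\ x \\ 1 \end{pmatrix},$$
and that the direction $\mathbf{v} = (x/2, y/2, z)^T$ can be written as
$$\mathbf{v} = \frac{1}{2}\begin{pmatrix} x \\ y \\ 2z \end{pmatrix},$$
which means that a unit vector in this direction is given by
$$\hat{\mathbf{v}} = \frac{1}{\sqrt{x^2 + y^2 + 4z^2}}\begin{pmatrix} x \\ y \\ 2z \end{pmatrix}.$$
The rate of increase of $f$ in the direction of the unit vector $\hat{\mathbf{v}}$ at a point $(x, y, z)$ is then given by $f_{\hat{\mathbf{v}}}(x, y, z)$, i.e. we have
$$\hat{\mathbf{v}} \cdot \nabla f(x, y, z) = \frac{xy + xy + 2z}{(xy + z)\sqrt{x^2 + y^2 + 4z^2}} = \frac{2(xy + z)}{(xy + z)\sqrt{x^2 + y^2 + 4z^2}} = \frac{2}{\sqrt{x^2 + y^2 + 4z^2}},$$
where we have just found the dot product of the two vectors $\hat{\mathbf{v}}$ and $\nabla f(x, y, z)$. Consequently, when $f_{\hat{\mathbf{v}}}(x, y, z) = 2$, we have points $(x, y, z)$ that satisfy the equation
$$2 = \frac{2}{\sqrt{x^2 + y^2 + 4z^2}} \implies x^2 + y^2 + 4z^2 = 1,$$
as required.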
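
The result in part (c) of Exercise 6.5 can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions: the function names and the two sample points are hypothetical choices, and both points have $xy + z > 0$ so that $\ln(xy + z)$ is defined. It computes the rate of change of $f$ in the direction $(x/2, y/2, z)^T$ at a point on the surface $x^2 + y^2 + 4z^2 = 1$ and at a point off it.

```python
import math

def grad_f(x, y, z):
    """Gradient of f(x, y, z) = ln(xy + z), i.e. (y, x, 1) / (xy + z)."""
    s = x * y + z
    return (y / s, x / s, 1 / s)

def rate_in_direction(x, y, z):
    """Rate of change of f at (x, y, z) in the direction v = (x/2, y/2, z)."""
    v = (x / 2, y / 2, z)
    norm = math.sqrt(sum(c * c for c in v))
    unit = tuple(c / norm for c in v)
    return sum(u * g for u, g in zip(unit, grad_f(x, y, z)))

on_surface = (0.6, 0.0, 0.4)     # 0.36 + 0 + 4(0.16) = 1
off_surface = (1.0, 1.0, 1.0)    # 1 + 1 + 4 = 6, so not on the surface

print(rate_in_direction(*on_surface))                       # rate on the surface
print(rate_in_direction(*off_surface), 2 / math.sqrt(6))    # rate and 2 / sqrt(x^2 + y^2 + 4z^2)
```

At the first point the rate should come out as two (up to rounding), whereas at the second point it should match $2/\sqrt{x^2 + y^2 + 4z^2}$, which is less than two.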

Chapter 7
Two-variable optimisation

Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 4.6, 4.7, 6.3–6.8.
Anthony and Biggs (1996) Chapter 13, parts of Chapters 14 and 21.

Further reading
Simon and Blume (1994) parts of Chapters 17, 18 and 19.
Adams and Essex (2010) parts of Sections 13.1–13.3.

Aims and objectives
The objectives of this chapter are as follows.
To use partial derivatives to solve problems where a function needs to be optimised.
To solve problems where a function needs to be optimised subject to a constraint.
Specific learning outcomes can be found near the end of this chapter.

7.1 Introduction

Having seen how to find partial derivatives and gained some insight into what they tell us about a function of two variables in the last chapter, we now see how they can be used to optimise such a function. In particular, we will see how the first-order partial derivatives allow us to find the stationary points of a function and its second-order partial derivatives allow us to see whether such a point is a maximum or a minimum. We will also see how to optimise a function of two variables in cases where the variables are constrained, i.e. they are required to satisfy some extra condition known as a constraint.

7.2 Unconstrained optimisation

We start by considering unconstrained optimisation, i.e. we are looking for the places where a function of two variables, $f(x, y)$, attains its maximum or minimum values when $x$ and $y$ are independent, so that the point $(x, y)$ is free to take any value in $\mathbb{R}^2$.

7.2.1 Stationary points

Suppose we have a surface $z = f(x, y)$ whose tangent plane at the point $(a, b, c)$ where $c = f(a, b)$ is given by (6.4), i.e.
$$z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b).$$
We define a stationary point of this function to be any point where the tangent plane to the surface is horizontal and so, in this case, the tangent plane would have to be $z = c$. But, if this is the case, it means that we must have
$$f_x(a, b)(x - a) + f_y(a, b)(y - b) = 0,$$
for all $x, y \in \mathbb{R}$ which, in turn, means that we must have
$$f_x(a, b) = 0 \quad\text{and}\quad f_y(a, b) = 0.$$
Thus, we find that the point $(x, y) = (a, b)$ is a stationary point of the function $f(x, y)$ if both first-order partial derivatives of the function are zero at that point. Consequently, in order to find the stationary points of a function, $f(x, y)$, we must find all points $(x, y)$ that satisfy the equations
$$f_x(x, y) = 0 \quad\text{and}\quad f_y(x, y) = 0,$$
simultaneously.

Example 7.1

Find the stationary points of the function $f(x, y) = x^4 + 2x^2 y + 2y^2 + y$.
The first-order partial derivatives of this function are
$$f_x(x, y) = 4x^3 + 4xy \quad\text{and}\quad f_y(x, y) = 2x^2 + 4y + 1.$$
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous equations
$$4x^3 + 4xy = 0 \quad\text{and}\quad 2x^2 + 4y + 1 = 0.$$
If we start by looking at the first equation, this gives us
$$4x^3 + 4xy = 0 \implies 4x(x^2 + y) = 0 \implies x = 0 \text{ or } y = -x^2.$$
And so, to satisfy the second equation with:
$x = 0$ we must have
$$2(0)^2 + 4y + 1 = 0 \implies y = -\frac{1}{4},$$
i.e. $(0, -1/4)$ is a stationary point.
$y = -x^2$ we must have
$$2x^2 + 4(-x^2) + 1 = 0 \implies 2x^2 = 1 \implies x^2 = \frac{1}{2} \implies x = \pm\frac{1}{\sqrt{2}},$$
which in turn gives us
$$y = -\left(\pm\frac{1}{\sqrt{2}}\right)^2 = -\frac{1}{2},$$
i.e. $(1/\sqrt{2}, -1/2)$ and $(-1/\sqrt{2}, -1/2)$ are stationary points.
Consequently, the points
$$\left(0, -\frac{1}{4}\right), \quad \left(\frac{1}{\sqrt{2}}, -\frac{1}{2}\right) \quad\text{and}\quad \left(-\frac{1}{\sqrt{2}}, -\frac{1}{2}\right),$$
are stationary points of this function.

Example 7.2

Find the stationary points of the function $f(x, y) = 4x^3 - 60xy + 5y^2 + 400y - 35$.
The first-order partial derivatives of this function are
$$f_x(x, y) = 12x^2 - 60y \quad\text{and}\quad f_y(x, y) = -60x + 10y + 400.$$
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous equations
$$12x^2 - 60y = 0 \quad\text{and}\quad -60x + 10y + 400 = 0.$$
We start by simplifying these equations to get
$$x^2 - 5y = 0 \quad\text{and}\quad -6x + y + 40 = 0,$$
and then notice that the first equation gives us $y = x^2/5$. Substituting this into the second equation then allows us to see that
$$-6x + \frac{x^2}{5} + 40 = 0 \implies x^2 - 30x + 200 = 0 \implies (x - 20)(x - 10) = 0 \implies x = 10 \text{ or } x = 20,$$
and, since $y = x^2/5$, we have
$$y = \frac{10^2}{5} = 20 \quad\text{or}\quad y = \frac{20^2}{5} = 80,$$
respectively. Thus, this function has two stationary points, namely the points $(10, 20)$ and $(20, 80)$.

Activity 7.1

Find the stationary points of the function $f(x, y) = x^2 - 4x + y^2 + 4y + 8$.

Activity 7.2

Find the stationary points of the function $f(x, y) = 3x^3 + 9x^2 - 72x + 2y^3 - 12y^2 - 126y + 19$.

In particular, notice that at a stationary point, i.e. at a point, $(a, b)$, where
$$f_x(a, b) = 0 \quad\text{and}\quad f_y(a, b) = 0,$$
we can see that the gradient vector of the function becomes
$$\nabla f(a, b) = \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \mathbf{0}.$$
That is, if we are at a stationary point, we can see that the rate of change of $f$ in any direction given by the unit vector $\hat{\mathbf{u}}$ is zero as
$$f_{\hat{\mathbf{u}}}(a, b) = \hat{\mathbf{u}} \cdot \nabla f(a, b) = \hat{\mathbf{u}} \cdot \mathbf{0} = 0,$$
which means that at a stationary point, the rate of change of $f$ is zero in all directions.
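
Since the gradient vector vanishes at a stationary point, the answers to Examples 7.1 and 7.2 can be checked by evaluating the gradient at each candidate point. The following Python sketch does this; the function names, the tolerance and the helper `is_stationary` are illustrative assumptions rather than anything prescribed by the guide.

```python
import math

def grad_example_7_1(x, y):
    """Gradient of f(x, y) = x^4 + 2x^2 y + 2y^2 + y from Example 7.1."""
    return (4 * x ** 3 + 4 * x * y, 2 * x ** 2 + 4 * y + 1)

def grad_example_7_2(x, y):
    """Gradient of f(x, y) = 4x^3 - 60xy + 5y^2 + 400y - 35 from Example 7.2."""
    return (12 * x ** 2 - 60 * y, -60 * x + 10 * y + 400)

def is_stationary(grad, point, tol=1e-9):
    """True if both first-order partial derivatives are (numerically) zero."""
    gx, gy = grad(*point)
    return abs(gx) < tol and abs(gy) < tol

r = 1 / math.sqrt(2)
candidates = [
    (grad_example_7_1, (0.0, -0.25)),
    (grad_example_7_1, (r, -0.5)),
    (grad_example_7_1, (-r, -0.5)),
    (grad_example_7_2, (10.0, 20.0)),
    (grad_example_7_2, (20.0, 80.0)),
]
for grad, point in candidates:
    print(grad.__name__, point, "stationary:", is_stationary(grad, point))
```

A small tolerance is used because $1/\sqrt{2}$ cannot be represented exactly in floating-point arithmetic, so the computed gradient is only approximately zero at those points.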

We have now seen how to find the stationary points of a function, $f(x, y)$, but what do they look like? Generally speaking, we will find that there are three kinds of stationary point, namely local minima, saddle points and local maxima, and these are illustrated in Figure 7.1(a), (b) and (c) respectively. We now consider what criteria we can use to determine exactly what kind of stationary point we have found.

Figure 7.1: Each of these surfaces has the indicated kind of stationary point at $(0, 0, 0)$: (a) local minimum, (b) saddle point, (c) local maximum.
7.2.2
Classifying stationary points
Let’s say that we have found that (a, b) is a stationary point of the function, f (x, y). This means that fx (a, b) = 0 and fy (a, b) = 0, and so, in particular, the derivative of f at this point is given by df = fx (a, b), fy (a, b) = 0 0 = 0. dx (a,b)
264
7.2. Unconstrained optimisation
However, we saw in Section 6.4.5, that the second-order Taylor series of the function f (x, y) around the point (a, b) is given by d2 f 1 df x−a x−a x − a, y − b + + ··· , f (x, y) = f (a, b) + dx (a,b) y − b 2! dx 2 (a,b) y − b
and so, as (a, b) is a stationary point, we have
d2 f 1 x − a x − a, y − b f (x, y) − f (a, b) = + ··· , 2! dx 2 (a,b) y − b
provided that the point (x, y) is sufficiently close to the point (a, b). Consequently, if we let K(x, y) be the quantity d2 f x − a x − a, y − b , dx 2 (a,b) y − b
we can see that, if for all $(x, y)$ close to [but not equal to] $(a, b)$ we have:

• $K(x, y) > 0$, then $f(x, y) > f(a, b)$ for such points and so the function always lies above the horizontal tangent plane at $(a, b)$. This means that the stationary point is a local minimum as in Figure 7.1(a).

• $K(x, y) < 0$, then $f(x, y) < f(a, b)$ for such points and so the function always lies below the horizontal tangent plane at $(a, b)$. This means that the stationary point is a local maximum as in Figure 7.1(c).

However, if we find that there are some points $(x, y)$ close to [but not equal to] $(a, b)$ that make $K(x, y) > 0$ and others that make $K(x, y) < 0$, we see that at some points we have $f(x, y) > f(a, b)$ and so the function lies above the horizontal tangent plane and at other points we have $f(x, y) < f(a, b)$ and so the function lies below the horizontal tangent plane. Indeed, as we saw in Figure 7.1(b), this is exactly what happens when we have a saddle point.

Now, it turns out that,¹ if we use the definition of the second derivative matrix, we have
\[ K(x, y) = \begin{pmatrix} x - a & y - b \end{pmatrix} \begin{pmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{yx}(a, b) & f_{yy}(a, b) \end{pmatrix} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
which means that, performing the matrix multiplications, we get
\[ K(x, y) = (x - a)^2 f_{xx}(a, b) + 2(x - a)(y - b) f_{xy}(a, b) + (y - b)^2 f_{yy}(a, b), \]
if we assume, as usual, that $f_{xy}(a, b) = f_{yx}(a, b)$. Then, taking out a factor of $f_{xx}(a, b)$ and completing the square, we get²
\[ K(x, y) = f_{xx}(a, b) \left[ \left( (x - a) + \frac{f_{xy}(a, b)}{f_{xx}(a, b)} (y - b) \right)^2 + \frac{f_{xx}(a, b) f_{yy}(a, b) - [f_{xy}(a, b)]^2}{[f_{xx}(a, b)]^2} (y - b)^2 \right]. \]
At this point we define the Hessian of $f(x, y)$ to be the function
\[ H(x, y) = f_{xx}(x, y) f_{yy}(x, y) - [f_{xy}(x, y)]^2, \]
so that, finally, we have
\[ K(x, y) = f_{xx}(a, b) \left[ \left( (x - a) + \frac{f_{xy}(a, b)}{f_{xx}(a, b)} (y - b) \right)^2 + \frac{H(a, b)}{[f_{xx}(a, b)]^2} (y - b)^2 \right]. \]

¹ This is most easily done if we show that the second derivative of $f(x, y)$ at the point $(a, b)$, i.e. the matrix
\[ \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} = \begin{pmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{yx}(a, b) & f_{yy}(a, b) \end{pmatrix}, \]
is positive definite or negative definite as in Binmore and Davies (2002) Section 6.3. But you won’t encounter these concepts until you study 175 Further Linear Algebra and so we merely motivate the result that follows here.
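As a quick sanity check on the completing-the-square step, the following sketch compares the expanded form of $K$ with the completed-square form for randomly chosen values. It is only an illustration: the function names, the sampling ranges and the random seed are our own choices, and we deliberately keep $f_{xx}$ away from zero, as the argument assumes.

```python
import random


def K_expanded(dx, dy, fxx, fxy, fyy):
    """K written out as a quadratic in dx = x - a and dy = y - b."""
    return dx**2 * fxx + 2 * dx * dy * fxy + dy**2 * fyy


def K_completed_square(dx, dy, fxx, fxy, fyy):
    """The same quantity after completing the square (needs fxx != 0)."""
    H = fxx * fyy - fxy**2  # the Hessian evaluated at (a, b)
    return fxx * ((dx + (fxy / fxx) * dy) ** 2 + (H / fxx**2) * dy**2)


def max_discrepancy(trials=1000, seed=0):
    """Largest gap between the two forms over randomly sampled inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        # Keep |fxx| >= 0.5 so that dividing by fxx is safe.
        fxx = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 3.0)
        fxy = rng.uniform(-3.0, 3.0)
        fyy = rng.uniform(-3.0, 3.0)
        dx = rng.uniform(-1.0, 1.0)
        dy = rng.uniform(-1.0, 1.0)
        gap = abs(K_expanded(dx, dy, fxx, fxy, fyy)
                  - K_completed_square(dx, dy, fxx, fxy, fyy))
        worst = max(worst, gap)
    return worst


print(max_discrepancy())  # should be at the level of rounding error
```

Agreement at sampled points is not a proof, of course, but it is a cheap way to catch a slip in the algebra.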
Now, if we look at this carefully, we see that:

• If $H(a, b) > 0$ and $f_{xx}(a, b) > 0$, then $K(x, y) > 0$ for all $(x, y)$ close to [but not equal to] $(a, b)$ and so, as we saw above, this means that the stationary point $(a, b)$ is a local minimum.

• If $H(a, b) > 0$ and $f_{xx}(a, b) < 0$, then $K(x, y) < 0$ for all $(x, y)$ close to [but not equal to] $(a, b)$ and so, as we saw above, this means that the stationary point $(a, b)$ is a local maximum.

Indeed, if we find that $H(a, b) < 0$, we can see that there will be some points $(x, y)$ close to [but not equal to] $(a, b)$ that make $K(x, y) > 0$ and others that make $K(x, y) < 0$. In this case, as we saw above, this means that the stationary point $(a, b)$ is a saddle point.

In summary, we have now motivated the following method for classifying our stationary points:

If $(a, b)$ is a stationary point of the function $f(x, y)$, and the Hessian is defined to be the function
\[ H(x, y) = f_{xx}(x, y) f_{yy}(x, y) - [f_{xy}(x, y)]^2, \]
then:

• If $H(a, b) > 0$ and $f_{xx}(a, b) > 0$, then this stationary point is a local minimum.

• If $H(a, b) > 0$ and $f_{xx}(a, b) < 0$, then this stationary point is a local maximum.

• If $H(a, b) < 0$, then this stationary point is a saddle point.

In particular, if $H(a, b) = 0$, we can draw no conclusions about the nature of the stationary point by using this method.

Let’s look at some examples of how this works in practice.

² Technically, we have assumed that $f_{xx}(a, b) \neq 0$ here, but if this was not the case we could present a slightly different argument to deal with this problem. However, as we are just trying to motivate what follows instead of providing a rigorous argument for it, we will skip these technicalities here.
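The classification rule summarised above is mechanical enough to write as a few lines of code. The sketch below is a minimal illustration, not part of the guide's method: the function names `hessian` and `classify` are hypothetical, and the inputs are assumed to be the three second-order partial derivatives already evaluated at a stationary point.

```python
def hessian(fxx, fxy, fyy):
    """H = fxx * fyy - fxy^2, from second-order partials at a point."""
    return fxx * fyy - fxy**2


def classify(fxx, fxy, fyy):
    """Classify a stationary point from its second-order partials."""
    H = hessian(fxx, fxy, fyy)
    if H < 0:
        return "saddle point"
    if H > 0 and fxx > 0:
        return "local minimum"
    if H > 0 and fxx < 0:
        return "local maximum"
    # H > 0 forces fxx != 0, so we only get here when H == 0.
    return "inconclusive"


# f(x, y) = x^2 + y^2 at (0, 0): fxx = 2, fxy = 0, fyy = 2
print(classify(2, 0, 2))
# f(x, y) = x^2 - y^2 at (0, 0): fxx = 2, fxy = 0, fyy = -2
print(classify(2, 0, -2))
```

Note that the function must be given values at a point that is already known to be stationary; it does not check this for itself.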
Example 7.3

Classify the stationary points we found in Example 7.1.

Using the first-order partial derivatives we found in Example 7.1, we find that the second-order partial derivatives are
\[ f_{xx}(x, y) = 12x^2 + 4y, \quad f_{xy}(x, y) = 4x = f_{yx}(x, y) \quad \text{and} \quad f_{yy}(x, y) = 4, \]
and, as such, the Hessian is given by
\[ H(x, y) = (12x^2 + 4y)(4) - (4x)^2 = 48x^2 + 16y - 16x^2 = 16(2x^2 + y). \]
Evaluating this at each of the stationary points we then find that:

• At $(0, -1/4)$, the Hessian is $H(0, -1/4) = 16(-1/4) < 0$, and so this is a saddle point.

• At $(1/\sqrt{2}, -1/2)$, the Hessian is $H(1/\sqrt{2}, -1/2) = 16(1/2) > 0$ and $f_{xx}(1/\sqrt{2}, -1/2) = 6 - 2 > 0$, so this is a local minimum.

• At $(-1/\sqrt{2}, -1/2)$, the Hessian is $H(-1/\sqrt{2}, -1/2) = 16(1/2) > 0$ and $f_{xx}(-1/\sqrt{2}, -1/2) = 6 - 2 > 0$, so this is a local minimum.

Thus, the stationary points we found in Example 7.1, i.e.
\[ \left(0, -\frac{1}{4}\right), \quad \left(\frac{1}{\sqrt{2}}, -\frac{1}{2}\right) \quad \text{and} \quad \left(-\frac{1}{\sqrt{2}}, -\frac{1}{2}\right), \]
are a saddle point and two local minima respectively.

Example 7.4
Classify the stationary points we found in Example 7.2.

Using the first-order partial derivatives we found in Example 7.2, we find that the second-order partial derivatives are
\[ f_{xx}(x, y) = 24x, \quad f_{xy}(x, y) = -60 = f_{yx}(x, y) \quad \text{and} \quad f_{yy}(x, y) = 10, \]
and, as such, the Hessian is given by
\[ H(x, y) = (24x)(10) - (-60)^2 = 240x - 3600 = 240(x - 15). \]
Evaluating this at each of the stationary points we then find that:

• At $(10, 20)$, the Hessian is $H(10, 20) = 240(-5) < 0$, and so this is a saddle point.

• At $(20, 80)$, the Hessian is $H(20, 80) = 240(5) > 0$ and $f_{xx}(20, 80) = 24(20) > 0$, so this is a local minimum.

Thus, the stationary points $(10, 20)$ and $(20, 80)$ are a saddle point and a local minimum respectively.

Activity 7.3
Classify the stationary points we found in Activity 7.1.
Activity 7.4
Classify the stationary points we found in Activity 7.2.
Lastly, we have remarked above that in cases where the Hessian is zero at a stationary point, the method that we have used so far fails. Indeed, in such cases, the stationary point could be a local minimum, a local maximum or a saddle point and, to determine which, we would have to think more carefully about what is happening. Let’s consider an example of a function where this kind of problem occurs.

Example 7.5 Find the stationary point of the function $f(x, y) = x^3 - y^3$ and show that we can’t determine its nature using the method above. What kind of stationary point do we have here?

The first-order partial derivatives of this function are
\[ f_x(x, y) = 3x^2 \quad \text{and} \quad f_y(x, y) = -3y^2. \]
So, clearly, the only stationary point is at $(0, 0)$. The second-order partial derivatives of this function are given by
\[ f_{xx}(x, y) = 6x, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad \text{and} \quad f_{yy}(x, y) = -6y, \]
and, as such, the Hessian is given by
\[ H(x, y) = (6x)(-6y) - 0^2 = -36xy. \]
Indeed, evaluating this at the stationary point gives $H(0, 0) = 0$ and so the method we used above fails.

However, if we consider the surface $z = f(x, y)$, notice that the $y = 0$ section of our function gives $z = f(x, 0) = x^3$. As such, if we look at this section around the stationary point $(0, 0)$ where $z = f(0, 0) = 0$, we can see that

• if $x > 0$, we have $f(x, 0) > f(0, 0)$ and so this stationary point can’t be a local maximum, whereas

• if $x < 0$, we have $f(x, 0) < f(0, 0)$ and so this stationary point can’t be a local minimum.

Indeed, if we look at the $x = 0$ section of our function, i.e. $z = f(0, y) = -y^3$, this leads us to a similar conclusion. In fact, looking at the sections, we can see that this is a kind of saddle point, albeit one which ‘looks different’ to the one that we saw before in Figure 7.1(b), and it is illustrated in Figure 7.2.
Figure 7.2: Some useful pictures for Example 7.5. (a) The $y = 0$ section, $z = f(x, 0) = x^3$. (b) The $x = 0$ section, $z = f(0, y) = -y^3$. (c) The surface $z = f(x, y) = x^3 - y^3$ displaying a ‘different kind’ of saddle point at $(0, 0, 0)$.
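The argument about sections in Example 7.5 can also be checked numerically. The sketch below is only an illustration: the helper name `section_signs` and the step size `h = 0.1` are our own choices. It compares $f$ with $f(0, 0)$ a small step either side of the origin along each section.

```python
def f(x, y):
    """The function from Example 7.5."""
    return x**3 - y**3


def section_signs(h=0.1):
    """f minus f(0, 0) a small step h from the origin along each section."""
    base = f(0.0, 0.0)
    return {
        "y=0, x>0": f(h, 0.0) - base,
        "y=0, x<0": f(-h, 0.0) - base,
        "x=0, y>0": f(0.0, h) - base,
        "x=0, y<0": f(0.0, -h) - base,
    }


for label, difference in section_signs().items():
    print(label, "positive" if difference > 0 else "negative")
```

Each section gives values both above and below $f(0, 0)$ however small we take `h`, which is why the point can be neither a local maximum nor a local minimum.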
Activity 7.5 Find the stationary point of the function
\[ f(x, y) = (x - 1)^4 + (y - 1)^4, \]
and show that we can’t determine its nature using the method above. What kind of stationary point do we have here?

7.2.3 Convex and concave functions

As we saw in Section 4.3.2, it is often useful to know whether a function is convex or concave. In particular, we will see that, if a function, $f(x, y)$, is convex (or concave) for all $(x, y) \in \mathbb{R}^2$, then a local minimum (or local maximum) is actually a global minimum (or global maximum), i.e. we can find the smallest (or largest) value that the function can attain.

To see how this works, consider that, in the case of a function of one variable, $f(x)$, we saw in Section 4.3.2 that $f(x)$ is convex on $\mathbb{R}$ if it lies above all of its tangent lines, and $f(x)$ is concave on $\mathbb{R}$ if it lies below all of its tangent lines. So, analogously, we say that a function of two variables, $f(x, y)$, is convex on $\mathbb{R}^2$ if it lies above all of its tangent planes, and concave on $\mathbb{R}^2$ if it lies below all of its tangent planes. As an example of what this means, it should be clear from what we can see of the surfaces illustrated in Figure 7.1, that in:
(a) where we have a local minimum, the function is convex because it lies above all of its tangent planes.

(b) where we have a saddle point, the function is neither convex nor concave as, considering the horizontal tangent plane at $(0, 0, 0)$, some of the function lies above this tangent plane and the rest of it lies below this tangent plane.

(c) where we have a local maximum, the function is concave because it lies below all of its tangent planes.

We now want to develop a way of determining whether a function is convex or concave on $\mathbb{R}^2$. Suppose that we have a function $f(x, y)$ that is convex. As we saw in Section 6.4.1, at any point $(a, b)$, the tangent plane to this function has a Cartesian equation given by
\[ z = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
and, as this function is convex, it must be the case that for all $(x, y) \in \mathbb{R}^2$, the function lies above this tangent plane, i.e. we must have
\[ f(x, y) \geq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}. \]
However, using the second-order Taylor series for $f(x, y)$ around the point $(a, b)$, this means that we have
\[ f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \geq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
which simplifies to give us
\[ \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \geq 0, \]
and this just asserts that $K(x, y) \geq 0$ using our notation from Section 7.2.2. However, using what we saw before, this means that we require
\[ H(x, y) \geq 0 \quad \text{and} \quad f_{xx}(x, y) \geq 0, \]
and this is, therefore, our condition for convexity.³

Activity 7.6 Using an argument similar to the one above, explain why a concave function requires that $H(x, y) \geq 0$ and $f_{xx}(x, y) \leq 0$.

The upshot of this is that we can now see that a function, $f(x, y)$, is

• convex on $\mathbb{R}^2$ if, for all $(x, y) \in \mathbb{R}^2$, $H(x, y) \geq 0$ and $f_{xx}(x, y) \geq 0$, and

• concave on $\mathbb{R}^2$ if, for all $(x, y) \in \mathbb{R}^2$, $H(x, y) \geq 0$ and $f_{xx}(x, y) \leq 0$.

³ Again, we have glossed over any complications in our derivation that would occur if $f_{xx}(x, y) = 0$ for some point, $(x, y)$.
Note, in particular, that when testing for convexity or concavity, we can have $H(x, y) = 0$ even though we must have $H(x, y) \neq 0$ when we are classifying stationary points using the method of the previous section. But, it should be clear that if a function, $f(x, y)$, has a stationary point and it is

• convex, then that stationary point is a global minimum.

• concave, then that stationary point is a global maximum.

That is, we now have a way of determining whether a local minimum (or a local maximum) is a global minimum (or a global maximum).

Example 7.6 Show that the function $f(x, y) = x^2 + y^2$ has a global minimum at the point $(0, 0, 0)$.

The first-order partial derivatives of this function are
\[ f_x(x, y) = 2x \quad \text{and} \quad f_y(x, y) = 2y. \]
At a stationary point, we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$, i.e. we must have $x = 0$ and $y = 0$. Indeed, as $z = f(0, 0) = 0$, this means that we have a stationary point at $(0, 0, 0)$. The second-order partial derivatives of this function are
\[ f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad \text{and} \quad f_{yy}(x, y) = 2, \]
and, as such, the Hessian is given by
\[ H(x, y) = (2)(2) - 0^2 = 4. \]
So, at the stationary point, we have $H(0, 0) = 4 > 0$ and $f_{xx}(0, 0) = 2 > 0$ which means that this is a local minimum. But, in fact, we have $H(x, y) = 4 \geq 0$ and $f_{xx}(x, y) = 2 \geq 0$ for all $(x, y) \in \mathbb{R}^2$ here and so this function is actually convex on $\mathbb{R}^2$, i.e. the local minimum we have found here is actually a global minimum.

In particular, notice that this should have been obvious since we have $z = f(0, 0) = 0$ at the stationary point and for all other $x, y \in \mathbb{R}$, we have $z = f(x, y) = x^2 + y^2 > 0$, i.e. $f(x, y) \geq f(0, 0)$ for all $x, y \in \mathbb{R}$. Consequently, it should be clear that this function has a global minimum at $(0, 0)$ and this minimum value is zero.

Lastly, we note that these conditions can also be used to determine the regions in the $(x, y)$-plane where a function is convex, concave or neither as the next example shows.
Example 7.7 Determine the regions in the $(x, y)$-plane where the function, $f(x, y) = x^2 - y^3$ is convex, concave or neither.

The first-order partial derivatives of this function are
\[ f_x(x, y) = 2x \quad \text{and} \quad f_y(x, y) = -3y^2, \]
and so the second-order partial derivatives of this function are
\[ f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad \text{and} \quad f_{yy}(x, y) = -6y, \]
which means that the Hessian is given by
\[ H(x, y) = (2)(-6y) - 0^2 = -12y. \]
As such, we see that:

• When $y > 0$, $H(x, y) < 0$ and so the function is neither convex nor concave.

• When $y \leq 0$, $H(x, y) \geq 0$ and $f_{xx}(x, y) = 2 \geq 0$ and so the function is convex.

The surface $z = f(x, y)$ defined by this function is illustrated in Figure 7.3. In particular, observe that this function has a stationary point at $(0, 0, 0)$ and that, even though our method for classifying this point fails here (as $H(0, 0) = 0$), it is clearly a saddle point.

Figure 7.3: The surface $z = f(x, y)$ where $f(x, y) = x^2 - y^3$ from Example 7.7. Observe that this function is convex when $y \leq 0$ but that it is neither convex nor concave when $y > 0$.

Let’s now look at some applications of this material.
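The pointwise conditions used in Example 7.7 can be sketched in code as follows. This is an illustration only: the function names are hypothetical, the second-order partial derivatives are hard-coded for $f(x, y) = x^2 - y^3$, and, like the text, the check ignores the complications that arise when $f_{xx} = 0$.

```python
def second_partials(x, y):
    """(fxx, fxy, fyy) for f(x, y) = x^2 - y^3 from Example 7.7."""
    return 2.0, 0.0, -6.0 * y


def convexity_condition(x, y):
    """Report which of the conditions from the text holds at (x, y)."""
    fxx, fxy, fyy = second_partials(x, y)
    H = fxx * fyy - fxy**2
    if H >= 0 and fxx >= 0:
        return "convex condition holds"
    if H >= 0 and fxx <= 0:
        return "concave condition holds"
    return "neither"


for point in [(0.0, -1.0), (3.0, 0.0), (0.0, 1.0)]:
    print(point, convexity_condition(*point))
```

Sampling a few points like this only suggests where the regions lie; the inequality $-12y \geq 0$ in the example is what actually establishes that the boundary is the line $y = 0$.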
7.2.4 Applications
Optimisation problems are very common in economics and we now introduce two ways in which they can arise in that subject. The first is their use in cost minimisation and the second will be another instance of profit maximisation.
Cost minimisation

Suppose a firm is using quantities $x$ and $y$ of two commodities and this incurs a cost given by the cost function, $C(x, y)$. One might reasonably ask: What quantities should they be using if they want to minimise their costs?

Example 7.8 A data processing company employs both senior and junior programmers. A particularly large project will cost
\[ C(x, y) = 2000 + 2x^3 - 12xy + y^2 \]
pounds, where $x$ and $y$ represent the number of junior and senior programmers used respectively. How many employees of each kind should be assigned to the project in order to minimise its cost? What is this minimum cost?

To minimise the cost, we need to find the stationary points of $C(x, y)$ and determine which of them gives us a minimum. So, as before, we start by finding the first-order partial derivatives of $C(x, y)$, i.e.
\[ C_x(x, y) = 6x^2 - 12y \quad \text{and} \quad C_y(x, y) = -12x + 2y. \]
At a stationary point, both of these first-order partial derivatives are zero, i.e. we must have $C_x(x, y) = 0$ and $C_y(x, y) = 0$. Thus, to find the stationary points, we have to solve the simultaneous equations
\[ 6x^2 - 12y = 0 \quad \text{and} \quad -12x + 2y = 0. \]
We start by simplifying these equations to get
\[ x^2 - 2y = 0 \quad \text{and} \quad -6x + y = 0, \]
and then notice that the second equation gives us $y = 6x$. Substituting this into the first equation then allows us to see that
\[ x^2 - 2(6x) = 0 \implies x^2 - 12x = 0 \implies x(x - 12) = 0 \implies x = 0 \text{ or } x = 12, \]
and, since $y = 6x$, we have
\[ y = 6(0) = 0 \quad \text{or} \quad y = 6(12) = 72, \]
respectively. Thus, the cost function, $C(x, y)$, has two stationary points, namely the points $(0, 0)$ and $(12, 72)$.

To classify these stationary points, we look at the second-order partial derivatives of $C(x, y)$, which are
\[ C_{xx}(x, y) = 12x, \quad C_{xy}(x, y) = -12 = C_{yx}(x, y) \quad \text{and} \quad C_{yy}(x, y) = 2, \]
and, as such, the Hessian is given by
\[ H(x, y) = (12x)(2) - (-12)^2 = 24x - 144 = 24(x - 6). \]
Evaluating this at each of the stationary points we then find that:
• At $(0, 0)$, the Hessian is $H(0, 0) = 24(-6) < 0$, and so this is a saddle point.

• At $(12, 72)$, the Hessian is $H(12, 72) = 24(+6) > 0$ and $C_{xx}(12, 72) = 12(12) > 0$, so this is a local minimum.

Consequently, to minimise the cost we want to use 12 junior and 72 senior programmers. If we do this we find that the minimum cost is given by
\[ C(12, 72) = 2000 + 3456 - 10368 + 5184 = 272, \]
i.e. the minimum cost is £272.⁴

⁴ Which, thinking about it, is far less than the value of $C(x, y)$ at the other stationary point since $C(0, 0) = 2000$.

Profit maximisation

We now describe the problem of maximising the profit of a firm which makes two products, X and Y. Generally, if $p_X$ and $p_Y$ are the selling prices of one unit of X and one unit of Y respectively, then the total revenue, $TR(x, y)$, obtained from producing amounts $x$ of product X and $y$ of product Y is
\[ TR(x, y) = x p_X + y p_Y. \]
Of course, there are a number of ways in which the prices $p_X$ and $p_Y$ may be related to the quantities $x$ and $y$. For instance:

• If the goods were related, $p_X$ and $p_Y$ could both depend on $x$ and $y$ (e.g. if we were considering a music company producing an album on both CD and cassette).

• If the goods were unrelated, $p_X$ and $p_Y$ could depend only on $x$ and $y$ respectively (e.g. a pharmaceuticals company producing paracetamol and insulin).

The firm will also have a joint total cost function, $TC(x, y)$, which tells us how much it costs to produce $x$ units of X and $y$ units of Y. Clearly, given $TR(x, y)$ and $TC(x, y)$, we can consider the profit function of the firm, $\pi(x, y)$, which is given by
\[ \pi(x, y) = TR(x, y) - TC(x, y) = x p_X + y p_Y - TC(x, y), \]
and we can maximise this function of $x$ and $y$ using the techniques described above. Let’s look at an example.

Example 7.9 Suppose that a firm is the sole supplier of X and Y (in other words, it has a monopoly on these goods) and that the demands for X and Y, in tonnes, are given by
\[ x = 2 - 2p_X + p_Y \quad \text{and} \quad y = 13 + p_X - 2p_Y, \]
respectively when each tonne of X and Y sells at a price, in pounds, of $p_X$ and $p_Y$, respectively. If the joint total cost function of the firm is
\[ TC(x, y) = 5 + x^2 - xy + y^2, \]
find the quantities of X and Y the firm should produce in order to maximise its profit. What are the corresponding prices? What is the maximum profit?

We start by rearranging the equations to find expressions for $p_X$ and $p_Y$.⁵ The first equation tells us that $p_Y = x - 2 + 2p_X$ and so substituting this into the second equation yields
\[ y = 13 + p_X - 2(x - 2 + 2p_X) \implies y = 13 + p_X - 2x + 4 - 4p_X \implies 3p_X = 17 - 2x - y. \]
As such, we have
\[ p_X = \frac{17 - 2x - y}{3}, \]
and so substituting this into $p_Y = x - 2 + 2p_X$, we find that
\[ p_Y = x - 2 + 2\left(\frac{17 - 2x - y}{3}\right) \implies p_Y = \frac{3x - 6 + 34 - 4x - 2y}{3} \implies p_Y = \frac{28 - x - 2y}{3}. \]
Consequently, the profit function in this case is given by
\begin{align*}
\pi(x, y) &= x p_X + y p_Y - TC(x, y) \\
&= x\left(\frac{17 - 2x - y}{3}\right) + y\left(\frac{28 - x - 2y}{3}\right) - (5 + x^2 - xy + y^2) \\
&= \frac{1}{3}\left[(17x - 2x^2 - xy) + (28y - xy - 2y^2) - (15 + 3x^2 - 3xy + 3y^2)\right] \\
\therefore \quad \pi(x, y) &= \frac{1}{3}\left(-15 + 17x + 28y - 5x^2 - 5y^2 + xy\right),
\end{align*}
and we can now maximise this profit function using the method above.
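The arithmetic in Examples 7.8 and 7.9 is easy to get wrong by hand, so the following sketch re-checks it with exact fractions. It is only a check under our own naming (all function names here are hypothetical): it confirms that $(12, 72)$ is a stationary point of the cost function with cost 272, that the expressions for $p_X$ and $p_Y$ invert the demand equations, and that the simplified profit function agrees with its definition. It deliberately stops short of maximising the profit.

```python
from fractions import Fraction


def C(x, y):
    """Cost function from Example 7.8."""
    return 2000 + 2 * x**3 - 12 * x * y + y**2


def C_x(x, y):
    return 6 * x**2 - 12 * y


def C_y(x, y):
    return -12 * x + 2 * y


def inverse_demand(x, y):
    """Prices (pX, pY) as functions of integer quantities, Example 7.9."""
    return Fraction(17 - 2 * x - y, 3), Fraction(28 - x - 2 * y, 3)


def demand(pX, pY):
    """The original demand equations from Example 7.9."""
    return 2 - 2 * pX + pY, 13 + pX - 2 * pY


def profit(x, y):
    """Profit from its definition: revenue minus joint total cost."""
    pX, pY = inverse_demand(x, y)
    return x * pX + y * pY - (5 + x**2 - x * y + y**2)


def profit_simplified(x, y):
    """The simplified profit function derived in Example 7.9."""
    return Fraction(-15 + 17 * x + 28 * y - 5 * x**2 - 5 * y**2 + x * y, 3)


print(C_x(12, 72), C_y(12, 72), C(12, 72))
print(demand(*inverse_demand(1, 2)))
print(profit(1, 2) == profit_simplified(1, 2))
```

Using `Fraction` rather than floating-point numbers means the comparisons are exact, so any disagreement would point to a genuine algebraic slip rather than rounding.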
Activity 7.7 Finish the problem started in Example 7.9. That is, find the values of x and y that maximise the profit function π(x, y) found in the example, the corresponding prices pX and pY , and the maximum profit.
⁵ Note that if the price of X was fixed and the price of Y was increased, then the demand for X would rise and the demand for Y would fall. This is the behaviour one might expect if X and Y were two related commodities, e.g. if they were two different types of chocolate bar.

7.3 Constrained optimisation

We now turn our attention to the problem of constrained optimisation, i.e. the problem of optimising a function, $f(x, y)$, in the case where the values of $x$ and $y$ we are considering are constrained by the requirement that they must lie in some region, $R$, of $\mathbb{R}^2$. In particular, we will see that the optimal point we seek will
either be a point inside the region, in which case it will be a stationary point of $f(x, y)$ that happens to be in the region, or it will be a point on the boundary of the region, in which case it need not be a stationary point of $f(x, y)$ even though it optimises this function over points in the region. Of course, in the former case, we can find and classify the stationary point in the region using the method in the previous section and then, after checking that no point on the boundary of the region gives a better value, we will have our answer. Let’s look at a quick example.

Example 7.10 Minimise the function $f(x, y) = (x - 1)^2 + (y - 1)^2$ given that $(x, y)$ must lie in the region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + y \leq 3$.

The first-order partial derivatives of this function are
\[ f_x(x, y) = 2(x - 1) \quad \text{and} \quad f_y(x, y) = 2(y - 1), \]
and so, setting these equal to zero, we see that $(1, 1)$ is the only stationary point of this function. The second-order partial derivatives of this function are
\[ f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad \text{and} \quad f_{yy}(x, y) = 2, \]
which means that the Hessian is given by
\[ H(x, y) = (2)(2) - 0^2 = 4, \]
and so we see that $H(1, 1) = 4 > 0$ and $f_{xx}(1, 1) = 2 > 0$ which means that this point is a local minimum. Indeed, as this point satisfies the inequalities given above,⁶ this point is in the specified region and so $f(1, 1) = 0$ is a candidate for the minimum value of $f(x, y)$ for $(x, y)$ that lie in the region. However, we must check that nothing ‘odd’ is happening due to the points on the boundary of the region and to do this we note that:

• If we are on the $x = 0$ boundary of the region (so, technically, $0 \leq y \leq 3$) we have $f(0, y) = 1 + (y - 1)^2 \geq 1 > 0$.

• If we are on the $y = 0$ boundary of the region (so, technically, $0 \leq x \leq 3$) we have $f(x, 0) = (x - 1)^2 + 1 \geq 1 > 0$.

• If we are on the $x + y = 3$ boundary of the region we have $x = 3 - y$ (and, technically, $0 \leq y \leq 3$) which means that
\[ f(3 - y, y) = (2 - y)^2 + (y - 1)^2 = 2y^2 - 6y + 5 = 2\left(y - \frac{3}{2}\right)^2 + \frac{1}{2}, \]
if we complete the square, but this means that $f(3 - y, y) \geq \frac{1}{2} > 0$.

Thus, we can’t find values of $f(x, y)$ as small as $f(1, 1) = 0$ on any of the boundaries of the region and so the minimum value of $f(x, y)$ for points in this region is zero and this occurs at the point $(1, 1)$.
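The boundary check in Example 7.10 can be mimicked by sampling points along the three edges of the triangular region. The sketch below is only a rough numerical illustration: the helper name `boundary_points` and the grid size `n = 300` are our own choices, and a finite sample can suggest, but not prove, the bounds found by completing the square.

```python
def f(x, y):
    """Objective function from Example 7.10."""
    return (x - 1) ** 2 + (y - 1) ** 2


def boundary_points(n=300):
    """Evenly spaced sample points on the three edges of the region."""
    points = []
    for i in range(n + 1):
        t = 3 * i / n
        points.append((0.0, t))    # the x = 0 edge, 0 <= y <= 3
        points.append((t, 0.0))    # the y = 0 edge, 0 <= x <= 3
        points.append((3 - t, t))  # the x + y = 3 edge
    return points


smallest_on_boundary = min(f(x, y) for x, y in boundary_points())
print(smallest_on_boundary, f(1, 1))
```

The smallest sampled boundary value stays well above $f(1, 1) = 0$, in agreement with the bounds in the example.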
Activity 7.8
Explain why the answer we found in the previous example is obvious!
However, in what follows we will be more interested in solving constrained optimisation problems where the optimal point occurs on the boundary of the region since the methods we have developed so far will not help us in that case.
7.3.1 Finding optimal points on the boundary of a region
Generally speaking, when the optimal point occurs on the boundary of a region, we will be able to find it by considering the contours of the function we are optimising in relation to the region we are optimising the function over. Indeed, when doing this, we will find that we are in one of the two cases below.

The optimal point is at a ‘corner’ of the boundary

The following example should clarify what we should do in this case.

Example 7.11 Maximise the function $f(x, y) = x^2 + y^2$ given that $(x, y)$ must lie in the region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + 2y \leq 4$.

We start by sketching the region which is the shaded triangle in Figure 7.4(a) and some typical contours of the surface $z = f(x, y)$. Indeed, notice that here, the contour $z = c$ has equation
\[ x^2 + y^2 = c, \]
and so it will be a circle of radius $\sqrt{c}$ centred on the origin. In the figure, we have sketched the $z = 4$ and $z = 16$ contours and, in particular, we notice that as the contours move away from the origin, the value of $z$ increases as indicated by the arrow.

Now, to find the maximum value of $f(x, y)$ in this region we need a point which both

• lies in the region, and

• gives us the largest value of $z$.

That is, in this case, we want the point $(4, 0)$ which is a ‘corner’ of the boundary. In particular, notice that with this point on the $z = 16$ contour:

• we get a higher value of $z$ than we do from any point on a contour with $z < 16$ (like, say, the $z = 2$ contour), and

• we can’t have any point on a contour with $z > 16$ as none of these contours will give us a point in the region.

That is, the point $(4, 0)$ which gives us $z = 16$ must indeed maximise the function $f(x, y)$ given that $(x, y)$ must lie in the specified region.

⁶ That is, the point $(1, 1)$ clearly satisfies the inequalities $x \geq 0$ and $y \geq 0$ as well as the inequality $x + y \leq 3$ since $1 + 1 = 2 < 3$.
Figure 7.4: (a) The region for Example 7.11 is the shaded triangle and the $z = 4$ and $z = 16$ contours are indicated. (b) The region for Example 7.12 is the same shaded triangle and the $z = 1$ and $z = 2$ contours are indicated. Note, in both cases, the direction in which $z$ increases.

The optimal point is on the boundary but it isn’t a ‘corner’

This is the case that is going to concern us the most and so, for the moment, we just look at an example to see what is happening before we come to the recommended method for solving such problems.
Example 7.12 Maximise the function $f(x, y) = xy$ given that $(x, y)$ must lie in the region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + 2y \leq 4$.

We start by sketching the region which is the shaded triangle in Figure 7.4(b) and some typical contours of the surface $z = f(x, y)$. Indeed, notice that here, the contour $z = c$ has equation
\[ xy = c, \]
and so it will be a rectangular hyperbola with the $x$ and $y$-axes as its asymptotes. In the figure, we have sketched the $z = 1$ and $z = 2$ contours and, in particular, we notice that as the contours move away from the origin, the value of $z$ increases as indicated by the arrow.

Now, to find the maximum value of $f(x, y)$ in this region we need a point which both

• lies in the region, and

• gives us the largest value of $z$.

That is, in this case, we want the point $(x^*, y^*)$ which is not a ‘corner’ of the boundary. In particular, notice that with this point on the $z = 2$ contour:

• we get a higher value of $z$ than we do from any point on a contour with $z < 2$ (like, say, the $z = 1$ contour), and

• we can’t have any point on a contour with $z > 2$ as none of these contours will give us a point in the region.

That is, the point $(x^*, y^*)$ which gives us $z = 2$ must indeed maximise the function $f(x, y)$ given that $(x, y)$ must lie in the specified region. But, how do we find this point?

One way to find this point is to see that it is a point where, for some constant $c$, we have a contour $f(x, y) = c$ which is both

• tangential to the line $x + 2y = 4$, and

• touching the line $x + 2y = 4$.

Indeed, as the gradient of $f(x, y) = c$ is given by
\[ \frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y} = -\frac{y}{x}, \]
as we saw in Section 6.3.3 and the gradient of the line $x + 2y = 4$ is given by
\[ y = 2 - \frac{x}{2} \implies \frac{dy}{dx} = -\frac{1}{2}, \]
the first condition means that we must have a point which satisfies the equation
\[ -\frac{y}{x} = -\frac{1}{2} \implies y = \frac{x}{2}, \]
whereas the second condition means that we must have a point which satisfies the equation $x + 2y = 4$. Solving these equations simultaneously, we find that this gives us the point $(x^*, y^*) = (2, 1)$.⁷

Now, in such cases, we could always proceed in this way but, as we shall see in a moment, there is a way of turning this idea into a much more general method. And, it is this new method that we will generally use in such cases.
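To see the tangency idea at work numerically, the sketch below searches along the boundary line $x + 2y = 4$ of Example 7.12 for the point where $f(x, y) = xy$ is largest, and then compares the two gradients there. It is a brute-force illustration under our own assumptions (the name `best_on_line` and the grid size `n = 400` are hypothetical choices), not the method the text goes on to recommend.

```python
def f(x, y):
    """Objective function from Example 7.12."""
    return x * y


def best_on_line(n=400):
    """Search the segment x + 2y = 4, 0 <= y <= 2, for the largest f."""
    best = None
    for i in range(n + 1):
        y = 2 * i / n
        x = 4 - 2 * y  # stay on the boundary line
        value = f(x, y)
        if best is None or value > best[0]:
            best = (value, x, y)
    return best


value, x_star, y_star = best_on_line()
contour_gradient = -y_star / x_star  # dy/dx = -f_x / f_y = -y / x
line_gradient = -1 / 2               # gradient of x + 2y = 4
print(x_star, y_star, value, contour_gradient, line_gradient)
```

At the best point found, the gradient of the contour through it matches the gradient of the line, which is exactly the tangency condition described above.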
⁷ And, at this point, $z = f(2, 1) = 2$ as expected from above. But, in general, we would not know the optimal value of $z = f(x, y)$ beforehand. We have just used it here to help illustrate what is going on.

7.3.2 The method of Lagrange multipliers

Suppose that we have been asked to optimise the function, $f(x, y)$, given that $(x, y)$ must lie in some region and, by looking at the contours as above, we have determined that the optimal point occurs on the boundary given by some equation $g(x, y) = 0$. In particular, we are concerned with the case where the optimal point is not a ‘corner’ of the boundary, i.e. we want a point where, for some constant $c$, the contour $f(x, y) = c$ is both

• tangential to the boundary given by $g(x, y) = 0$, and

• touching the boundary given by $g(x, y) = 0$.

Now, for tangency, we require that the gradient of the contour $f(x, y) = c$, i.e.
\[ \frac{dy}{dx} = -\frac{f_x(x, y)}{f_y(x, y)}, \]
is equal to the gradient of the boundary given by $g(x, y) = 0$, i.e.
\[ \frac{dy}{dx} = -\frac{g_x(x, y)}{g_y(x, y)}, \]
where we have used what we saw in Section 6.3.3 twice. But, if these are equal, we have
\[ -\frac{f_x(x, y)}{f_y(x, y)} = -\frac{g_x(x, y)}{g_y(x, y)} \implies \frac{f_x(x, y)}{g_x(x, y)} = \frac{f_y(x, y)}{g_y(x, y)}, \]
and we denote this common value by $\lambda$, i.e. we have
\[ \lambda = \frac{f_x(x, y)}{g_x(x, y)} = \frac{f_y(x, y)}{g_y(x, y)}. \]
Rearranging this we then get two equations, namely
\[ f_x(x, y) - \lambda g_x(x, y) = 0 \quad \text{and} \quad f_y(x, y) - \lambda g_y(x, y) = 0, \]
or, more simply,
\[ \frac{\partial}{\partial x}\left[f(x, y) - \lambda g(x, y)\right] = 0 \quad \text{and} \quad \frac{\partial}{\partial y}\left[f(x, y) - \lambda g(x, y)\right] = 0. \]
So, any point which satisfies these two equations is a point where the contour $f(x, y) = c$ is tangential to the boundary $g(x, y) = 0$. We also note that the equation
\[ \frac{\partial}{\partial \lambda}\left[f(x, y) - \lambda g(x, y)\right] = 0 \implies g(x, y) = 0, \]
and so, any point which satisfies this equation lies on the boundary. Consequently, we define the Lagrangean to be the function
\[ L(x, y, \lambda) = f(x, y) - \lambda g(x, y), \]
and we call $\lambda$ the Lagrange multiplier. In particular, the point we seek will be amongst the stationary points of the Lagrangean since it must satisfy the equations
\[ \frac{\partial L}{\partial x} = 0, \quad \frac{\partial L}{\partial y} = 0 \quad \text{and} \quad \frac{\partial L}{\partial \lambda} = 0, \]
which we have derived above. In such cases, we call the function we are optimising, $f(x, y)$, the objective function and we call the equation of the boundary, which must be written in the form $g(x, y) = 0$, the constraint. Let’s see how we can use this method to solve the constrained optimisation problem we saw in Example 7.12.

Example 7.13 Solve the constrained optimisation problem in Example 7.12 using the method of Lagrange multipliers.

We have already seen that the optimal point we seek occurs when a contour of the function $f(x, y) = xy$ is tangential to the boundary given by the line $x + 2y = 4$. Writing the equation of the line in the form $g(x, y) = x + 2y - 4 = 0$ we see that the Lagrangean is
\[ L(x, y, \lambda) = xy - \lambda(x + 2y - 4), \]
where $\lambda$ is the Lagrange multiplier. We now find the stationary points of the Lagrangean by finding its first-order partial derivatives, i.e.
\[ L_x(x, y, \lambda) = y - \lambda, \quad L_y(x, y, \lambda) = x - 2\lambda \quad \text{and} \quad L_\lambda(x, y, \lambda) = -(x + 2y - 4), \]
and setting them equal to zero to get the equations
\[ y - \lambda = 0, \quad x - 2\lambda = 0 \quad \text{and} \quad x + 2y - 4 = 0. \]
We now eliminate $\lambda$ from the first two equations to get
\[ \lambda = y = \frac{x}{2} \implies y = \frac{x}{2}, \]
and this, as you should expect is our tangency condition from Example 7.12. On the other hand, the third equation is just x + 2y = 4, which, as you should expect, is our constraint. Solving these two equations simultaneously, we then get the point (2, 1) as the only solution and so this must be the optimal point we seek in agreement with what we found in Example 7.12. Obviously, at this point, we find that f (1, 2) = 2 is the maximum value of f subject to the constraint. Sometimes we will see questions where we are just asked to use this method to solve a constrained optimisation problem. In such cases, we will be given the objective function, f (x, y), and the constraint, g(x, y) = 0, which we should be using. In particular, unless we are explicitly asked to look at contours, we will just apply the method and assume that the answer we find is the appropriate kind of optimal point.8 Let’s look at an example of such a problem. Example 7.14
Given the function f (x, y) = 160x − 3x2 − 2xy − 2y 2 + 120y − 18,
find the maximum value of f (x, y) subject to the constraint x + y = 34. We write the constraint x + y = 34 as x + y − 34 = 0 so that it is in the form g(x, y) = 0 with g(x, y) = x + y − 34. This allows us to write the Lagrangean as L(x, y, λ) = 160x − 3x2 − 2xy − 2y 2 + 120y − 18 − λ(x + y − 34), where λ is the Lagrange multiplier. To find the stationary points of the Lagrangean we find its first-order partial derivatives, i.e. Lx (x, y, λ) = 160 − 6x − 2y − λ, Ly (x, y, λ) = −2x − 4y + 120 − λ and Lλ (x, y, λ) = −(x + y − 34), and set them equal to zero to get the equations 160 − 6x − 2y − λ = 0,
−2x − 4y + 120 − λ = 0
and
x + y − 34 = 0.
8
Although, sometimes, the Lagrangean may have several stationary points and, if that happens, it should be fairly straightforward to see which of these is the one we want.
281
7
7. Two-variable optimisation
The first two equations give us
\[ \lambda = 160 - 6x - 2y \quad\text{and}\quad \lambda = -2x - 4y + 120, \]
and so we can eliminate $\lambda$ to get
\[ 160 - 6x - 2y = -2x - 4y + 120 \implies 2y = 4x - 40 \implies y = 2x - 20, \]
whereas the third equation gives us $x + y = 34$ which is, of course, just our constraint. So, as this gives $y = 34 - x$, we can use it and the $y = 2x - 20$ that we have just found to eliminate $y$ and get
\[ 34 - x = 2x - 20 \implies 3x = 54 \implies x = 18. \]
And, if $x = 18$, then the constraint $y = 34 - x$ gives us $y = 34 - 18 = 16$. Thus, the point $(18, 16)$ is the only stationary point of the Lagrangean and so it must be the optimal point we seek. Thus, the maximum of $f(x, y)$ subject to the constraint $g(x, y) = 0$ is $f(18, 16) = 2{,}722$.

Note that, although we have only used this method to find maxima in the examples above, it will find minima as well and we will see an example of this when we consider cost minimisation problems in Section 7.3.4.
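As a check on the arithmetic in Example 7.14, the following is a minimal sketch in Python that substitutes the solution back into the three first-order conditions. The function names `f` and `solve_first_order_conditions` are our own illustrative choices, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction


def f(x, y):
    """Objective function from Example 7.14."""
    return 160*x - 3*x**2 - 2*x*y - 2*y**2 + 120*y - 18


def solve_first_order_conditions(c=34):
    """Solve L_x = L_y = L_lambda = 0 for L = f - lam*(x + y - c).

    Eliminating lam from the first two equations gives y = 2x - 20;
    substituting this into the constraint x + y = c gives 3x = c + 20.
    """
    x = Fraction(c + 20, 3)
    y = 2*x - 20
    lam = 160 - 6*x - 2*y
    return x, y, lam


x, y, lam = solve_first_order_conditions()

# Residuals of the three first-order conditions; each should be zero.
residuals = (
    160 - 6*x - 2*y - lam,
    -2*x - 4*y + 120 - lam,
    x + y - 34,
)
print(x, y, lam, f(x, y), residuals)
```

Running this reproduces the point $(18, 16)$ and the constrained maximum found above, together with the value of $\lambda$ at that point.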
7.3.3 The meaning of the Lagrange multiplier
In addition to allowing us to solve certain constrained optimisation problems, the method of Lagrange multipliers has another use which will be important when we come to consider its applications in Section 7.3.4. To see this, consider that, when we are asked to optimise $f(x, y)$ subject to the constraint $g(x, y) = c$, where $c$ is a constant, we would proceed as follows. Writing the constraint in the form $g(x, y) - c = 0$, we have the Lagrangean
\[ L(x, y, \lambda) = f(x, y) - \lambda(g(x, y) - c), \]
where $\lambda$ is the Lagrange multiplier. Its first-order partial derivatives are given by
\[ L_x(x, y, \lambda) = f_x(x, y) - \lambda g_x(x, y), \quad L_y(x, y, \lambda) = f_y(x, y) - \lambda g_y(x, y) \quad\text{and}\quad L_\lambda(x, y, \lambda) = -(g(x, y) - c), \]
and we find that the stationary points occur when we set these equal to zero to get the equations
\[ f_x(x, y) - \lambda g_x(x, y) = 0, \quad f_y(x, y) - \lambda g_y(x, y) = 0 \quad\text{and}\quad g(x, y) - c = 0. \]
Now, the first two equations tell us that
\[ f_x(x, y) = \lambda g_x(x, y) \quad\text{and}\quad f_y(x, y) = \lambda g_y(x, y), \]
and, clearly, neither of these depends on $c$. However, when we solve these equations in the standard way and use the constraint, $g(x, y) = c$, we find the point $(x^*, y^*)$ which optimises $f(x, y)$ subject to the constraint. Of course, since we have used the constraint to find the point $(x^*, y^*)$, the values of $x$ and $y$ we found will depend on $c$, i.e. we have the functions $x^* = x(c)$ and $y^* = y(c)$ of $c$. In particular, this means that the optimal value of $f(x, y)$ subject to the constraint that we have found also depends on $c$. Let's call this $F(c)$, i.e. we have
\[ F(c) = f(x^*, y^*) = f(x(c), y(c)). \]
Now, if we differentiate this with respect to $c$ using the chain rule (see Section 6.3.3), we have
\[ \frac{dF}{dc} = \frac{\partial f}{\partial x}\frac{dx}{dc} + \frac{\partial f}{\partial y}\frac{dy}{dc}, \]
so that, using our expressions for $f_x(x, y)$ and $f_y(x, y)$ above, we get
\[ \frac{dF}{dc} = \lambda\frac{\partial g}{\partial x}\frac{dx}{dc} + \lambda\frac{\partial g}{\partial y}\frac{dy}{dc} = \lambda\left(\frac{\partial g}{\partial x}\frac{dx}{dc} + \frac{\partial g}{\partial y}\frac{dy}{dc}\right). \]
However, given the constraint $g(x, y) = c$, we see that differentiating both sides with respect to $c$ we get
\[ \frac{\partial g}{\partial x}\frac{dx}{dc} + \frac{\partial g}{\partial y}\frac{dy}{dc} = 1, \]
where we have used the chain rule again on the left-hand side. Putting these last two equations together, we find that
\[ \frac{dF}{dc} = \lambda, \]
i.e. the Lagrange multiplier is the rate of change of the optimal value of $f(x, y)$ subject to the constraint $g(x, y) = c$ with respect to $c$. In particular, if we allowed our constraint to change from $g(x, y) = c$ to $g(x, y) = c + \Delta c$ we would find that the change, $\Delta F$, in the optimal value $F(c)$ of $f(x, y)$ subject to this constraint is given by
\[ \frac{\Delta F}{\Delta c} \simeq \lambda \implies \Delta F \simeq \lambda\,\Delta c, \]
provided that $\Delta c$ is suitably small. Let's see how this works in the context of Example 7.14.

Example 7.15

Using what we found in Example 7.14, find $\lambda$ and hence find the approximate change in the maximum value of $f(x, y)$ subject to the constraint $x + y = 34$ if the constraint is changed to $x + y = 35$.

We have found that the maximum value of $f(x, y)$ subject to the constraint $x + y = 34$ is $f(18, 16) = 2{,}722$. As this occurs at the point $(18, 16)$ we can use either of the first two equations we found in Example 7.14 to find $\lambda$ so, using the first, we have
\[ 160 - 6x - 2y - \lambda = 0 \implies \lambda = 160 - 6(18) - 2(16) = 20. \]
Consequently, using the theory above, we have a change in the constraint from $x + y = 34$ to $x + y = 35$ which gives $\Delta c = 1$ and so the change in the maximum value of $f(x, y)$ subject to this constraint is approximately 20.

We now turn to some applications of constrained optimisation in economics.
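To see how good the approximation in Example 7.15 is, the sketch below recomputes the constrained maximum exactly for $c = 34$ and $c = 35$ and compares the true change with the estimate $\lambda\,\Delta c$. It assumes only the relation $y = 2x - 20$ obtained from the first two equations in Example 7.14, which does not involve $c$; the name `constrained_max` is hypothetical.

```python
from fractions import Fraction


def f(x, y):
    """Objective function from Example 7.14."""
    return 160*x - 3*x**2 - 2*x*y - 2*y**2 + 120*y - 18


def constrained_max(c):
    """Return (F(c), lam) for the problem: maximise f subject to x + y = c.

    The first two first-order conditions give y = 2x - 20 whatever c is,
    so the constraint x + y = c fixes x = (c + 20)/3.
    """
    x = Fraction(c + 20, 3)
    y = 2*x - 20
    lam = 160 - 6*x - 2*y
    return f(x, y), lam


F34, lam34 = constrained_max(34)
F35, _ = constrained_max(35)

delta_c = 1
estimate = lam34 * delta_c      # the approximation lam * delta_c
exact_change = F35 - F34        # the true change in the maximum value
print(estimate, exact_change, float(exact_change))
```

The exact change is a little smaller than the estimate, which is what we should expect here: a step of $\Delta c = 1$ is not especially small, and $\lambda$ itself changes as $c$ changes.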
7.3.4 Applications
Constrained optimisation problems are very common in economics and we now introduce two ways in which they can arise in that subject. The first is their use when a consumer wants to maximise their utility subject to a constraint imposed by their budget and the second is when a firm wants to minimise its costs subject to a constraint on its level of production.

Utility maximisation subject to a budget constraint

Suppose that a consumer is interested in buying some combination of two goods. Let's say the price of the first good is $p_1$ per unit, the price of the second good is $p_2$ per unit and the consumer has an amount $M$ to spend on them. Indeed, if he wants to purchase the bundle, $(x_1, x_2)$, which contains quantities $x_1$ and $x_2$ of the first and second good respectively, it will cost him
\[ p_1 x_1 + p_2 x_2, \]
and he can afford this bundle if he satisfies the budget constraint given by
\[ p_1 x_1 + p_2 x_2 \le M, \]
where $x_1, x_2 \ge 0$ as they represent quantities. This gives us a budget set, i.e. the set of all bundles that the consumer can afford given the prices of the goods and his budget. Indeed, geometrically, the bundles he can afford are contained in the triangular region illustrated in Figure 7.5(a).
Figure 7.5: (a) The budget set for our consumer. (b) Adding three contours, $u(x_1, x_2) = c$, where the direction in which $u(x_1, x_2)$ is increasing is as indicated. Clearly, we are interested in the point which is indicated in the figure.

Now, if his utility function is $u(x_1, x_2)$, the consumer wants to maximise this subject to the constraint that he must be able to afford the bundle. That is, he must maximise $u(x_1, x_2)$ subject to the constraint that the bundle he chooses is in the budget set. Let's assume that, in this case, the utility function has contours $u(x_1, x_2) = c$, where $c$ is a constant,$^{9}$ that look like the ones illustrated in Figure 7.5(b) and that the direction of increasing utility is as indicated. Indeed, we observe in this case that the maximum value of $u(x_1, x_2)$ subject to the constraint imposed by the budget set occurs at the point indicated, i.e. a point where we have a contour of $u(x_1, x_2)$ which is both tangential to the line $p_1 x_1 + p_2 x_2 = M$, and touching the line $p_1 x_1 + p_2 x_2 = M$. As such, we could use the method of Lagrange multipliers to solve this problem, i.e. we would write the constraint as $p_1 x_1 + p_2 x_2 - M = 0$ and use the Lagrangean
\[ L(x_1, x_2, \lambda) = u(x_1, x_2) - \lambda(p_1 x_1 + p_2 x_2 - M), \]
to find the point $(x_1^*, x_2^*)$ which maximises the consumer's utility subject to the constraint. Indeed, having done this, we can define the function
\[ U(M) = u(x_1^*, x_2^*), \]
which tells us the maximum utility of the consumer given his budget, $M$. In particular, using the theory in Section 7.3.3, we see that the value of the Lagrange multiplier we get from solving the equations will satisfy
\[ \frac{dU}{dM} = \lambda, \]
i.e. it gives us the consumer's marginal utility of [budgetary] money if he is purchasing in a way that maximises his utility subject to his budget set. Let's look at an example.

$^{9}$ These contours are called indifference curves as each point on such a contour gives our consumer the same utility, i.e. he will be indifferent between the bundles represented by points on the same contour.

Example 7.16

Suppose cats cost £2 each and dogs cost £1 each. If a consumer has a utility function given by
\[ u(x_1, x_2) = x_1^2 x_2^2, \]
when he buys $x_1$ cats and $x_2$ dogs, how many cats and dogs should he buy if he wants to maximise his utility given that he has £$M$ to spend? Find $U(M)$, the maximum utility he can attain if he has a budget of $M$, and verify that $U'(M) = \lambda$ where $\lambda$ is the Lagrange multiplier.

In this case, the budget set will be the region defined by the inequalities
\[ 2x_1 + x_2 \le M \quad\text{and}\quad x_1, x_2 \ge 0, \]
which looks like the one in Figure 7.5(a) whereas the contours $u(x_1, x_2) = c$ where $u(x_1, x_2) = x_1^2 x_2^2$ look like the ones sketched in Figure 7.5(b). As such, we are in the situation described above and so we need to maximise $u(x_1, x_2)$ subject to the constraint that
\[ 2x_1 + x_2 = M \implies 2x_1 + x_2 - M = 0, \]
if we want the constraint in the right form. Thus, we have the Lagrangean
\[ L(x_1, x_2, \lambda) = x_1^2 x_2^2 - \lambda(2x_1 + x_2 - M), \]
and we seek the points which simultaneously satisfy the equations $L_{x_1}(x_1, x_2, \lambda) = 0$, $L_{x_2}(x_1, x_2, \lambda) = 0$ and $L_\lambda(x_1, x_2, \lambda) = 0$. The first-order partial derivatives of $L(x_1, x_2, \lambda)$ are
\[ L_{x_1}(x_1, x_2, \lambda) = 2x_1 x_2^2 - 2\lambda, \quad L_{x_2}(x_1, x_2, \lambda) = 2x_1^2 x_2 - \lambda \quad\text{and}\quad L_\lambda(x_1, x_2, \lambda) = -(2x_1 + x_2 - M), \]
and we set these equal to zero to yield the equations
\[ 2x_1 x_2^2 - 2\lambda = 0, \quad 2x_1^2 x_2 - \lambda = 0 \quad\text{and}\quad 2x_1 + x_2 - M = 0. \]
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
\[ \lambda = x_1 x_2^2 = 2x_1^2 x_2 \implies x_1 x_2 (x_2 - 2x_1) = 0 \implies x_2 = 2x_1, \]
where we reject the solutions where $x_1 = 0$ and $x_2 = 0$ as these give a utility of zero which, clearly, won't give us the maximum we seek. We then use this new relationship between $x_1$ and $x_2$ in the third equation, which is just the constraint $2x_1 + x_2 = M$, to get
\[ 2x_1 + 2x_1 = M \implies 4x_1 = M \implies x_1 = \frac{M}{4}, \]
and then, using this in the equation $x_2 = 2x_1$, we get $x_2 = M/2$. Thus, these values of $x_1$ and $x_2$ maximise our consumer's utility if he has a budget of $M$ and his maximum utility is then given by
\[ U(M) = u\left(\frac{M}{4}, \frac{M}{2}\right) = \left(\frac{M}{4}\right)^2 \left(\frac{M}{2}\right)^2 = \frac{M^4}{64}, \]
which means that
\[ U'(M) = \frac{4M^3}{64} = \frac{M^3}{16}. \]
Of course, we can also find the value of $\lambda$ using, say, the equation
\[ \lambda = x_1 x_2^2 \implies \lambda = \frac{M}{4}\left(\frac{M}{2}\right)^2 = \frac{M^3}{16}, \]
which verifies that $U'(M) = \lambda$.
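The result of Example 7.16 can also be checked numerically. The sketch below is only an illustration under our own assumptions: it picks a sample budget of $M = 8$, uses hypothetical function names, compares the optimal bundle with other bundles on the budget line, and estimates $U'(M)$ with a central difference so that it can be compared with $\lambda$.

```python
from fractions import Fraction


def u(x1, x2):
    """Utility function from Example 7.16."""
    return x1**2 * x2**2


def optimal_bundle(M):
    """Bundle and multiplier found in Example 7.16 for a budget M."""
    M = Fraction(M)
    x1 = M / 4
    x2 = M / 2
    lam = x1 * x2**2          # from the equation lam = x1 * x2^2
    return x1, x2, lam


def max_utility(M):
    """U(M): utility at the optimal bundle for a budget M."""
    x1, x2, _ = optimal_bundle(M)
    return u(x1, x2)


M = 8  # sample budget, chosen purely for illustration
x1, x2, lam = optimal_bundle(M)

# No bundle on the budget line 2*x1 + x2 = M (sampled in steps of 0.1) does better.
grid = [Fraction(k, 10) for k in range(0, 41)]
no_better = all(u(t, M - 2*t) <= max_utility(M) for t in grid)

# Central-difference estimate of U'(M), to compare with lam.
h = Fraction(1, 1000)
slope = (max_utility(M + h) - max_utility(M - h)) / (2*h)
print(x1, x2, max_utility(M), lam, float(slope), no_better)
```

With $M = 8$ the consumer buys 2 cats and 4 dogs, and the estimated slope of $U$ agrees with $\lambda$ to within the small error of the finite-difference step.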
Activity 7.9 Another consumer has a budget of £4 to buy cats and dogs at the prices in Example 7.16 and her utility function is u(x1 , x2 ) = 3x1 + x2 when she buys x1 cats and x2 dogs. Sketch the budget set and some contours u(x1 , x2 ) = c where c is a constant for this consumer. How many cats and dogs should she buy if she wants to maximise her utility given her budget?
Cost minimisation subject to a production constraint

Suppose that capital costs £$v$ per unit and labour costs £$w$ per unit. This means that a firm which uses an amount $k$ of capital and $l$ of labour will incur costs given by the cost function
\[ C(k, l) = vk + wl. \]
Also suppose that these inputs allow the firm to produce an amount given by the production function, $q(k, l)$. We want to ask: How much capital and labour should the firm use if it needs to produce an amount $Q$ of its product? That is, we want to solve the constrained optimisation problem

minimise $C(k, l)$ subject to the constraint $q(k, l) = Q$,

where $k, l \ge 0$ as they are quantities. Let's assume that, in this case, the constraint $q(k, l) = Q$ looks like the curve in Figure 7.6(a) for $k, l \ge 0$. If we also sketch some contours of the cost function,$^{10}$ we can identify the direction in which costs are decreasing as indicated in Figure 7.6(b).

Figure 7.6: (a) The constraint $q(k, l) = Q$. (b) Adding three contours, $C(k, l) = c$, where the direction in which $C(k, l)$ is decreasing is as indicated. Clearly, we are interested in the point which is indicated in the figure.

Indeed, we observe in this case that the minimum value of $C(k, l)$ subject to the constraint $q(k, l) = Q$ occurs at the point indicated, i.e. a point where we have a contour of $C(k, l)$ which is both tangential to the constraint $q(k, l) = Q$, and touching the constraint $q(k, l) = Q$. As such, we could use the method of Lagrange multipliers to solve this problem, i.e. we would write the constraint as $q(k, l) - Q = 0$ and use the Lagrangean
\[ L(k, l, \lambda) = C(k, l) - \lambda(q(k, l) - Q), \]
to find the point $(k^*, l^*)$ which minimises the costs subject to the constraint. Indeed, having done this, we can define the function
\[ \hat{C}(Q) = C(k^*, l^*), \]
which tells us the minimum cost of producing an amount, $Q$. In particular, using the theory in Section 7.3.3, we see that the value of the Lagrange multiplier we get from solving the equations will satisfy
\[ \frac{d\hat{C}}{dQ} = \lambda, \]
i.e. it gives us the marginal cost of the firm if it is producing in a way that minimises its costs subject to the constraint that it is producing an amount, $Q$. Let's look at an example.

$^{10}$ These contours are called isocosts as each point on such a contour costs the firm the same amount of money.

Example 7.17

Suppose capital, $k$, costs £16 per unit and labour, $l$, costs £1 per unit. If a firm can produce an amount given by the production function
\[ q(k, l) = 10k^{1/4} l^{1/4}, \]
what values of $k$ and $l$ will minimise the cost of producing $Q$ units? Find $\hat{C}(Q)$, the minimum cost of producing $Q$, and verify that $\hat{C}'(Q) = \lambda$ where $\lambda$ is the Lagrange multiplier.

In this case, the constraint $q(k, l) = Q$ will look like the curve in Figure 7.6(a) for $k, l \ge 0$ and so we are in the situation described above. Indeed, here the cost function is
\[ C(k, l) = 16k + l, \]
and, writing the constraint in the form $q(k, l) - Q = 0$, we get the Lagrangean
\[ L(k, l, \lambda) = 16k + l - \lambda(q(k, l) - Q). \]
We seek the points which simultaneously satisfy the equations $L_k(k, l, \lambda) = 0$, $L_l(k, l, \lambda) = 0$ and $L_\lambda(k, l, \lambda) = 0$ so we find the first-order partial derivatives of $L(k, l, \lambda)$, i.e.
\[ L_k(k, l, \lambda) = 16 - \lambda\,\frac{10}{4} k^{-3/4} l^{1/4}, \quad L_l(k, l, \lambda) = 1 - \lambda\,\frac{10}{4} k^{1/4} l^{-3/4} \quad\text{and}\quad L_\lambda(k, l, \lambda) = -\left(10k^{1/4} l^{1/4} - Q\right), \]
and set these equal to zero to yield the equations
\[ 16 - \frac{5}{2}\lambda k^{-3/4} l^{1/4} = 0, \quad 1 - \frac{5}{2}\lambda k^{1/4} l^{-3/4} = 0 \quad\text{and}\quad 10k^{1/4} l^{1/4} - Q = 0. \]
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
\[ 16 - \frac{5}{2}\lambda\,\frac{l^{1/4}}{k^{3/4}} = 0 \implies \lambda = 16\cdot\frac{2}{5}\,\frac{k^{3/4}}{l^{1/4}}, \]
from the first equation, and
\[ 1 - \frac{5}{2}\lambda\,\frac{k^{1/4}}{l^{3/4}} = 0 \implies \lambda = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}}, \]
from the second equation. As such, we can equate these expressions for $\lambda$ to get
\[ 16\cdot\frac{2}{5}\,\frac{k^{3/4}}{l^{1/4}} = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}} \implies 16k = l. \]
We then use this new relationship between $k$ and $l$ in the third equation, which is just the constraint $10k^{1/4} l^{1/4} = Q$, to get
\[ Q = 10k^{1/4}(16k)^{1/4} \implies Q = 20k^{1/2} \implies k^{1/2} = \frac{Q}{20} \implies k = \frac{Q^2}{400}, \]
and then, using this in the equation $l = 16k$, we get
\[ l = 16\left(\frac{Q^2}{400}\right) = \frac{Q^2}{25}. \]
Thus, these values of $k$ and $l$ minimise the cost of producing $Q$ units. The minimum cost is then given by
\[ \hat{C}(Q) = C\left(\frac{Q^2}{400}, \frac{Q^2}{25}\right) = 16\left(\frac{Q^2}{400}\right) + \frac{Q^2}{25} = \frac{2Q^2}{25}, \]
and so, we have
\[ \hat{C}'(Q) = \frac{4Q}{25}. \]
Of course, we can also find the value of $\lambda$ using, say, the equation
\[ \lambda = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}} \implies \lambda = \frac{2}{5}\,\frac{(Q^2/25)^{3/4}}{(Q^2/400)^{1/4}} = \frac{4Q}{25}, \]
which verifies that $\hat{C}'(Q) = \lambda$.
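As with the utility example, we can sanity-check Example 7.17 numerically. The sketch below assumes a sample output level of $Q = 50$ (our choice, not part of the example) and uses hypothetical function names. It confirms that the inputs found above meet the production constraint, that a few other input mixes on the same constraint cost at least as much, and that a finite-difference estimate of $\hat{C}'(Q)$ agrees with $\lambda$.

```python
import math


def q(k, l):
    """Production function from Example 7.17."""
    return 10 * k**0.25 * l**0.25


def cost(k, l):
    """Cost function C(k, l) = 16k + l from Example 7.17."""
    return 16*k + l


def optimal_inputs(Q):
    """Inputs and multiplier found in Example 7.17 for an output level Q."""
    k = Q**2 / 400
    l = 16 * k
    lam = 0.4 * l**0.75 / k**0.25   # lam = (2/5) * l^(3/4) / k^(1/4)
    return k, l, lam


def min_cost(Q):
    """C-hat(Q): cost at the optimal inputs for an output level Q."""
    k, l, _ = optimal_inputs(Q)
    return cost(k, l)


Q = 50.0  # sample output level, chosen purely for illustration
k, l, lam = optimal_inputs(Q)

# Other points on the constraint q(k, l) = Q have l = (Q/10)^4 / k.
other_k = [1.0, 3.0, 6.25, 10.0, 20.0]
no_cheaper = all(cost(kk, (Q/10)**4 / kk) >= min_cost(Q) - 1e-9 for kk in other_k)

# Central-difference estimate of the marginal cost, to compare with lam.
h = 1e-4
slope = (min_cost(Q + h) - min_cost(Q - h)) / (2*h)
print(k, l, min_cost(Q), lam, slope, no_cheaper)
```

For $Q = 50$ this gives $k = 6.25$ and $l = 100$, and both $\lambda$ and the estimated marginal cost come out at approximately $4Q/25$.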
Learning outcomes

At the end of this chapter and having completed the relevant reading and activities, you should be able to:

• find and classify the stationary points of a function of two variables;
• solve problems from economics-based subjects that involve unconstrained optimisation;
• optimise a function in the presence of constraints;
• solve problems from economics-based subjects that involve constrained optimisation.
Solutions to activities

Solution to activity 7.1

The first-order partial derivatives of the function are
\[ f_x(x, y) = 2x - 4 \quad\text{and}\quad f_y(x, y) = 2y + 4. \]
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous equations
\[ 2x - 4 = 0 \quad\text{and}\quad 2y + 4 = 0. \]
But, clearly, the first of these equations gives $x = 2$ and the second gives $y = -2$. Thus, $(2, -2)$ is the only stationary point of $f(x, y)$.

Solution to activity 7.2

The first-order partial derivatives of the function are
\[ f_x(x, y) = 9x^2 + 18x - 72 \quad\text{and}\quad f_y(x, y) = 6y^2 - 24y - 126. \]
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous equations
\[ 9x^2 + 18x - 72 = 0 \quad\text{and}\quad 6y^2 - 24y - 126 = 0. \]
Now, notice that the first equation contains no '$y$'s and the second equation contains no '$x$'s. As such, the first equation tells us everything there is to know about $x$, i.e.
\[ 9x^2 + 18x - 72 = 0 \implies x^2 + 2x - 8 = 0 \implies (x + 4)(x - 2) = 0 \implies x = -4 \text{ or } x = 2, \]
whereas the second equation tells us everything we need to know about $y$, i.e.
\[ 6y^2 - 24y - 126 = 0 \implies y^2 - 4y - 21 = 0 \implies (y + 3)(y - 7) = 0 \implies y = -3 \text{ or } y = 7. \]
As such, since we can take any of the $x$ values with any of the $y$ values we can see that this function has four stationary points, namely $(-4, -3)$, $(-4, 7)$, $(2, -3)$ and $(2, 7)$.

Solution to activity 7.3

Using the first-order partial derivatives we found in Activity 7.1, we find that the second-order partial derivatives are
\[ f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 2. \]
As these are constants, they take these values at the stationary point (and, indeed, at all other points). Thus, we can see that the Hessian at the stationary point is given by
\[ H(2, -2) = (2)(2) - (0)^2 = 4 > 0 \quad\text{and}\quad f_{xx}(2, -2) = 2 > 0, \]
so this is a local minimum.
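The classification rule used in these solutions is mechanical enough to write down as a short function. The following is a minimal sketch, with `classify` being our own hypothetical name; it takes the values of the second-order partial derivatives at a stationary point and applies the test based on the Hessian $H = f_{xx} f_{yy} - f_{xy}^2$.

```python
def classify(fxx, fxy, fyy):
    """Classify a stationary point from its second-order partial derivatives.

    Uses the Hessian H = fxx*fyy - fxy**2:
      H > 0 and fxx > 0  -> local minimum
      H > 0 and fxx < 0  -> local maximum
      H < 0              -> saddle point
      H = 0              -> the test fails and another argument is needed
    """
    H = fxx * fyy - fxy**2
    if H > 0 and fxx > 0:
        return "local minimum"
    if H > 0 and fxx < 0:
        return "local maximum"
    if H < 0:
        return "saddle point"
    return "inconclusive"


# Activity 7.3: fxx = 2, fxy = 0, fyy = 2 at the stationary point (2, -2).
print(classify(2, 0, 2))
```

The "inconclusive" branch is a reminder that, when the Hessian is zero, we have to argue directly from the function itself.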
Solution to activity 7.4

Using the first-order partial derivatives we found in Activity 7.2, we find that the second-order partial derivatives are
\[ f_{xx}(x, y) = 18x + 18, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 12y - 24, \]
and, as such, the Hessian is given by
\[ H(x, y) = (18x + 18)(12y - 24) - 0^2 = 216(x + 1)(y - 2). \]
Evaluating this at each of the stationary points we find that:

• At $(-4, -3)$, the Hessian is $H(-4, -3) = 216(-3)(-5) > 0$ and $f_{xx}(-4, -3) = 18(-4) + 18 < 0$, so this is a local maximum.
• At $(-4, 7)$, the Hessian is $H(-4, 7) = 216(-3)(+5) < 0$, and so this is a saddle point.
• At $(2, -3)$, the Hessian is $H(2, -3) = 216(+3)(-5) < 0$, and so this is a saddle point.
• At $(2, 7)$, the Hessian is $H(2, 7) = 216(+3)(+5) > 0$ and $f_{xx}(2, 7) = 18(2) + 18 > 0$, so this is a local minimum.

Thus, the stationary point $(-4, -3)$ is a local maximum, $(-4, 7)$ and $(2, -3)$ are saddle points and $(2, 7)$ is a local minimum.

Solution to activity 7.5

The first-order partial derivatives of this function are
\[ f_x(x, y) = 4(x - 1)^3 \quad\text{and}\quad f_y(x, y) = 4(y - 1)^3. \]
So, clearly, the only stationary point is at $(1, 1)$ as this is the only point that makes $f_x(x, y) = 0$ and $f_y(x, y) = 0$. The second-order partial derivatives of this function are given by
\[ f_{xx}(x, y) = 12(x - 1)^2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 12(y - 1)^2, \]
and, as such, the Hessian is given by
\[ H(x, y) = [12(x - 1)^2][12(y - 1)^2] - 0^2 = 144(x - 1)^2 (y - 1)^2. \]
Indeed, evaluating this at the stationary point gives $H(1, 1) = 0$ and so the method we used above fails. However, if we consider the surface $z = f(x, y)$, notice that we have $z = f(1, 1) = 0$ at the stationary point and for all other $x, y \in \mathbb{R}$, we have
\[ z = f(x, y) = (x - 1)^4 + (y - 1)^4 > 0, \]
i.e. $f(x, y) \ge f(1, 1)$ for all $x, y \in \mathbb{R}$. Consequently, it should be clear that this function has a local minimum at $(1, 1)$ and this minimum value is zero.$^{11}$

Solution to activity 7.6

Suppose that we have a function $f(x, y)$ that is concave. As we saw in Section 6.4.1, at any point $(a, b)$, the tangent plane to this function has a Cartesian equation given by
\[ z = f(a, b) + \frac{df}{d\mathbf{x}}\bigg|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
and, as this function is concave, it must be the case that for all $(x, y) \in \mathbb{R}^2$, the function lies below this tangent plane, i.e. we must have
\[ f(x, y) \le f(a, b) + \frac{df}{d\mathbf{x}}\bigg|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}. \]
However, using the second-order Taylor series for $f(x, y)$ around the point $(a, b)$, this means that we have
\[ f(a, b) + \frac{df}{d\mathbf{x}}\bigg|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \frac{d^2 f}{d\mathbf{x}^2}\bigg|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \le f(a, b) + \frac{df}{d\mathbf{x}}\bigg|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
which simplifies to give us
\[ \begin{pmatrix} x - a & y - b \end{pmatrix} \frac{d^2 f}{d\mathbf{x}^2}\bigg|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \le 0, \]
and this just asserts that $K(x, y) \le 0$ using our notation from Section 7.2.2. However, using what we saw before, this means that we require
\[ H(x, y) \ge 0 \quad\text{and}\quad f_{xx}(x, y) \le 0, \]
and this is, therefore, our condition for concavity.$^{12}$

$^{11}$ Actually, this is not only a local minimum, it is a global minimum as this is truly the smallest value the function can take for $x, y \in \mathbb{R}$.

$^{12}$ Again, we have glossed over any complications in our derivation that would occur if $f_{xx}(x, y) = 0$ for some point, $(x, y)$.

Solution to activity 7.7

We have found that the profit function is given by
\[ \pi(x, y) = \frac{1}{3}\left(-15 + 17x + 28y - 5x^2 - 5y^2 + xy\right), \]
and, to maximise this, we need to find its stationary points and determine which of them gives us a maximum. So, we start by finding the first-order partial derivatives of $\pi(x, y)$, i.e.
\[ \pi_x(x, y) = \frac{1}{3}\left(17 - 10x + y\right) \quad\text{and}\quad \pi_y(x, y) = \frac{1}{3}\left(28 - 10y + x\right). \]
At a stationary point, both of these first-order partial derivatives are zero, i.e. we must have $\pi_x(x, y) = 0$ and $\pi_y(x, y) = 0$. Thus, to find the stationary points, we have to solve the simultaneous equations
\[ 10x - y = 17 \quad\text{and}\quad x - 10y = -28. \]
We start by noticing that the first equation gives us $y = 10x - 17$ and so, substituting this into the second equation, we get
\[ x - 10(10x - 17) = -28 \implies -99x = -198 \implies x = 2, \]
and then, using $y = 10x - 17$ again, we get $y = 3$. Thus, the profit function, $\pi(x, y)$, has $(2, 3)$ as its only stationary point. To classify this stationary point, we look at the second-order partial derivatives of $\pi(x, y)$, which are
\[ \pi_{xx}(x, y) = -\frac{10}{3}, \quad \pi_{xy}(x, y) = \frac{1}{3} = \pi_{yx}(x, y) \quad\text{and}\quad \pi_{yy}(x, y) = -\frac{10}{3}, \]
and, as such, the Hessian is given by
\[ H(x, y) = \left(-\frac{10}{3}\right)\left(-\frac{10}{3}\right) - \left(\frac{1}{3}\right)^2 = \frac{100}{9} - \frac{1}{9} = 11. \]
Clearly, at $(2, 3)$, we have $H(2, 3) > 0$ and $\pi_{xx}(2, 3) < 0$, which means that the stationary point we have found is indeed a local maximum. Consequently, to maximise its profit, the firm should produce 2 tonnes of X and 3 tonnes of Y so that it can sell them at prices, in pounds, of
\[ p_X = \frac{17 - 2(2) - 3}{3} = \frac{10}{3} \simeq 3.33 \quad\text{and}\quad p_Y = \frac{28 - 2 - 2(3)}{3} = \frac{20}{3} \simeq 6.67, \]
respectively and, in doing so, the firm will make a maximum profit of
\[ \pi(2, 3) = \frac{1}{3}\left(-15 + 17(2) + 28(3) - 5(2)^2 - 5(3)^2 + (2)(3)\right) = \frac{44}{3} \simeq 14.67, \]
pounds.

Solution to activity 7.8

Of course, this should have been obvious either by noting that
\[ f(x, y) = (x - 1)^2 + (y - 1)^2 \ge 0, \]
for all points $(x, y) \in \mathbb{R}^2$ with a minimum of zero at $(1, 1)$;
or by observing that as $H(x, y) = 4 > 0$ and $f_{xx}(x, y) = 2 > 0$ for all points $(x, y) \in \mathbb{R}^2$, we see that this function is convex and so the stationary point $(1, 1)$ we found above is a global minimum.

Then, using either of these facts, we see that we have found the minimum of $f(x, y)$ for all $(x, y) \in \mathbb{R}^2$ and so it must be the minimum in the given region too since it is in that region.

Solution to activity 7.9

Given the prices in Example 7.16 and the consumer's budget of £4, we see that the budget set is given by
\[ 2x_1 + x_2 \le 4, \]
where $x_1, x_2 \ge 0$ as they are quantities. This is sketched in Figure 7.7(a).

We are now asked to sketch some contours $u(x_1, x_2) = c$ where $c$ is a constant and
\[ u(x_1, x_2) = 3x_1 + x_2, \]
for this consumer. Indeed, looking at the budget set, it makes sense to choose the contours where $c = 4$ and $c = 6$ and these are illustrated in Figure 7.7(b). This allows us to see the direction of increasing utility, which is indicated in the figure, and allows us to see that the point $(2, 0)$ is the one where we get the highest utility if we are constrained to stay within the budget set. Consequently, this consumer should buy two cats and no dogs if she wants to maximise her utility subject to her budget constraint.

Figure 7.7: The sketches for Activity 7.9. (a) The budget set for our consumer. (b) Adding two contours, $u(x_1, x_2) = c$, where $c = 4$ and $c = 6$. The direction in which $u(x_1, x_2)$ is increasing is as indicated and we are interested in the point which is indicated in the figure.
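The conclusion of Activity 7.9 can be cross-checked without a sketch. Because the utility function here is linear and the budget set is a triangle, the maximum must occur at one of the three corners, so it is enough to compare the utility there. The code below is a small illustration of that idea; the variable names are our own, and the grid search at the end is just an extra sanity check.

```python
from fractions import Fraction


def u(x1, x2):
    """Utility function from Activity 7.9."""
    return 3*x1 + x2


# Corners of the budget set 2*x1 + x2 <= 4 with x1, x2 >= 0.
corners = [(0, 0), (2, 0), (0, 4)]
best = max(corners, key=lambda p: u(*p))

# Sanity check: no point of a fine grid inside the budget set does better.
step = Fraction(1, 10)
grid = [
    (i*step, j*step)
    for i in range(0, 21)
    for j in range(0, 41)
    if 2*i*step + j*step <= 4
]
grid_max = max(u(x1, x2) for x1, x2 in grid)
print(best, u(*best), grid_max)
```

Note that the method of Lagrange multipliers would not be the right tool here, as the best point is a corner of the budget set rather than a point of tangency.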
Exercises

Exercise 7.1

The function
\[ f(x, y) = x^2 \ln y - y \ln y, \]
is defined for $y > 0$ and all $x \in \mathbb{R}$. Find its stationary points and classify them.

Exercise 7.2

Consider the function
\[ f(x, y) = x^{\alpha + 1} y^{\beta - 1}, \]
for $x, y > 0$ and some constants $\alpha$ and $\beta$. For what values of $\alpha$ and $\beta$ is this function convex? Sketch the region(s) in the $(\alpha, \beta)$-plane that correspond to these values of $\alpha$ and $\beta$.

Exercise 7.3

Suppose that a firm can sell its product in a domestic and a foreign market and that the inverse demand functions for these two markets are
\[ p_1 = 30 - 4q_1 \quad\text{and}\quad p_2 = 50 - 5q_2, \]
where $p_1$ and $p_2$ are the prices (in pounds) if it sells quantities $q_1$ and $q_2$ (in tonnes) in the domestic and foreign markets respectively. Given that the total cost function of the firm (in pounds) is
\[ TC(q) = 10 + 10q, \]
where $q$ is the quantity produced (in tonnes) and that the firm has a monopoly in both markets, find the quantities it should sell in these markets if it wants to maximise its profit. What are the corresponding prices? What is the maximum profit?

Exercise 7.4

Use the method of Lagrange multipliers to optimise the function
\[ f(x, y) = x^{3/8} y^{2/3}, \]
subject to the constraint $x^2 + y^2 = 25$ where $x, y > 0$. By sketching the constraint and some contours of $f$, justify your use of the method of Lagrange multipliers and determine whether the point you have found maximises or minimises $f$ subject to the constraint.

Exercise 7.5

Given an amount of capital, $k$, and labour, $l$, a firm produces a quantity of goods, $q(k, l)$, where
\[ q(k, l) = \ln k + \ln l, \]
for $k, l > 0$. Suppose that each unit of capital costs £2 and each unit of labour costs £3. Use the method of Lagrange multipliers to find the values of $k$ and $l$ that maximise the firm's production given that its total budget for capital and labour is £$M$. Hence show that the maximum production the firm can achieve given a budget of £$M$ is given by
\[ Q(M) = 2\ln\left(\frac{M}{2\sqrt{6}}\right), \]
and verify that $Q'(M) = \lambda$ where $\lambda$ is the Lagrange multiplier.
Solutions to exercises

Solution to exercise 7.1

Given that $f(x, y) = x^2 \ln y - y \ln y$, for $y > 0$ and all $x \in \mathbb{R}$, we see that the first-order partial derivatives of this function are
\[ f_x(x, y) = 2x \ln y \quad\text{and}\quad f_y(x, y) = \frac{x^2}{y} - (\ln y + 1), \]
where we have used the product rule when finding $f_y(x, y)$. At a stationary point, both of the first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous equations
\[ 2x \ln y = 0 \quad\text{and}\quad \frac{x^2}{y} - \ln y - 1 = 0. \]
If we start by looking at the first equation, this gives us
\[ x \ln y = 0 \implies x = 0 \text{ or } \ln y = 0 \implies x = 0 \text{ or } y = 1. \]
And so, to satisfy the second equation with:

• $x = 0$ we must have
\[ 0 - \ln y - 1 = 0 \implies \ln y = -1 \implies y = e^{-1}, \]
i.e. $(0, e^{-1})$ is a stationary point.

• $y = 1$ we must have
\[ \frac{x^2}{1} - \ln 1 - 1 = 0 \implies x^2 = 1 \implies x = \pm 1, \]
i.e. $(1, 1)$ and $(-1, 1)$ are stationary points.

Consequently, the points $(0, e^{-1})$, $(1, 1)$ and $(-1, 1)$ are stationary points of this function. To classify these stationary points, we note that the second-order partial derivatives are
\[ f_{xx}(x, y) = 2\ln y, \quad f_{xy}(x, y) = \frac{2x}{y} = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = -\frac{x^2}{y^2} - \frac{1}{y}, \]
and, as such, the Hessian is given by
\[ H(x, y) = (2\ln y)\left(-\frac{x^2}{y^2} - \frac{1}{y}\right) - \left(\frac{2x}{y}\right)^2 = -\frac{2(x^2 + y)\ln y + 4x^2}{y^2}. \]
Evaluating this at each of the stationary points we then find that:
• At $(0, e^{-1})$, the Hessian is
\[ H(0, e^{-1}) = -\frac{2e^{-1}\ln(e^{-1})}{e^{-2}} = 2e > 0 \quad\text{and}\quad f_{xx}(0, e^{-1}) = 2\ln(e^{-1}) = -2 < 0, \]
as $\ln(e^{-1}) = -1$ and so this is a local maximum.

• At $(1, 1)$, the Hessian is
\[ H(1, 1) = -\frac{4}{1} < 0, \]
as $\ln 1 = 0$ and so this is a saddle point.

• At $(-1, 1)$, the Hessian is
\[ H(-1, 1) = -\frac{4}{1} < 0, \]
as $\ln 1 = 0$ and so this is a saddle point.

Thus, the stationary points $(0, e^{-1})$, $(1, 1)$ and $(-1, 1)$ are a local maximum and two saddle points respectively.

Solution to exercise 7.2

We have, for $x, y > 0$, the function $f(x, y) = x^{\alpha + 1} y^{\beta - 1}$ whose first-order partial derivatives are
\[ f_x(x, y) = (\alpha + 1)x^{\alpha} y^{\beta - 1} \quad\text{and}\quad f_y(x, y) = (\beta - 1)x^{\alpha + 1} y^{\beta - 2}, \]
and so its second-order partial derivatives are
\[ f_{xx}(x, y) = \alpha(\alpha + 1)x^{\alpha - 1} y^{\beta - 1}, \quad f_{xy}(x, y) = (\alpha + 1)(\beta - 1)x^{\alpha} y^{\beta - 2} = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = (\beta - 1)(\beta - 2)x^{\alpha + 1} y^{\beta - 3}. \]
The Hessian for this function can then be written as
\[ H(x, y) = \left[\alpha(\alpha + 1)(\beta - 1)(\beta - 2) - (\alpha + 1)^2 (\beta - 1)^2\right] x^{2\alpha} y^{2(\beta - 2)}, \]
and, for $f(x, y)$ to be convex, we need $H(x, y) \ge 0$ and $f_{xx}(x, y) \ge 0$, i.e. we need
\[ \alpha(\alpha + 1)x^{\alpha - 1} y^{\beta - 1} \ge 0 \implies \alpha(\alpha + 1) \ge 0, \qquad (\star) \]
as $x, y > 0$, and
\[ \left[\alpha(\alpha + 1)(\beta - 1)(\beta - 2) - (\alpha + 1)^2 (\beta - 1)^2\right] x^{2\alpha} y^{2(\beta - 2)} \ge 0, \]
which gives
\[ (\alpha + 1)(\beta - 1)\left[\alpha(\beta - 2) - (\alpha + 1)(\beta - 1)\right] \ge 0 \implies (\alpha + 1)(\beta - 1)(1 - \alpha - \beta) \ge 0, \qquad (\dagger) \]
as $x, y > 0$. So, in order to satisfy both of these inequalities, we have either

$\alpha \ge -1$: so that $\alpha \ge 0$ from $(\star)$ which means that we have $\alpha \ge 0$ and, from $(\dagger)$,
• $\beta \ge 1$ and $\alpha + \beta \le 1$ (but we can't have $\alpha \ge 0$, $\beta \ge 1$ and $\alpha + \beta \le 1$!), or
• $\beta \le 1$ and $\alpha + \beta \ge 1$ (see region A in Figure 7.8);

or $\alpha \le -1$: so that $\alpha \le 0$ from $(\star)$ which means that we have $\alpha \le -1$ and, from $(\dagger)$,
• $\beta \ge 1$ and $\alpha + \beta \ge 1$ (see region B in Figure 7.8), or
• $\beta \le 1$ and $\alpha + \beta \le 1$ (see region C in Figure 7.8).

A sketch of the corresponding regions in the $(\alpha, \beta)$-plane is illustrated in Figure 7.8.

Figure 7.8: The sketch for Exercise 7.2.
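The two inequalities $(\star)$ and $(\dagger)$ are easy to test for particular values of $\alpha$ and $\beta$, which gives a quick way of checking a sketch of the regions. The following is a minimal illustration; the function name and the sample points are our own choices, with one point taken from each of the regions A, B and C and two points from outside them.

```python
def satisfies_convexity_conditions(alpha, beta):
    """Check the inequalities (star) and (dagger) from the solution to Exercise 7.2."""
    star = alpha * (alpha + 1) >= 0
    dagger = (alpha + 1) * (beta - 1) * (1 - alpha - beta) >= 0
    return star and dagger


# Sample points (alpha, beta) and where we expect them to lie.
samples = {
    "region A": (2, 0),        # alpha >= 0, beta <= 1, alpha + beta >= 1
    "region B": (-2, 4),       # alpha <= -1, beta >= 1, alpha + beta >= 1
    "region C": (-2, 0),       # alpha <= -1, beta <= 1, alpha + beta <= 1
    "outside 1": (-0.5, 1.5),  # fails (star)
    "outside 2": (2, 2),       # fails (dagger)
}
for label, (alpha, beta) in samples.items():
    print(label, satisfies_convexity_conditions(alpha, beta))
```

For instance, the point $(\alpha, \beta) = (-0.5, 1.5)$ corresponds to $f(x, y) = x^{1/2} y^{1/2}$, which fails $(\star)$ and so is not convex.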
Solution to exercise 7.3

Here the firm is a monopoly and so, as it is the sole supplier of its product in both markets, when it supplies quantities $q_1$ and $q_2$ to the domestic and foreign markets respectively, the prices will be given by the inverse demand functions
\[ p_1 = 30 - 4q_1 \quad\text{and}\quad p_2 = 50 - 5q_2, \]
respectively.$^{13}$ This means that its total revenue is given by
\[ TR(q_1, q_2) = p_1 q_1 + p_2 q_2 = (30 - 4q_1)q_1 + (50 - 5q_2)q_2, \]
and its total costs are given by
\[ TC(q) = 10 + 10q \implies TC(q_1, q_2) = 10 + 10(q_1 + q_2), \]
as $q = q_1 + q_2$ is the quantity being produced. As such, its profit function is
\[ \pi(q_1, q_2) = TR(q_1, q_2) - TC(q_1, q_2) = 20q_1 + 40q_2 - 4q_1^2 - 5q_2^2 - 10, \]
and we need to find the values of $q_1$ and $q_2$ that maximise this. To do this, we see that the first-order partial derivatives of $\pi(q_1, q_2)$ are
\[ \pi_{q_1}(q_1, q_2) = 20 - 8q_1 \quad\text{and}\quad \pi_{q_2}(q_1, q_2) = 40 - 10q_2, \]
and so, as a stationary point occurs when $\pi_{q_1}(q_1, q_2) = 0$ and $\pi_{q_2}(q_1, q_2) = 0$, we need to solve the simultaneous equations
\[ 20 - 8q_1 = 0 \quad\text{and}\quad 40 - 10q_2 = 0. \]
But, of course, the first equation gives $q_1 = 5/2$ and the second equation gives $q_2 = 4$ which means that $(5/2, 4)$ is the only stationary point of $\pi(q_1, q_2)$. To check that this is a maximum, we look at the second-order partial derivatives of $\pi(q_1, q_2)$, which are
\[ \pi_{q_1 q_1}(q_1, q_2) = -8, \quad \pi_{q_1 q_2}(q_1, q_2) = 0 = \pi_{q_2 q_1}(q_1, q_2) \quad\text{and}\quad \pi_{q_2 q_2}(q_1, q_2) = -10, \]
and, as such, the Hessian is given by
\[ H(q_1, q_2) = (-8)(-10) - 0^2 = 80. \]
Clearly, at $(5/2, 4)$, we have $H(5/2, 4) > 0$ and $\pi_{q_1 q_1}(5/2, 4) < 0$ which means that the stationary point we have found is indeed a local maximum. Consequently, to maximise its profit, the firm should supply 5/2 tonnes of its product to the domestic market and 4 tonnes of its product to the foreign market so that it can sell them at prices, in pounds, of
\[ p_1 = 30 - 4\left(\frac{5}{2}\right) = 20 \quad\text{and}\quad p_2 = 50 - 5(4) = 30, \]
respectively and, in doing so, the firm will make a maximum profit of
\[ \pi(5/2, 4) = 20\left(\frac{5}{2}\right) + 40(4) - 4\left(\frac{5}{2}\right)^2 - 5(4)^2 - 10 = 95, \]
pounds.

$^{13}$ Note that the situation described here, where a producer charges different prices in different markets, is sometimes known as price discrimination.

Solution to exercise 7.4
Writing the constraint in the form $x^2 + y^2 - 25 = 0$, we get the Lagrangean
\[ L(x, y, \lambda) = x^{3/8} y^{2/3} - \lambda(x^2 + y^2 - 25), \]
and we seek the points which simultaneously satisfy the equations $L_x(x, y, \lambda) = 0$, $L_y(x, y, \lambda) = 0$ and $L_\lambda(x, y, \lambda) = 0$. So we find the first-order partial derivatives of $L(x, y, \lambda)$, i.e.
\[ L_x(x, y, \lambda) = \frac{3}{8} x^{-5/8} y^{2/3} - 2x\lambda, \quad L_y(x, y, \lambda) = \frac{2}{3} x^{3/8} y^{-1/3} - 2y\lambda \quad\text{and}\quad L_\lambda(x, y, \lambda) = -(x^2 + y^2 - 25), \]
and set these equal to zero to yield the equations
\[ \frac{3}{8} x^{-5/8} y^{2/3} - 2x\lambda = 0, \quad \frac{2}{3} x^{3/8} y^{-1/3} - 2y\lambda = 0 \quad\text{and}\quad x^2 + y^2 - 25 = 0. \]
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
\[ \frac{3}{8} x^{-5/8} y^{2/3} - 2x\lambda = 0 \implies \lambda = \frac{3}{16}\,\frac{y^{2/3}}{x^{13/8}}, \]
from the first equation, and
\[ \frac{2}{3} x^{3/8} y^{-1/3} - 2y\lambda = 0 \implies \lambda = \frac{1}{3}\,\frac{x^{3/8}}{y^{4/3}}, \]
from the second equation. As such, we can equate these expressions for $\lambda$ to get
\[ \frac{3}{16}\,\frac{y^{2/3}}{x^{13/8}} = \frac{1}{3}\,\frac{x^{3/8}}{y^{4/3}} \implies y^2 = \frac{16}{9} x^2. \]
We then use this new relationship between $x$ and $y$ in the third equation, which is just the constraint $x^2 + y^2 = 25$, to get
\[ x^2 + \frac{16}{9} x^2 = 25 \implies \frac{25}{9} x^2 = 25 \implies x^2 = 9 \implies x = 3, \]
as $x > 0$. Then, using this in the equation $y^2 = 16x^2/9$, we get
\[ y^2 = \frac{16}{9}(3^2) = 16 \implies y = 4, \]
as $y > 0$. Thus, $x = 3$ and $y = 4$ will optimise $f(x, y)$ subject to the constraint.

The constraint is $x^2 + y^2 = 25$ and this is a circle of radius five centred on the origin which, for $x, y > 0$, is illustrated in Figure 7.9(a). The objective function, $f(x, y) = x^{3/8} y^{2/3}$, has contours $f(x, y) = c$, where $c$ is a constant, that look a bit like rectangular hyperbolae as illustrated in Figure 7.9(b). The direction in which $f(x, y)$ is increasing is indicated in this figure along with the point we found above using the Lagrange multiplier method, i.e. a point where we have a contour of $f(x, y)$ which is both tangential to the constraint and touching the constraint. Having seen this, it should be clear that this point will maximise $f$ subject to the constraint.

Solution to exercise 7.5

The firm has £$M$ to spend on capital and labour where each unit of capital costs £2 and each unit of labour costs £3. As such, the cost of using $k$ units of capital and $l$ units of labour is $2k + 3l$ and this gives us the constraint $2k + 3l = M$.¹⁴ So, to maximise the quantity $q(k, l) = \ln k + \ln l$,

¹⁴ Strictly, the constraint is $2k + 3l \le M$ where $k, l > 0$, but we can see that if we chose a point where $2k + 3l < M$, we could not maximise the quantity produced since, spending more on capital and labour to get a point where $2k + 3l = M$, we would get a larger quantity. This should make sense if you consider the discussion of budget constraints in Section 7.3.4.
Figure 7.9: The sketches for Exercise 7.4. (a) The constraint $x^2 + y^2 = 25$ for $x, y > 0$. (b) Adding three contours, $f(x, y) = c$, where the direction in which $f(x, y)$ is increasing is as indicated. Clearly, we are interested in the point $(3, 4)$ which is indicated in the figure.

that the firm can produce subject to the constraint $2k + 3l = M$ where $k, l > 0$, we use the Lagrangean
\[ L(k, l, \lambda) = \ln(k) + \ln(l) - \lambda(2k + 3l - M). \]
We seek the points which simultaneously satisfy the equations $L_k(k, l, \lambda) = 0$, $L_l(k, l, \lambda) = 0$ and $L_\lambda(k, l, \lambda) = 0$. The first-order derivatives of $L(k, l, \lambda)$ are
\[ L_k(k, l, \lambda) = \frac{1}{k} - 2\lambda, \quad L_l(k, l, \lambda) = \frac{1}{l} - 3\lambda \quad\text{and}\quad L_\lambda(k, l, \lambda) = -(2k + 3l - M), \]
and we set these equal to zero to yield the equations
\[ \frac{1}{k} - 2\lambda = 0, \quad \frac{1}{l} - 3\lambda = 0 \quad\text{and}\quad 2k + 3l - M = 0. \]
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
\[ \lambda = \frac{1}{2k} = \frac{1}{3l} \implies 3l = 2k \implies k = \frac{3}{2} l. \]
We then use this new relationship between $k$ and $l$ in the third equation, which is just the constraint $2k + 3l = M$, to get
\[ 2\left(\frac{3}{2} l\right) + 3l = M \implies 6l = M \implies l = \frac{M}{6}, \]
and then, using this in the equation $k = 3l/2$, we get
\[ k = \frac{3}{2} \times \frac{M}{6} = \frac{M}{4}. \]
Thus the values of $k$ and $l$ that maximise $q(k, l)$ subject to the constraint are $k = M/4$ and $l = M/6$. In this case, the maximum production achievable, given a budget of £$M$, is
\[ Q(M) = q\left(\frac{M}{4}, \frac{M}{6}\right) = \ln\frac{M}{4} + \ln\frac{M}{6} = \ln\frac{M^2}{24} = 2\ln\frac{M}{2\sqrt{6}}, \]
as required. Further, we can find the value of $\lambda$ using, say, the equation
\[ \lambda = \frac{1}{2k} \implies \lambda = \frac{1}{2} \cdot \frac{4}{M} = \frac{2}{M}, \]
and we can see that $Q(M) = 2\ln\dfrac{M}{2\sqrt{6}}$ can be written as
\[ Q(M) = 2\ln M - 2\ln\left(2\sqrt{6}\right) \implies Q'(M) = \frac{2}{M}, \]
which verifies that $Q'(M) = \lambda$.
Note: Although this question is similar to what we saw in Example 7.17, notice that here we are maximising production subject to a budget constraint whereas in Example 7.17 we were minimising costs subject to a production constraint. In particular, this means that you should always read the question carefully to ensure that you are using the correct objective function and constraint! Further, we were not asked to justify the assertion that the optimal point we found was a maximum here and so we haven’t, but sometimes, as in Exercise 7.4, we will be asked to provide such a justification.
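The result of Exercise 7.5 can also be checked numerically. The sketch below is only an illustration: the budget `M = 12` and the helper names `q`, `optimum` and `Q` are assumptions introduced here, not part of the exercise. It confirms that the point found spends the whole budget, that nearby points on the budget line give a smaller quantity, and that a finite-difference estimate of $Q'(M)$ agrees with $\lambda = 2/M$.

```python
import math

def q(k, l):
    """Production function from Exercise 7.5."""
    return math.log(k) + math.log(l)

def optimum(M):
    """Closed-form answer found above: k = M/4, l = M/6, lambda = 2/M."""
    return M / 4, M / 6, 2 / M

def Q(M):
    """Maximum production achievable with a budget of M."""
    k, l, _ = optimum(M)
    return q(k, l)

M = 12.0  # illustrative budget (an assumption, not taken from the guide)
k_star, l_star, lam = optimum(M)

# The optimal point should spend the whole budget.
budget_used = 2 * k_star + 3 * l_star

# Other points on the budget line 2k + 3l = M, obtained by changing k by t
# and recomputing l from the constraint, should all give a smaller quantity.
others = [q(k_star + t, (M - 2 * (k_star + t)) / 3) for t in (-1.0, -0.5, 0.5, 1.0)]

# Central-difference estimate of Q'(M), to be compared with lambda = 2/M.
h = 1e-6
dQ = (Q(M + h) - Q(M - h)) / (2 * h)

print(budget_used, max(others) < Q(M), dQ, lam)
```

The comparison with neighbouring points is not a proof that the point is a maximum, but it is a quick sanity check on the algebra.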
Chapter 8
Differential equations

Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 12.1–12.4 and 12.7–12.8.
Anthony and Biggs (1996) Chapters 27 and 28.

Further reading
Simon and Blume (1994) Sections 24.1–24.3 and Section 25.3.
Adams and Essex (2010) Sections 3.7 and 7.9, parts of Sections 17.1–17.2, 17.4–17.6.

Aims and objectives
The objectives of this chapter are as follows.
To see different types of differential equation and solve them using the given methods.
To use differential equations to solve problems from economics-based subjects.
Specific learning outcomes can be found near the end of this chapter.

8.1 Introduction: What is a differential equation?
A differential equation is an equation which contains at least one derivative of an unknown function. In this course, we will be concerned with ordinary differential equations (or ODEs), i.e. those which involve functions of only one independent variable.¹ It is often convenient to classify an ODE according to how its highest-order derivative appears in it. That is, we say that the
order of an ODE is given by the order of the highest-order derivative it contains, and the
degree of an ODE is given by the algebraic degree of the highest-order derivative it contains.
On the whole, we will be concerned with ODEs which are first or second-order and of the first degree.

¹ If the differential equation involves a function with more than one independent variable, then it would contain at least one partial derivative of the function and we would have a partial differential equation.

Activity 8.1 Determine the order and degree of the following ODEs involving the unknown function, $y(x)$.
(b)
(c)
2
d2 y . dx2 2 2 dy dy =x . dx dx2 2 2 d3 y dy . =x 3 dx dx2 dy dx
=x
Given an ODE, we usually want to solve it. That is, we want to find the unknown function in a form which does not involve any derivatives, and when we have found the function in this form we call it a solution to the ODE. In general, we will find that any given ODE has many solutions and so we get a general solution, i.e. we find the unknown function up to some arbitrary constants that are not determined by the ODE itself. Let’s look at a very simple example of an ODE (i.e. one that can be solved by direct integration) to see how things work.
Example 8.1 Solve the ODE
\[ \frac{dy}{dx} = 2x + 1. \]
This is a first-order ODE of degree one and it is very easy to solve because we can just integrate both sides to see that
\[ \int \frac{dy}{dx}\,dx = \int (2x + 1)\,dx \implies y = x^2 + x + c, \]
where $c$ is an arbitrary constant. As the independent variable is $x$ and the dependent variable is $y$, the unknown function here is $y(x)$, and so this gives us
\[ y(x) = x^2 + x + c. \]
We call this the general solution to the ODE as any solution to the ODE will have this form and each of these solutions arises from a different value of the arbitrary constant, $c$.

In addition to an ODE, we may also be given conditions which give us extra information about the function we are interested in. Given this information, we can find a particular solution, i.e. a solution to the ODE that also satisfies the given conditions.
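As a quick check of Example 8.1, the following sketch differentiates $y(x) = x^2 + x + c$ numerically and compares the result with $2x + 1$ for several values of $c$. The sample values of $c$ and $x$, and the helper `dydx`, are illustrative assumptions; the point is only that every choice of the arbitrary constant gives a solution.

```python
def y(x, c):
    """General solution from Example 8.1."""
    return x**2 + x + c

def dydx(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Largest difference between the numerical derivative and 2x + 1 over
# a few sample constants c and sample points x.
max_residual = max(
    abs(dydx(lambda t: y(t, c), x) - (2 * x + 1))
    for c in (-3.0, 0.0, 1.0, 7.5)
    for x in (-2.0, 0.0, 0.5, 3.0)
)
print(max_residual)  # should be very close to zero
```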
Example 8.2 Find the solution to the ODE in Example 8.1 that also satisfies the condition $y(0) = 1$.

We know that all solutions to the ODE in the previous example have the form $y(x) = x^2 + x + c$. If, in addition, we want a solution that satisfies the condition $y(0) = 1$, we can set $x = 0$ in both sides of this expression and use the condition to get
\[ y(0) = 0^2 + 0 + c \implies 1 = c. \]
That is, if we want to satisfy the condition $y(0) = 1$ as well, we must take $c = 1$ in the general solution. Consequently,
\[ y(x) = x^2 + x + 1, \]
is the particular solution to the ODE given that $y(0) = 1$.

Of course, it should be clear from this example that, when we apply different conditions to the general solution, we can get different values of $c$ and hence different particular solutions.

Activity 8.2 Find the particular solutions to the ODE in Example 8.1 that also satisfy the conditions (a) $y(0) = 0$, (b) $y(0) = -1$ and (c) $y(2) = 7$.

Indeed, we solved simple ODEs that looked like this when we considered marginal functions in Section 5.4.1. Further, as the following example shows, we can also solve simple higher-order ODEs by direct integration.

Example 8.3 Solve the ODE
\[ \frac{d^2y}{dx^2} = 6x + 2. \]
This is a second-order ODE of degree one and, once again, we can begin to solve it by integrating both sides to see that
\[ \int \frac{d^2y}{dx^2}\,dx = \int (6x + 2)\,dx \implies \frac{dy}{dx} = 3x^2 + 2x + c, \]
but this does not give us a solution as we still have a derivative in our expression. However, if we integrate both sides again, we get
\[ \int \frac{dy}{dx}\,dx = \int (3x^2 + 2x + c)\,dx \implies y = x^3 + x^2 + cx + d, \]
where $d$ is another arbitrary constant. As the independent variable is $x$ and the dependent variable is $y$, the unknown function here is $y(x)$, and so this gives us
\[ y(x) = x^3 + x^2 + cx + d. \]
This is the general solution to the ODE as any solution to the ODE will have this form and each of these solutions arises from different values of the arbitrary constants, $c$ and $d$.
Of course, if we find that there are several arbitrary constants in the general solution of an ODE, such as c and d in the general solution to the second-order ODE in Example 8.3, we will need more conditions in order to determine these constants and hence find a particular solution.
Example 8.4 Find the solution to the ODE in Example 8.3 that also satisfies the conditions $y(0) = 1$ and $y'(0) = 2$.

We know that all solutions to the ODE in the previous example have the form $y(x) = x^3 + x^2 + cx + d$. If, in addition, we want a solution that satisfies the condition $y(0) = 1$, we can set $x = 0$ in both sides of this expression and use the condition to get
\[ y(0) = 0^3 + 0^2 + c(0) + d \implies 1 = d. \]
We also know that $y'(x) = 3x^2 + 2x + c$, and so, if we want a solution that satisfies the condition $y'(0) = 2$, we can set $x = 0$ in both sides of this expression and use the condition to get
\[ y'(0) = 3(0^2) + 2(0) + c \implies 2 = c. \]
Thus, we see that
\[ y(x) = x^3 + x^2 + 2x + 1, \]
is the particular solution to the ODE given that $y(0) = 1$ and $y'(0) = 2$.
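The particular solution in Example 8.4 has to satisfy three things at once: the ODE $y'' = 6x + 2$ and the two conditions at $x = 0$. A minimal numerical check, using finite-difference helpers that are assumptions of this sketch rather than anything from the guide, might look like this.

```python
def y(x):
    """Particular solution found in Example 8.4."""
    return x**3 + x**2 + 2 * x + 1

def first_derivative(f, x, h=1e-3):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-3):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# Residual of the ODE y'' = 6x + 2 at a few sample points.
ode_residual = max(
    abs(second_derivative(y, x) - (6 * x + 2)) for x in (-1.0, 0.0, 0.5, 2.0)
)

# The two conditions at x = 0.
y_at_0 = y(0.0)
slope_at_0 = first_derivative(y, 0.0)

print(ode_residual, y_at_0, slope_at_0)
```

Two conditions are needed here because the general solution of a second-order ODE contains two arbitrary constants.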
More generally, we won’t be able to solve ODEs by direct integration and so the procedure for solving an ODE will usually involve identifying its type and applying the relevant method. In what follows, we shall see how the form of an ODE allows us to choose the method that will enable us to solve it in cases where direct integration can’t be used.
8.2 First-order ODEs

In this section we will consider some methods that will allow us to solve certain first-order ODEs of degree one. That is, certain ODEs that have the form
\[ \frac{dy}{dx} = f(x, y), \]
where $f(x, y)$ is some given function of the independent variable, $x$, and the dependent variable, $y$.
8.2.1 Separable first-order ODEs

A first-order ODE of degree one that can be written in the form
\[ M(x) = N(y)\,\frac{dy}{dx}, \]
is called a separable ODE. This is because, in such cases, we have been able to ‘separate’ the variables so that all occurrences of $x$ occur on the left-hand-side and all occurrences of $y$ occur on the right-hand-side. ODEs of this type can be solved by integrating both sides to get
\[ \int M(x)\,dx = \int N(y)\,\frac{dy}{dx}\,dx \implies \int M(x)\,dx = \int N(y)\,dy, \]
using the integration by substitution formula from Section 5.2.3. If we now determine these integrals, we will find the general solution to the ODE.

Example 8.5 Find the general solution to the ODE
\[ \frac{dy}{dx} = 2x(y - 1). \]
This ODE is separable as it can be written as
\[ 2x = \frac{1}{y - 1}\,\frac{dy}{dx}, \]
with $M(x) = 2x$ and $N(y) = (y - 1)^{-1}$. Using the method described above, we write this as
\[ \int 2x\,dx = \int \frac{dy}{y - 1} \]
and determine the integrals to get
\[ x^2 + c = \ln|y - 1|, \]
where $c$ is an arbitrary constant. Taking exponentials of both sides, this gives us
\[ |y - 1| = e^{x^2 + c} = e^c e^{x^2}. \]
Now, both sides of this expression are non-negative because of the modulus on the left-hand-side and the exponentials on the right-hand-side. This means that, if we want to remove the modulus, we must allow the possibility that the right-hand-side can give us a negative quantity, i.e. we have
\[ y - 1 = \pm e^c e^{x^2} \implies y = 1 \pm e^c e^{x^2}. \]
Then, as the independent variable is $x$ and the dependent variable is $y$, the unknown function here is $y(x)$, so this gives us the general solution
\[ y(x) = 1 + A e^{x^2}, \]
where $A \in \mathbb{R}$ is an arbitrary constant.² Of course, having found the general solution to the ODE in this example, we can also find particular solutions if we are given some conditions.

² Here we have replaced $\pm e^c$ with a new constant $A \in \mathbb{R}$ which can take any value. (The value $A = 0$, which $\pm e^c$ cannot produce, corresponds to the constant solution $y(x) = 1$ that was excluded when we divided by $y - 1$.)
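The general solution of Example 8.5 can be checked numerically for several values of $A$, including negative values and $A = 0$. The sketch below is an illustration only; the sample values and the helper names `dydx` and `residual` are assumptions.

```python
import math

def y(x, A):
    """General solution from Example 8.5."""
    return 1 + A * math.exp(x**2)

def dydx(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def residual(x, A):
    """Difference between dy/dx and 2x(y - 1) at the point x."""
    lhs = dydx(lambda t: y(t, A), x)
    rhs = 2 * x * (y(x, A) - 1)
    return abs(lhs - rhs)

# A < 0, A = 0 and A > 0 should all give solutions of the ODE.
worst = max(
    residual(x, A)
    for A in (-2.0, 0.0, 0.5, 3.0)
    for x in (-1.0, 0.0, 0.3, 1.2)
)
print(worst)  # should be very close to zero
```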
Activity 8.3 Find the particular solutions to the ODE in Example 8.5 given the conditions (a) y(0) = 2 and (b) y(0) = 0. What value of y(1) will give the same particular solution as the one you found in (a)?
8.2.2 Linear first-order ODEs

A first-order ODE of degree one that can be written in the form
\[ \frac{dy}{dx} + P(x)y = Q(x), \]
is called a linear ODE. The procedure for solving such an ODE involves finding an integrating factor, $\mu(x)$, given by
\[ \mu(x) = e^{\int P(x)\,dx}, \]
where, here, $\int P(x)\,dx$ is just any antiderivative of $P(x)$. Once we have this, we multiply both sides of the ODE by the integrating factor to get
\[ \mu(x)\frac{dy}{dx} + \mu(x)P(x)y = \mu(x)Q(x). \tag{8.1} \]
Now, observe that
\[ \frac{d\mu}{dx} = \frac{d}{dx}\left(e^{\int P(x)\,dx}\right) = e^{\int P(x)\,dx}\,P(x) = \mu(x)P(x), \]
if we use the chain rule³ and so, using the product rule, we have
\[ \frac{d}{dx}\Big(\mu(x)y(x)\Big) = \mu(x)\frac{dy}{dx} + y(x)\frac{d\mu}{dx} = \mu(x)\frac{dy}{dx} + \mu(x)P(x)y(x), \]
which is the left-hand-side of (8.1). As such, we can write (8.1) as
\[ \frac{d}{dx}\Big(\mu(x)y(x)\Big) = \mu(x)Q(x) \implies \mu(x)y(x) = \int \mu(x)Q(x)\,dx, \]
and if we determine the integral on the right-hand-side, we can then find the general solution to the ODE.

³ When using the chain rule here, we should find that
\[ \frac{d}{dx}\int P(x)\,dx = P(x). \]
To see why, note that if $c$ is an arbitrary constant and $F(x)$ is an antiderivative of $P(x)$, i.e. $F'(x) = P(x)$, we have
\[ \int P(x)\,dx = F(x) + c \implies \frac{d}{dx}\int P(x)\,dx = \frac{d}{dx}\Big(F(x) + c\Big) = F'(x) = P(x), \]
as expected.
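The integrating-factor procedure above can also be carried out numerically. The sketch below is a hedged illustration, not the guide's method: it assumes an initial condition $y(x_0) = y_0$, takes the antiderivative of $P$ that vanishes at $x_0$ (so that $\mu(x_0) = 1$), and replaces the indefinite integrals by Simpson's rule. The function names and the test ODE $y' + y = 1$ with $y(0) = 0$, whose solution is $1 - e^{-x}$, are assumptions chosen for illustration.

```python
import math

def integrate(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def solve_linear(P, Q, x0, y0, x):
    """Value at x of the solution of y' + P(x) y = Q(x) with y(x0) = y0."""
    def mu(t):
        # Integrating factor built from the antiderivative of P that is
        # zero at x0, so that mu(x0) = 1.
        return math.exp(integrate(P, x0, t))
    # Integrating d/dx(mu y) = mu Q from x0 to x gives
    # mu(x) y(x) - y0 = integral of mu(t) Q(t) dt.
    return (y0 + integrate(lambda t: mu(t) * Q(t), x0, x)) / mu(x)

# Illustrative ODE (an assumption): y' + y = 1 with y(0) = 0.
approx = solve_linear(lambda t: 1.0, lambda t: 1.0, 0.0, 0.0, 1.0)
exact = 1 - math.exp(-1.0)
print(approx, exact)
```

Fixing the lower limit of integration at $x_0$ is one convenient way of choosing ‘any antiderivative’ of $P(x)$; a different choice would rescale $\mu(x)$ by a constant without changing $y(x)$.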
Example 8.6 Find the general solution of the ODE
\[ x\frac{dy}{dx} - 2y = 6. \]
This ODE is linear as it can be written as
\[ \frac{dy}{dx} - \frac{2}{x}y = \frac{6}{x}, \]
with $P(x) = -2/x$ and $Q(x) = 6/x$. Using the method above, we start by finding the integrating factor, $\mu(x)$, by determining the integral
\[ \int P(x)\,dx = -\int \frac{2}{x}\,dx = -2\ln|x| + c, \]
and so we see that $-2\ln|x|$ is an antiderivative of $-2/x$. This means that the integrating factor is
\[ \mu(x) = e^{-2\ln|x|} = e^{\ln x^{-2}} = x^{-2}, \]
and so we have
\[ \mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies x^{-2}y(x) = \int 6x^{-3}\,dx \implies x^{-2}y(x) = -3x^{-2} + c, \]
where $c$ is an arbitrary constant. As such, we find that
\[ y(x) = -3 + cx^2, \]
is the general solution to our linear ODE.

Activity 8.4 Observe that the ODE in Example 8.6 can also be written as
\[ \frac{2}{x} = \frac{1}{y + 3}\,\frac{dy}{dx}. \]
Verify that the answer we found in that example is correct by solving this separable ODE using the method in Section 8.2.1.

Let’s now consider another example where the ODE is linear, but not separable.
Example 8.7 Find the general solution of the ODE
\[ \frac{dy}{dx} = y + e^x. \]
This ODE is linear as it can be written as
\[ \frac{dy}{dx} - y = e^x, \]
with $P(x) = -1$ and $Q(x) = e^x$. Using the method above, we start by finding the integrating factor, $\mu(x)$, by determining the integral
\[ \int P(x)\,dx = \int -1\,dx = -x + c, \]
and so we see that $-x$ is an antiderivative of $-1$. This means that the integrating factor is $\mu(x) = e^{-x}$, and so we have
\[ \mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies e^{-x}y(x) = \int dx \implies e^{-x}y(x) = x + c, \]
where $c$ is an arbitrary constant. As such, we find that
\[ y(x) = (x + c)\,e^x, \]
is the general solution to our linear ODE.

Activity 8.5 Verify that the ODE in the previous example is not separable.

8.2.3 Homogeneous first-order ODEs
As we saw in Section 6.3.4, a function $f(x, y)$ is homogeneous of degree $r$ if $f(\lambda x, \lambda y) = \lambda^r f(x, y)$. Using this, we say that a first-order ODE of the form
\[ M(x, y) + N(x, y)\frac{dy}{dx} = 0, \]
is homogeneous of degree $n$ if the functions $M$ and $N$ are both homogeneous of degree $n$. The procedure for solving such an ODE involves making the substitution $y = xv(x)$ to ‘separate’ the variables $v$ and $x$ so that we can solve it using the method in Section 8.2.1.

Example 8.8 Find the general solution of the ODE
\[ xy + y^2 - xy\frac{dy}{dx} = 0. \]
Here we have functions $M$ and $N$ where
\[ M(x, y) = xy + y^2 \quad\text{and}\quad N(x, y) = -xy, \]
and, clearly, they are both homogeneous of degree 2. As such, we introduce a new function, $v(x)$, such that
\[ y(x) = xv(x) \implies \frac{dy}{dx} = v(x) + x\frac{dv}{dx}, \]
if we use the product rule. Using this, our ODE becomes
\[ x^2\left(v + v^2\right) - x^2 v\left(v + x\frac{dv}{dx}\right) = 0. \]
Cancelling common factors and simplifying, this then becomes the separable ODE
\[ \frac{dv}{dx} = \frac{1}{x}, \]
which we solve using the method in Section 8.2.1, i.e.
\[ \int dv = \int \frac{dx}{x} \implies v(x) = \ln|x| + c, \]
where $c$ is an arbitrary constant. Consequently, using $y(x) = xv(x)$, we find that
\[ y(x) = x(\ln|x| + c), \]
is the general solution to our homogeneous ODE.

Activity 8.6 Observe that the ODE in Example 8.8 can also be written as
\[ \frac{dy}{dx} - \frac{y}{x} = 1. \]
Verify that the answer we found in that example is correct by solving this linear first-order ODE using the method in Section 8.2.2.

Let’s now consider another example where the ODE is homogeneous, but not linear.

Example 8.9 Find the general solution of the ODE
\[ x^4 + 5y^4 - 4xy^3\frac{dy}{dx} = 0. \]
Here we have functions $M$ and $N$ where
\[ M(x, y) = x^4 + 5y^4 \quad\text{and}\quad N(x, y) = -4xy^3, \]
and, clearly, they are both homogeneous of degree 4. As such, we introduce a new function, $v(x)$, such that
\[ y(x) = xv(x) \implies \frac{dy}{dx} = v(x) + x\frac{dv}{dx}, \]
if we use the product rule. Using this, our ODE becomes
\[ x^4 + 5x^4v^4 - 4x^4v^3\left(v + x\frac{dv}{dx}\right) = 0. \]
Cancelling common factors and simplifying, this then becomes the separable ODE
\[ \frac{dv}{dx} = \frac{1 + v^4}{4xv^3}, \]
which we solve using the method in Section 8.2.1, i.e.
\[ \int \frac{4v^3}{1 + v^4}\,dv = \int \frac{dx}{x} \implies \ln|1 + v^4| = \ln|x| + c, \]
where $c$ is an arbitrary constant. So, taking exponentials of both sides, this gives us
\[ |1 + v^4| = e^{\ln|x| + c} = e^c e^{\ln|x|} = e^c|x|, \]
so that removing the modulus signs and replacing the arbitrary constant $e^c > 0$ with $A \in \mathbb{R}$, we get
\[ v^4 + 1 = Ax \implies v = \pm(Ax - 1)^{1/4}, \]
for some arbitrary constant, $A$. Consequently, using $y(x) = xv(x)$, we find that
\[ y(x) = \pm x(Ax - 1)^{1/4}, \]
is the general solution to our homogeneous ODE.

Activity 8.7 Verify that the ODE in Example 8.9 is not linear.
Homogeneous ODEs are not the only examples of ODEs that can be solved using the methods above after some judicious substitution. In this course, if a novel substitution is needed to make a given ODE solvable, it will usually be given. See, for example, Exercise 8.2.
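The general solutions found in Examples 8.8 and 8.9 can be checked by substituting them back into the original ODEs. The sketch below does this numerically; the sample constants and points, and the helper names, are illustrative assumptions. For Example 8.9 the sample points are chosen so that $Ax - 1 > 0$, which is needed for the fourth root to be real.

```python
import math

def derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def residual_8_8(x, c):
    """Left-hand side of xy + y^2 - xy y' = 0 for y = x(ln|x| + c)."""
    y = lambda t: t * (math.log(abs(t)) + c)
    return x * y(x) + y(x)**2 - x * y(x) * derivative(y, x)

def residual_8_9(x, A, sign=1):
    """Left-hand side of x^4 + 5y^4 - 4xy^3 y' = 0 for y = +/- x(Ax - 1)^(1/4)."""
    y = lambda t: sign * t * (A * t - 1) ** 0.25
    return x**4 + 5 * y(x)**4 - 4 * x * y(x)**3 * derivative(y, x)

worst_8_8 = max(
    abs(residual_8_8(x, c)) for c in (-1.0, 0.0, 2.0) for x in (-2.0, 0.5, 3.0)
)
worst_8_9 = max(
    abs(residual_8_9(x, A, s))
    for s in (1, -1)
    for A in (1.0, 2.5)
    for x in (1.5, 2.0, 4.0)
)
print(worst_8_8, worst_8_9)  # both should be very close to zero
```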
8.3 Second-order ODEs

In this section we will consider some methods that will allow us to solve certain second-order ODEs where all occurrences of $y$ and its derivatives are of degree one. In particular, we will be concerned with such ODEs that have the form
\[ a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = f(x), \]
where a, b and c are constants and f (x) is some given function of the independent variable, x. ODEs of this form are often said to have ‘constant coefficients’ referring to the constants multiplying y and its derivatives on the left-hand-side. The method for solving such second-order ODEs is as follows.
8.3.1 Homogeneous second-order ODEs

If the function, $f(x)$, on the right-hand-side of our second-order ODE with constant coefficients is zero, i.e. if our ODE has the form
\[ a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = 0, \]
we say that it is homogeneous.⁴ To solve such an ODE, let’s suppose that any solution must have the form
\[ y(x) = A e^{kx}, \tag{8.2} \]
where $k$ is a number to be determined and $A$ is an arbitrary constant. Differentiating this twice, we find that
\[ \frac{dy}{dx} = Ak\,e^{kx} \quad\text{and}\quad \frac{d^2y}{dx^2} = Ak^2 e^{kx}, \]
and substituting this into the equation we get
\[ a(Ak^2 e^{kx}) + b(Ak\,e^{kx}) + c(A e^{kx}) = 0. \]
Now, we can cancel the $A$ as it is arbitrary and the $e^{kx}$ as it is always non-zero, which leaves us with the auxiliary equation
\[ ak^2 + bk + c = 0. \]
If we solve the auxiliary equation, we can determine the values of $k$ in (8.2) that yield solutions. Of course, when solving a quadratic equation such as this, there are three different cases that can arise, i.e. we can get:

Two real solutions: If the solutions are $k = \alpha$ and $k = \beta$, then we get solutions of the form $y(x) = A e^{\alpha x}$ and $y(x) = B e^{\beta x}$, where $A$ and $B$ are arbitrary constants. As such, we find that
\[ y(x) = A e^{\alpha x} + B e^{\beta x}, \]
is the general solution of the second-order ODE.

One real solution: If the solution is $k = \alpha$ (twice), then we get solutions of the form $y(x) = A e^{\alpha x}$ and $y(x) = Bx\,e^{\alpha x}$, where $A$ and $B$ are arbitrary constants. As such, we find that
\[ y(x) = (A + Bx)\,e^{\alpha x}, \]
is the general solution of the second-order ODE.

No real solutions: If the solutions are $k = \gamma \pm \delta\sqrt{-1}$, then, using material which is beyond the scope of this course,⁵ we find that
\[ y(x) = e^{\gamma x}\Big(A\cos(\delta x) + B\sin(\delta x)\Big), \]
is the general solution of the second-order ODE.

Let’s illustrate these three cases by looking at some examples.

⁴ Note that this is a different use of the word ‘homogeneous’ to the one in Sections 6.3.4 and 8.2.3. That is, this is an homogeneous equation whereas in Section 6.3.4 we had homogeneous functions and in Section 8.2.3 we had an ODE which was ‘made up’ from two such functions in a certain way.

⁵ If you are interested, this case involves complex numbers which are discussed in Chapter 13 of Binmore and Davies (2002). If you read this, you will then be able to understand the discussion of this type of solution in Section 14.5 of Binmore and Davies (2002). However, as we are not dealing with such things here, you are advised to wait until you tackle complex numbers properly in 175 Further Linear Algebra.
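The three cases above depend only on the sign of the discriminant $b^2 - 4ac$ of the auxiliary equation. A minimal sketch of that classification follows; the function name `classify`, its return format and the sample coefficients are assumptions made for illustration.

```python
import math

def classify(a, b, c):
    """Classify the auxiliary equation a k^2 + b k + c = 0.

    Returns the name of the case together with the numbers that appear in
    the general solution: (alpha, beta), (alpha,) or (gamma, delta).
    The exact test disc == 0 is only reliable when the coefficients are
    exactly representable, as in the simple examples used here.
    """
    disc = b * b - 4 * a * c
    if disc > 0:
        root = math.sqrt(disc)
        return "two real", ((-b + root) / (2 * a), (-b - root) / (2 * a))
    if disc == 0:
        return "one real", (-b / (2 * a),)
    # No real solutions: k = gamma +/- delta * sqrt(-1).
    return "no real", (-b / (2 * a), math.sqrt(-disc) / (2 * a))

# Illustrative coefficients (assumptions, not taken from the guide).
print(classify(1, -3, 2))  # k^2 - 3k + 2 = 0
print(classify(1, -2, 1))  # k^2 - 2k + 1 = 0
print(classify(1, 0, 4))   # k^2 + 4 = 0
```

In the third case the sign of $\delta$ does not matter, since changing it only changes the sign of the arbitrary constant $B$.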
Example 8.10 Find the general solution of the ODE $y'' - y' - 2y = 0$.

As the right-hand-side of this second-order ODE with constant coefficients is zero, it is homogeneous. Its auxiliary equation is given by
\[ k^2 - k - 2 = 0 \implies (k - 2)(k + 1) = 0, \]
and so we have two real solutions given by $k = 2$ and $k = -1$. As such, the theory above dictates that
\[ y(x) = A e^{2x} + B e^{-x}, \]
where $A$ and $B$ are arbitrary constants, is the general solution to this homogeneous second-order ODE.

Example 8.11 Find the general solution of the ODE $y'' + 4y' + 4y = 0$.

As the right-hand-side of this second-order ODE with constant coefficients is zero, it is homogeneous. Its auxiliary equation is given by
\[ k^2 + 4k + 4 = 0 \implies (k + 2)^2 = 0, \]
and so we have one real solution given by $k = -2$. As such, the theory above dictates that
\[ y(x) = (A + Bx)\,e^{-2x}, \]
where $A$ and $B$ are arbitrary constants, is the general solution to this homogeneous second-order ODE.

Example 8.12 Find the general solution of the ODE $y'' - 2y' + 2y = 0$.

As the right-hand-side of this second-order ODE with constant coefficients is zero, it is homogeneous. Its auxiliary equation is given by
\[ k^2 - 2k + 2 = 0 \implies (k - 1)^2 + 1 = 0 \implies k - 1 = \pm\sqrt{-1} \implies k = 1 \pm \sqrt{-1}, \]
and so we get no real solutions for $k$. As such, the theory above dictates that we take $\gamma = 1$ and $\delta = 1$, so that
\[ y(x) = e^{x}\Big(A\cos(x) + B\sin(x)\Big), \]
where $A$ and $B$ are arbitrary constants, is the general solution to this homogeneous second-order ODE.
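The general solutions in Examples 8.10 to 8.12 can be checked by substituting them back into their ODEs. The sketch below does this with finite differences; the values chosen for the arbitrary constants $A$ and $B$, the sample points and the helper `residual` are assumptions for illustration.

```python
import math

def residual(y, a, b, c, x, h=1e-4):
    """Evaluate a y'' + b y' + c y at x using central differences."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / h**2
    return a * d2 + b * d1 + c * y(x)

A, B = 1.5, -0.5  # arbitrary illustrative constants

# Each entry pairs the coefficients (a, b, c) with the stated general solution.
examples = {
    "8.10": ((1, -1, -2), lambda x: A * math.exp(2 * x) + B * math.exp(-x)),
    "8.11": ((1, 4, 4), lambda x: (A + B * x) * math.exp(-2 * x)),
    "8.12": ((1, -2, 2), lambda x: math.exp(x) * (A * math.cos(x) + B * math.sin(x))),
}

worst = {
    name: max(abs(residual(y, *coeffs, x)) for x in (-1.0, 0.0, 0.5, 1.0))
    for name, (coeffs, y) in examples.items()
}
print(worst)  # every value should be very close to zero
```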
8.3.2 Non-homogeneous second-order ODEs

If the function, $f(x)$, on the right-hand-side of our second-order ODE with constant coefficients is non-zero, i.e. it has the form
\[ a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = f(x), \]
with $f(x) \neq 0$, then we say that it is non-homogeneous. To solve such an ODE, we use the following method.

We solve the corresponding homogeneous ODE, to find the function, $y_c(x)$, which is often called the complementary function. That is, we solve
\[ a\frac{d^2y_c}{dx^2} + b\frac{dy_c}{dx} + cy_c = 0, \]
using the auxiliary equation, as in Section 8.3.1, to find $y_c(x)$.

We then seek a function, $y_p(x)$, which is often called the particular integral, that satisfies the non-homogeneous ODE. That is, we want to find a function, $y_p(x)$, that satisfies
\[ a\frac{d^2y_p}{dx^2} + b\frac{dy_p}{dx} + cy_p = f(x), \]
and we will see how to do this in a moment.

Then, having found the complementary function and a particular integral, the general solution to our non-homogeneous ODE is given by
\[ y(x) = y_c(x) + y_p(x). \]
That is, the general solution we seek, $y(x)$, is the sum of the two functions we have found. In particular, observe that the complementary function will contain the two arbitrary constants that make $y(x)$ a general solution whereas the particular integral guarantees that $y(x)$ will give us the correct right-hand-side, i.e. $f(x)$, when we substitute it into the ODE.

Finding particular integrals

To find the particular integral for a given second-order ODE, we look at $f(x)$ and start by taking $y_p(x)$ to be a general function of the same form. For instance, if we find that
$f(x) = a$ for some constant $a$ we take $y_p(x) = \alpha$.
$f(x) = a + bx$ for some constants $a$ and $b$ we take $y_p(x) = \alpha + \beta x$.
$f(x) = a + bx + cx^2$ for some $a$, $b$ and $c$ we take $y_p(x) = \alpha + \beta x + \gamma x^2$.
et cetera.
$f(x) = a\,e^{rx}$ for some constant $a$ we take $y_p(x) = \alpha\,e^{rx}$.
$f(x) = (a + bx)\,e^{rx}$ for some constants $a$, $b$ and $r$ we take $y_p(x) = (\alpha + \beta x)\,e^{rx}$.
et cetera.
$f(x) = a\sin(rx)$ for some constants $a$ and $r$ we take $y_p(x) = \alpha\sin(rx) + \beta\cos(rx)$.
$f(x) = a\cos(rx)$ for some constants $a$ and $r$ we take $y_p(x) = \alpha\sin(rx) + \beta\cos(rx)$.
et cetera.
Then, by substituting the appropriate general function into our non-homogeneous second-order ODE, we can find the values of the relevant ‘Greek letters’ and this will then give us the specific function, $y_p(x)$, that will play the role of the particular integral in our solution.

Applying the method

Let’s consider an example to see how we would go about determining the particular integral in some of the cases listed above and how we would use this to find the general solution of a non-homogeneous second-order ODE.

Example 8.13 In Example 8.10 we saw that the general solution to the homogeneous second-order ODE $y'' - y' - 2y = 0$ was given by $y(x) = A e^{2x} + B e^{-x}$, where $A$ and $B$ are arbitrary constants. Find the general solution to the non-homogeneous second-order ODE
\[ y'' - y' - 2y = f(x), \]
when (i) $f(x) = 8$, (ii) $f(x) = 6x$ and (iii) $f(x) = 20\,e^{3x}$.

We know that the complementary function, $y_c(x)$, for this non-homogeneous second-order ODE is given by the general solution to the homogeneous second-order ODE. As such, we know that
\[ y_c(x) = A e^{2x} + B e^{-x}, \]
where $A$ and $B$ are arbitrary constants. Our first task is to find the particular integral, $y_p(x)$, for each choice of $f(x)$. Once we have this, we can then find the general solution, $y(x)$, of the relevant non-homogeneous second-order ODE by simply taking $y(x) = y_c(x) + y_p(x)$.

For (i), we have $f(x) = 8$ and so we take $y_p(x) = \alpha$ where $\alpha$ is a constant that has to be determined. To find $\alpha$, we note that $y_p'(x)$ and $y_p''(x)$ are both zero which means that substituting them into the non-homogeneous second-order ODE, we get
\[ 0 - 0 - 2\alpha = 8 \implies \alpha = -4. \]
Thus, $y_p(x) = -4$ is the sought after particular integral and the general solution to our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} - 4, \]
using $y(x) = y_c(x) + y_p(x)$.

For (ii), we have $f(x) = 6x$ and so we take $y_p(x) = \alpha + \beta x$ where $\alpha$ and $\beta$ are constants that have to be determined. To find $\alpha$ and $\beta$, we note that $y_p'(x) = \beta$ and $y_p''(x) = 0$ which means that substituting them into the non-homogeneous second-order ODE yields
\[ 0 - \beta - 2(\alpha + \beta x) = 6x \implies -2\beta x - (2\alpha + \beta) = 6x. \]
Now these two expressions must be the same and so, looking at the coefficient of $x$ on both sides, we see that $\beta$ must be $-3$. Similarly, looking at the constant term on both sides we see that $-2\alpha - \beta$ must be zero, so as $\beta = -3$, this means that $\alpha$ must be $3/2$. Thus, $y_p(x) = \frac{3}{2} - 3x$ is the sought after particular integral and the general solution to our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} + \frac{3}{2} - 3x, \]
using $y(x) = y_c(x) + y_p(x)$.

For (iii), we have $f(x) = 20\,e^{3x}$ and so we take $y_p(x) = \alpha\,e^{3x}$ where $\alpha$ is a constant that has to be determined. To find $\alpha$, we note that $y_p'(x) = 3\alpha\,e^{3x}$ and $y_p''(x) = 9\alpha\,e^{3x}$ which means that substituting them into the non-homogeneous second-order ODE yields
\[ 9\alpha\,e^{3x} - 3\alpha\,e^{3x} - 2(\alpha\,e^{3x}) = 20\,e^{3x} \implies 4\alpha\,e^{3x} = 20\,e^{3x} \implies \alpha = 5. \]
Thus, $y_p(x) = 5\,e^{3x}$ is the sought after particular integral and the general solution to our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} + 5\,e^{3x}, \]
using $y(x) = y_c(x) + y_p(x)$.

A complication

Although we won’t spend much time on such things, observe that if the function, $f(x)$, in our non-homogeneous second-order ODE prompts us to try a particular integral, $y_p(x)$, that is ‘part’ of the complementary function (i.e. we can find values of the arbitrary constants in $y_c(x)$ that make $y_c(x) = y_p(x)$), we have to be more subtle when we choose our particular integral. However, this subtlety usually involves doing nothing more than multiplying what we’d normally choose to be our particular integral by $x$. Let’s return to our previous example to see how this works.

Example 8.14 Following on from Example 8.13, find the general solution to the non-homogeneous second-order ODE
\[ y'' - y' - 2y = f(x), \]
when $f(x) = 18\,e^{2x}$.

We know that the complementary function, $y_c(x)$, for this non-homogeneous second-order ODE is given by
\[ y_c(x) = A e^{2x} + B e^{-x}, \]
where $A$ and $B$ are arbitrary constants. Our task is to find the particular integral, $y_p(x)$, in the case where $f(x) = 18\,e^{2x}$ so that we can deduce the relevant general solution.
Note: Here we would normally try $y_p(x) = \alpha\,e^{2x}$ but this is ‘part’ of the complementary function since, taking $A = \alpha$ and $B = 0$, we have $y_p(x) = y_c(x)$!

Our first reaction in this case would be to take $y_p(x) = \alpha\,e^{2x}$ where $\alpha$ is a constant that has to be determined. To find $\alpha$, we note that $y_p'(x) = 2\alpha\,e^{2x}$ and $y_p''(x) = 4\alpha\,e^{2x}$ which means that substituting them into the non-homogeneous second-order ODE, we get
\[ 4\alpha\,e^{2x} - 2\alpha\,e^{2x} - 2(\alpha\,e^{2x}) = 18\,e^{2x}. \]
But now, the left-hand-side turns out to be zero,⁶ meaning that this equation for $\alpha$ has no solutions! That is, we can’t determine $\alpha$ if we use this general form for $y_p(x)$! Thus, the particular integral in this case can’t have the general form $y_p(x) = \alpha\,e^{2x}$ as we can’t find an $\alpha$ that will make it work.

So, following the advice above, we try the next best thing which is our original choice multiplied by $x$. That is, we try $y_p(x) = \alpha x\,e^{2x}$ where $\alpha$ is a constant that has to be determined. To find $\alpha$, we note that writing $y_p(x)$ as $(\alpha x)(e^{2x})$ we can use the product rule to get
\[ y_p'(x) = (\alpha)(e^{2x}) + (\alpha x)(2\,e^{2x}) = (\alpha + 2\alpha x)(e^{2x}), \]
and
\[ y_p''(x) = (2\alpha)(e^{2x}) + (\alpha + 2\alpha x)(2\,e^{2x}) = (4\alpha + 4\alpha x)(e^{2x}). \]
So, substituting these into the non-homogeneous second-order ODE, we get
\[ (4\alpha + 4\alpha x)(e^{2x}) - (\alpha + 2\alpha x)(e^{2x}) - 2(\alpha x)(e^{2x}) = 18\,e^{2x} \implies 3\alpha\,e^{2x} = 18\,e^{2x}, \]
which means that $\alpha$ can now be determined and is actually equal to 6. Thus, $y_p(x) = 6x\,e^{2x}$ is the sought after particular integral and so the general solution to our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} + 6x\,e^{2x}, \]
using $y(x) = y_c(x) + y_p(x)$.

Another example of this complication arises in Question 3(b) of the sample examination paper in Appendix A.
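The particular integrals found in Examples 8.13 and 8.14 can be checked by substituting each one into the left-hand-side $y'' - y' - 2y$ and comparing with the corresponding $f(x)$. The sketch below also confirms that the naive trial function $\alpha\,e^{2x}$ gives zero on the left-hand-side. The helper `lhs`, the sample points and the value $\alpha = 7$ are assumptions for illustration.

```python
import math

def lhs(y, x, h=1e-4):
    """Evaluate y'' - y' - 2y at x using central differences."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / h**2
    return d2 - d1 - 2 * y(x)

# Pairs (particular integral, right-hand side f) from Examples 8.13 and 8.14.
cases = [
    (lambda x: -4.0, lambda x: 8.0),
    (lambda x: 1.5 - 3 * x, lambda x: 6 * x),
    (lambda x: 5 * math.exp(3 * x), lambda x: 20 * math.exp(3 * x)),
    (lambda x: 6 * x * math.exp(2 * x), lambda x: 18 * math.exp(2 * x)),
]
points = (-1.0, 0.0, 0.5, 1.0)

# Largest mismatch between the left-hand side and f(x) for each case.
worst = [max(abs(lhs(yp, x) - f(x)) for x in points) for yp, f in cases]

# The naive trial alpha * e^{2x} (here with alpha = 7) gives zero on the
# left-hand side, so it can never match 18 e^{2x}.
naive = max(abs(lhs(lambda x: 7.0 * math.exp(2 * x), x)) for x in points)

print(worst, naive)
```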
⁶ Actually, this shouldn’t be a surprise since, taking $A = \alpha$ and $B = 0$ in our complementary function, we still have a solution to the homogeneous second-order ODE and so putting this into the left-hand-side must yield zero!

8.4 Systems of first-order ODEs

We now turn our attention to systems of first-order ODEs. For instance, we may be asked to find the functions $y_1(x)$ and $y_2(x)$ that simultaneously satisfy the ODEs
\[ \frac{dy_1}{dx} = f_1(y_1, y_2, x) \quad\text{and}\quad \frac{dy_2}{dx} = f_2(y_1, y_2, x), \]
where we are given the functions $f_1$ and $f_2$. Generally, $y_1$ and $y_2$ will appear on the right-hand-sides of both these first-order ODEs and, in such cases, we say that they are coupled as we can’t solve one of them without using information contained in the other. The procedure that we shall use to solve these involves rewriting the system of first-order ODEs as a second-order ODE which can then be solved using the method outlined in the previous section.
8.4.1 Simple systems of first-order ODEs

A simple system of coupled first-order ODEs will only involve linear combinations of $y_1(x)$ and $y_2(x)$ on the right-hand-side, i.e. it will have the form
$$\frac{dy_1}{dx} = a\,y_1(x) + b\,y_2(x) \quad\text{and}\quad \frac{dy_2}{dx} = c\,y_1(x) + d\,y_2(x),$$
for some constants $a$, $b$, $c$ and $d$. The procedure for solving this involves differentiating the first equation (say) with respect to $x$ so that we get
$$\frac{d^2y_1}{dx^2} = a\frac{dy_1}{dx} + b\frac{dy_2}{dx},$$
and then, using the second equation, we find that
$$\frac{d^2y_1}{dx^2} = a\frac{dy_1}{dx} + b\,\bigl(c\,y_1(x) + d\,y_2(x)\bigr),$$
which means that we have
$$\frac{d^2y_1}{dx^2} - a\frac{dy_1}{dx} - bc\,y_1(x) - bd\,y_2(x) = 0.$$
Now, the first equation can be rearranged to give
$$b\,y_2(x) = \frac{dy_1}{dx} - a\,y_1(x),$$
and so, if we substitute this in, we end up with
$$\frac{d^2y_1}{dx^2} - (a + d)\frac{dy_1}{dx} - (bc - ad)\,y_1(x) = 0,$$
which is an homogeneous second-order ODE with constant coefficients which we can solve using the method in Section 8.3.1 to find $y_1(x)$. Of course, having done this, we can then use the first of the original equations (say) to find $y_2(x)$. Let’s look at an example to see how this works.

Example 8.15 Find the functions $y_1(x)$ and $y_2(x)$ that satisfy the system of first-order ODEs given by
$$\frac{dy_1}{dx} = 2y_1 + 4y_2 \quad\text{and}\quad \frac{dy_2}{dx} = 3y_1 + 3y_2,$$
with the conditions $y_1(0) = 5$ and $y_2(0) = -2$.
We will solve this by rewriting this system as a second-order ODE in $y_1(x)$. To do this we note that rearranging the first ODE gives us
$$y_2 = \frac{1}{4}\left(\frac{dy_1}{dx} - 2y_1\right), \tag{8.3}$$
and if we differentiate this with respect to $x$ we get
$$\frac{dy_2}{dx} = \frac{1}{4}\left(\frac{d^2y_1}{dx^2} - 2\frac{dy_1}{dx}\right).$$
Consequently, if we substitute these two expressions into the second ODE, we get
$$\frac{1}{4}\left(\frac{d^2y_1}{dx^2} - 2\frac{dy_1}{dx}\right) = 3y_1 + \frac{3}{4}\left(\frac{dy_1}{dx} - 2y_1\right),$$
and this can be rearranged to get
$$\frac{d^2y_1}{dx^2} - 5\frac{dy_1}{dx} - 6y_1 = 0,$$
which is our sought after second-order ODE in $y_1(x)$. As it is an homogeneous second-order ODE with constant coefficients, this can be solved using the method in Section 8.3.1. The auxiliary equation is
$$k^2 - 5k - 6 = 0 \implies (k + 1)(k - 6) = 0,$$
which has two real solutions given by $k = -1$ and $k = 6$ which means that the general solution for $y_1(x)$ is
$$y_1(x) = Ae^{-x} + Be^{6x},$$
for arbitrary constants $A$ and $B$. To find the general solution for $y_2(x)$, we note that using (8.3) and the fact that
$$y_1'(x) = -Ae^{-x} + 6Be^{6x},$$
we get
$$y_2(x) = \frac{1}{4}\Bigl([-Ae^{-x} + 6Be^{6x}] - 2[Ae^{-x} + Be^{6x}]\Bigr) = \frac{1}{4}\left(-3Ae^{-x} + 4Be^{6x}\right),$$
in terms of the same arbitrary constants $A$ and $B$ as before. Thus, the general solution to this system of first-order ODEs is
$$y_1(x) = Ae^{-x} + Be^{6x} \quad\text{and}\quad y_2(x) = -\frac{3}{4}Ae^{-x} + Be^{6x},$$
for arbitrary constants $A$ and $B$.

However, we are also given the conditions $y_1(0) = 5$ and $y_2(0) = -2$ which imply that
$$5 = A + B \quad\text{and}\quad -2 = -\frac{3}{4}A + B.$$
Solving these two equations simultaneously, say by subtracting one from the other, we see that $7 = 7A/4$ which gives $A = 4$ and then, the first equation gives $B = 1$. Consequently, we find that
$$y_1(x) = 4e^{-x} + e^{6x} \quad\text{and}\quad y_2(x) = -3e^{-x} + e^{6x},$$
is the particular solution of this system of first-order ODEs given the conditions $y_1(0) = 5$ and $y_2(0) = -2$.
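The reduction used in this section can be sketched in a few lines of code. The block below is only an illustration under stated assumptions: `auxiliary_roots` is a hypothetical helper that solves $k^2 - (a + d)k - (bc - ad) = 0$ for real roots only, and the derivative is approximated by a central difference so that the particular solution of Example 8.15 can be checked against both ODEs.

```python
import math

def auxiliary_roots(a, b, c, d):
    """Real roots of k^2 - (a + d)k - (bc - ad) = 0 for the coupled system."""
    p = a + d
    q = b * c - a * d
    disc = p * p + 4 * q
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    r = math.sqrt(disc)
    return (p - r) / 2, (p + r) / 2

def y1(x):
    """Particular solution for y1 from Example 8.15."""
    return 4 * math.exp(-x) + math.exp(6 * x)

def y2(x):
    """Particular solution for y2 from Example 8.15."""
    return -3 * math.exp(-x) + math.exp(6 * x)

def derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

if __name__ == "__main__":
    print(auxiliary_roots(2, 4, 3, 3))
    x = 0.25
    # Each difference should be close to zero if the solution is correct.
    print(derivative(y1, x) - (2 * y1(x) + 4 * y2(x)))
    print(derivative(y2, x) - (3 * y1(x) + 3 * y2(x)))
```

With $a = 2$, $b = 4$, $c = 3$ and $d = 3$ the helper recovers the roots of the auxiliary equation used in the example.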
It is worth noting that systems of equations of the form encountered here can also be solved using diagonalisation in much the same way as systems of difference equations are solved in Section 11.2 of 173 Algebra.
8.4.2 Other systems of first-order ODEs

Systems of first-order ODEs become more complicated when they involve more complicated functions on the right-hand-side. The method for solving them remains the same, but a little more care must be taken as the following example illustrates.

Example 8.16 Find the functions $y_1(x)$ and $y_2(x)$ that satisfy the system of first-order ODEs given by
$$\frac{dy_1}{dx} = -4y_1 + 2y_2 \quad\text{and}\quad \frac{dy_2}{dx} = -2y_1 + 4x^2 + 4,$$
with the conditions $y_1(0) = 1$ and $y_2(0) = 7/2$.

We will solve this by rewriting this system as a second-order ODE in $y_1(x)$. To do this we note that rearranging the first ODE gives us
$$y_2 = \frac{1}{2}\left(\frac{dy_1}{dx} + 4y_1\right), \tag{8.4}$$
and if we differentiate this with respect to $x$ we get
$$\frac{dy_2}{dx} = \frac{1}{2}\left(\frac{d^2y_1}{dx^2} + 4\frac{dy_1}{dx}\right).$$
Consequently, if we substitute this derivative into the second ODE, we get
$$\frac{1}{2}\left(\frac{d^2y_1}{dx^2} + 4\frac{dy_1}{dx}\right) = -2y_1 + 4x^2 + 4,$$
and this can be rearranged to get
$$\frac{d^2y_1}{dx^2} + 4\frac{dy_1}{dx} + 4y_1 = 8x^2 + 8, \tag{8.5}$$
which is our sought after second-order ODE in $y_1(x)$. As it is a non-homogeneous second-order ODE with constant coefficients, this can be easily solved using the method of Section 8.3.2. In particular:

The homogeneous second-order ODE that corresponds to (8.5) is
$$\frac{d^2y_1}{dx^2} + 4\frac{dy_1}{dx} + 4y_1 = 0,$$
and so the auxiliary equation is
$$k^2 + 4k + 4 = 0 \implies (k + 2)^2 = 0,$$
which has one real solution given by $k = -2$ (twice). Consequently, the complementary function for $y_1(x)$ is
$$y_1(x) = (A + Bx)e^{-2x},$$
where $A$ and $B$ are arbitrary constants.
The right-hand-side of (8.5) is a quadratic and this suggests that we try a particular integral of the form $y_1(x) = \alpha x^2 + \beta x + \gamma$. We differentiate this twice to get
$$y_1'(x) = 2\alpha x + \beta \quad\text{and}\quad y_1''(x) = 2\alpha,$$
so that, on substituting these into (8.5), our equation becomes
$$2\alpha + 4(2\alpha x + \beta) + 4(\alpha x^2 + \beta x + \gamma) = 8x^2 + 8.$$
Then, equating the coefficients of the terms on both sides we see that, from the $x^2$ term, we get
$$4\alpha = 8 \implies \alpha = 2,$$
which means that, from the $x$ term, we get
$$8\alpha + 4\beta = 0 \implies \beta = -4,$$
and so, from the constant term, we get
$$2\alpha + 4\beta + 4\gamma = 8 \implies \gamma = 5.$$
Consequently, we see that
$$y_1(x) = 2x^2 - 4x + 5,$$
is the particular integral for $y_1(x)$.

The general solution to (8.5) is then given by the sum of its complementary function and its particular integral, i.e. we have
$$y_1(x) = (A + Bx)e^{-2x} + 2x^2 - 4x + 5,$$
where $A$ and $B$ are arbitrary constants.

We can now use this to find the general solution for $y_2(x)$ since, using (8.4) and the fact that
$$\frac{dy_1}{dx} = Be^{-2x} - 2(A + Bx)e^{-2x} + 4x - 4 = (B - 2A - 2Bx)e^{-2x} + 4x - 4,$$
we get
$$y_2(x) = \frac{1}{2}\Bigl([(B - 2A - 2Bx)e^{-2x} + 4x - 4] + 4[(A + Bx)e^{-2x} + 2x^2 - 4x + 5]\Bigr).$$
So, simplifying this, we find that
$$y_2(x) = \left(\tfrac{1}{2}B + A + Bx\right)e^{-2x} + 4x^2 - 6x + 8,$$
is the corresponding general solution for $y_2(x)$ in terms of the same arbitrary constants $A$ and $B$ as before.
Once we have these general solutions, we can use the initial conditions $y_1(0) = 1$ and $y_2(0) = 7/2$ to get the equations
$$1 = A + 5 \quad\text{and}\quad \tfrac{7}{2} = \tfrac{1}{2}B + A + 8,$$
which give us $A = -4$ and, hence, $B = -1$. Consequently, using these values, we find that
$$y_1(x) = -(4 + x)e^{-2x} + 2x^2 - 4x + 5 \quad\text{and}\quad y_2(x) = -\left(\tfrac{9}{2} + x\right)e^{-2x} + 4x^2 - 6x + 8,$$
are the sought after particular solutions.
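One way to gain confidence in the particular solutions of Example 8.16 is to integrate the original system numerically and compare. The sketch below uses a classical fourth-order Runge-Kutta scheme, which is not a method covered in this guide; the function names and the number of steps are illustrative assumptions.

```python
import math

def rhs(x, y1, y2):
    """Right-hand sides of the system in Example 8.16."""
    return (-4 * y1 + 2 * y2, -2 * y1 + 4 * x**2 + 4)

def rk4_solve(x_end, steps=1000):
    """Integrate from x = 0 with y1(0) = 1 and y2(0) = 7/2 using classical RK4."""
    h = x_end / steps
    x, y1, y2 = 0.0, 1.0, 3.5
    for _ in range(steps):
        k1 = rhs(x, y1, y2)
        k2 = rhs(x + h / 2, y1 + h / 2 * k1[0], y2 + h / 2 * k1[1])
        k3 = rhs(x + h / 2, y1 + h / 2 * k2[0], y2 + h / 2 * k2[1])
        k4 = rhs(x + h, y1 + h * k3[0], y2 + h * k3[1])
        y1 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y2 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return y1, y2

def exact(x):
    """Particular solutions found in Example 8.16."""
    e = math.exp(-2 * x)
    y1 = -(4 + x) * e + 2 * x**2 - 4 * x + 5
    y2 = -(4.5 + x) * e + 4 * x**2 - 6 * x + 8
    return y1, y2

if __name__ == "__main__":
    # The two pairs should agree to many decimal places.
    print(rk4_solve(1.0))
    print(exact(1.0))
```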
8.5 Applications of ODEs
Differential equations are used widely in economics-based subjects and, in Section 5.4.1, we saw a very simple application when we considered marginal functions. Here, we will consider a few more examples that are a bit more sophisticated.
8.5.1 Determining demand functions from elasticities

In Section 3.3.3, we saw that the elasticity of demand, $\varepsilon(p)$, is defined by
$$\varepsilon(p) = -\frac{p}{q}\frac{dq}{dp},$$
where $q = q^D(p)$ is the demand function. If we know the elasticity of demand, we can use this and our knowledge of ODEs to determine the demand function.

Example 8.17 Suppose that the elasticity of demand is a constant, i.e. $\varepsilon(p) = r$ for all $p$ and $r$ is a positive constant. Find the demand function if $q^D(1) = 2$.

Using the definition of the elasticity of demand, this gives us
$$-\frac{p}{q}\frac{dq}{dp} = r \implies \frac{1}{q}\frac{dq}{dp} = -\frac{r}{p},$$
and so this is a separable first-order ODE. Solving this using the method in Section 8.2.1, we write this as
$$\int \frac{1}{q}\,dq = -\int \frac{r}{p}\,dp \quad\text{and determine the integrals to get}\quad \ln|q| = -r\ln|p| + c,$$
where $c$ is an arbitrary constant. Then, rewriting this as $\ln|q| = \ln|p|^{-r} + c$, we can take exponentials of both sides to get
$$q = e^{\ln|p|^{-r} + c} = e^c p^{-r},$$
where we can remove the modulus signs since, economically, $q$ and $p$ are both positive. Then, using the fact that $q^D(1) = 2$, we see that $e^c = 2$ and so
$$q = q^D(p) = \frac{2}{p^r},$$
is the sought after demand function.

Activity 8.8 How does the demand function found in Example 8.17 behave as $p \to 0^+$ and as $p \to \infty$?
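The result of Example 8.17 can be checked by computing the elasticity of $q^D(p) = 2/p^r$ directly from the definition. The following is a minimal sketch in which the derivative is approximated by a central difference; the function names and the sample value $r = 1.5$ are assumptions made for illustration.

```python
def demand(p, r):
    """Demand function q = 2 / p^r from Example 8.17."""
    return 2.0 / p**r

def elasticity(q, p, h=1e-6):
    """Elasticity -(p/q) dq/dp, with dq/dp approximated by a central difference."""
    dq_dp = (q(p + h) - q(p - h)) / (2 * h)
    return -p * dq_dp / q(p)

if __name__ == "__main__":
    r = 1.5  # any positive constant would do here
    for p in (0.5, 1.0, 4.0):
        # The computed elasticity should be close to r at every price.
        print(p, elasticity(lambda price: demand(price, r), p))
```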
8.5.2 Continuous price adjustment

Suppose that the price of some commodity varies continuously with time and that its initial price is not equal to its equilibrium price. We might expect that, as time progresses, the price of the commodity will tend to its equilibrium price but to be sure, we need to have a model of how the price of the commodity is varying with time. One such model involves looking at how the rate of change of the price of the commodity is related to the excess of demand over supply.

Suppose that the price of the commodity as a function of time is $p(t)$ and that the market for this commodity is governed by the demand function, $q^D(p)$, and the supply function, $q^S(p)$. This means that, at any time, $t$, as the price is $p(t)$, the quantity being demanded is given by $q^D(p(t))$ and the quantity being supplied is given by $q^S(p(t))$. As such, we can define the excess of demand over supply to be the function of $p(t)$ given by
$$\Delta(p(t)) = q^D(p(t)) - q^S(p(t)),$$
i.e. the difference between these two quantities. Clearly, this means that if $p(t)$ is such that:

$\Delta(p(t)) > 0$, demand outstrips supply and so the price should rise, i.e. $p'(t) > 0$.
$\Delta(p(t)) = 0$, demand equals supply and we should have equilibrium, i.e. $p'(t) = 0$.
$\Delta(p(t)) < 0$, supply outstrips demand and so the price should fall, i.e. $p'(t) < 0$.

This suggests that the rate of change of the market price with time, i.e. $p'(t)$, should be given by some function $f$ of the excess of demand over supply, $\Delta(p(t))$, i.e. we have a model where
$$\frac{dp}{dt} = f(\Delta(p(t))) \quad\text{with}\quad \Delta(p(t)) = q^D(p(t)) - q^S(p(t)).$$
Then, by solving this first-order ODE, we can find out how the market price varies with time and hence assess the stability of the market by considering what it does as $t \to \infty$. To see how this works, let’s consider an example.
Example 8.18 A market is governed by the demand and supply functions
$$q^D(p) = 5 - 2p \quad\text{and}\quad q^S(p) = 3p - 1,$$
respectively. If the rate of change of the market price is given by three times the excess of demand over supply, find the ODE that describes how $p(t)$ changes with time.

We start by calculating the excess of demand over supply which is given by
$$\Delta(p(t)) = q^D(p(t)) - q^S(p(t)) = [5 - 2p(t)] - [3p(t) - 1] = 6 - 5p(t).$$
We then know that the rate of change of the market price is given by three times this excess, i.e.
$$\frac{dp}{dt} = 3\Delta(p(t)) = 3[6 - 5p(t)] = -15\left(p(t) - \frac{6}{5}\right).$$
This is a separable first-order ODE and we can easily solve it using the method in Section 8.2.1.

Activity 8.9 Solve the separable first-order ODE found in Example 8.18 and use it to determine how the market price changes over time if the initial price is $p(0)$. How does the market price behave in the long-term?
8.5.3 Continuous cash flows

In Section 6.1.5 of 173 Algebra, you saw how to find the balance, $B(t)$, of a bank account that utilises continuously compounded interest at an annual equivalent rate of $100r\%$. Another way of thinking about this is to say that, at any time, $t$, the rate of increase of the balance, $B'(t)$, is given by $rB(t)$. This means that we have
$$\frac{dB}{dt} = rB(t),$$
and this is a simple separable first-order ODE that can be solved, using the method in Section 8.2.1, to get $B(t) = Pe^{rt}$, where $B(0) = P$ is the initial balance. As such, we can see that this way of thinking about continuous compounding gives us an alternative way of deriving the formula you saw in Section 6.1.5 of 173 Algebra.

Activity 8.10 Verify that solving this separable first-order ODE will give the solution above.

However, we can actually use ODEs to find the balance of a bank account which uses continuously compounded interest in the presence of more complicated investment schemes. For instance, if we take the bank account above and suppose that money is added to the account at a rate given by $f(t)$,$^{7}$ we see that the balance, $B(t)$, is now given by
$$\frac{dB}{dt} = rB(t) + f(t) \implies \frac{dB}{dt} - rB(t) = f(t),$$
which is a linear first-order ODE. And, of course, we could also have the situation where money is deducted from the account at a rate given by $f(t)$,$^{8}$ and then we see that the balance, $B(t)$, would be given by
$$\frac{dB}{dt} = rB(t) - f(t) \implies \frac{dB}{dt} - rB(t) = -f(t),$$
which is another linear first-order ODE. Let’s consider an example.

Example 8.19 Suppose that we have two bank accounts, X and Y, that pay continuously compounded interest at annual equivalent rates of $100r_X\%$ and $100r_Y\%$ respectively. We initially invest an amount $P_X$ in account X and, at each instant, pay the interest accrued into account Y whose initial balance is $P_Y$. Find the ODE that determines the balance in account Y at any time $t \geq 0$.

Let $B_X(t)$ and $B_Y(t)$ denote the balance in accounts X and Y respectively at time $t$. The first thing to notice is that the rate of change of $B_X(t)$ is given by
$$\frac{dB_X}{dt} = r_XB_X(t) - r_XB_X(t) = 0,$$
as, at every instant, any interest accrued is immediately deducted from account X so that it can be paid into account Y. This means that $B_X(t)$ must be a constant and, in particular, this constant must be the initial balance $P_X$. Thus, we find that $B_X(t) = P_X$ for all $t \geq 0$ and the interest accrued at each time $t$ (which we immediately pay into account Y) is given by $r_XP_X$.
The rate of change of $B_Y(t)$ is then given by the sum of $r_YB_Y(t)$, which is the continuously compounded interest accrued on the balance in account Y, and $r_XP_X$ which, as we have just seen, is the continuously compounded interest accrued in account X. That is, for $t \geq 0$, we have
$$\frac{dB_Y}{dt} = r_YB_Y(t) + r_XP_X \implies \frac{dB_Y}{dt} - r_YB_Y(t) = r_XP_X,$$
which is a linear first-order ODE and we can easily solve this, subject to the condition that $B_Y(0) = P_Y$, using the method in Section 8.2.2.

Activity 8.11 Solve the linear first-order ODE found in Example 8.19 and use it to determine the balance in account Y at any time $t \geq 0$.

$^{7}$ That is, at each time, $t$, the balance increases by $f(t)$.
$^{8}$ That is, at each time, $t$, the balance decreases by $f(t)$.
8.5.4 Market trends

In some markets, the equilibrium price will change with time and so it is useful for consumers to try and anticipate trends. That is, the consumer will keep an eye on the current equilibrium price, but they will also look at the rate at which the price is rising or falling and whether this rate of change is speeding up or slowing down. We can represent these three considerations mathematically by using $p(t)$, $p'(t)$ and $p''(t)$ respectively and, by considering how these affect the quantity being supplied or demanded, we can model how the price itself is varying with time by using an ODE. Let’s look at an example.

Example 8.20 Suppose that the demand for a certain commodity is given by
$$q^D(p) = 9 - 2p + 6\frac{dp}{dt} - 2\frac{d^2p}{dt^2},$$
and that supply is determined by
$$q^S(p) = -3 + 4p - \frac{dp}{dt} - \frac{d^2p}{dt^2}.$$
Find the ODE that determines the equilibrium price at any time $t \geq 0$.

Here we have linear supply and demand functions which have been modified to take a trend into account. To find the equilibrium price at any time $t \geq 0$, we need to determine the function, $p(t)$, that makes the amount supplied equal to the amount demanded, i.e.
$$-3 + 4p(t) - \frac{dp}{dt} - \frac{d^2p}{dt^2} = 9 - 2p(t) + 6\frac{dp}{dt} - 2\frac{d^2p}{dt^2}.$$
But, rearranging this, we get the non-homogeneous second-order ODE with constant coefficients given by
$$\frac{d^2p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 12,$$
which we can solve using the method in Section 8.3.2.

Activity 8.12 Solve the second-order ODE found in Example 8.20 and use it to determine how the equilibrium price changes if $p(0) = 7$ and $p'(0) = 15$. How does this equilibrium price behave in the long-term?
Learning outcomes

At the end of this chapter and having completed the relevant reading and activities, you should be able to:

identify and solve separable, linear and homogeneous first-order ODEs and other first-order ODEs that can be solved by a given substitution;
identify and solve homogeneous and non-homogeneous second-order ODEs with constant coefficients;
solve coupled systems of first-order ODEs by rewriting them as a second-order ODE with constant coefficients;
solve problems from economics-based subjects that involve applications of ODEs.
Solutions to activities

Solution to activity 8.1

Looking at the given ODEs, we see that: (a) is second-order of first degree, (b) is second-order of second degree, and (c) is third-order of first degree. Here we find the highest order derivative to determine the order and then the algebraic degree (or ‘power’) of this derivative determines the degree.

Solution to activity 8.2

We have the general solution
$$y(x) = x^2 + x + c,$$
and we want to find the particular solutions corresponding to:

$y(0) = 0$. So, setting $x = 0$ in both sides of this expression and using the condition, we get $y(0) = 0^2 + 0 + c \implies 0 = c$, which means that we must take $c = 0$ in the general solution to see that $y(x) = x^2 + x$ is the particular solution to the ODE given that $y(0) = 0$.

$y(0) = -1$. So, setting $x = 0$ in both sides of this expression and using the condition, we get $y(0) = 0^2 + 0 + c \implies -1 = c$, which means that we must take $c = -1$ in the general solution to see that $y(x) = x^2 + x - 1$ is the particular solution to the ODE given that $y(0) = -1$.

$y(2) = 7$. So, setting $x = 2$ in both sides of this expression and using the condition, we get $y(2) = 2^2 + 2 + c \implies 7 = 6 + c$, which means that we must take $c = 1$ in the general solution to see that $y(x) = x^2 + x + 1$ is the particular solution to the ODE given that $y(2) = 7$. Observe that this is the same particular solution as the one we found with $y(0) = 1$ in Example 8.2 but that it arises from a condition that specifies information about $y(x)$ at a different value of $x$.

Solution to activity 8.3

We have the general solution
$$y(x) = 1 + Ae^{x^2},$$
and we want to find the particular solutions corresponding to:

$y(0) = 2$. So, setting $x = 0$ in both sides of this expression and using the condition, we get $y(0) = 1 + Ae^0 \implies 2 = 1 + A$, which means that we must take $A = 1$ in the general solution to see that $y(x) = 1 + e^{x^2}$ is the particular solution to the ODE given that $y(0) = 2$.

$y(0) = 0$. So, setting $x = 0$ in both sides of this expression and using the condition, we get $y(0) = 1 + Ae^0 \implies 0 = 1 + A$, which means that we must take $A = -1$ in the general solution to see that $y(x) = 1 - e^{x^2}$ is the particular solution to the ODE given that $y(0) = 0$.

If we want a value of $y(1)$ that will give us the same particular solution as the one found in (a), i.e. $y(x) = 1 + e^{x^2}$, we put $x = 1$ into both sides of this expression to get $y(1) = 1 + e^1 = 1 + e$. That is, the condition $y(1) = 1 + e$ gives us the same particular solution as the one we found in (a).

Solution to activity 8.4

Here we have to solve the separable first-order ODE
$$\frac{1}{y + 3}\frac{dy}{dx} = \frac{2}{x},$$
with $M(x) = 2/x$ and $N(y) = (y + 3)^{-1}$. Using the method in Section 8.2.1, we write this as
$$\int \frac{2}{x}\,dx = \int \frac{dy}{y + 3} \quad\text{and determine the integrals to get}\quad 2\ln|x| + c = \ln|y + 3|,$$
where $c$ is an arbitrary constant. Taking exponentials of both sides we get
$$|y + 3| = e^{2\ln|x| + c} = e^c e^{\ln|x|^2} = e^c|x|^2.$$
Now, $|x|^2 = x^2$ and removing the modulus on the left-hand-side, we get
$$y + 3 = \pm e^c x^2 \implies y = -3 \pm e^c x^2,$$
and so, as before, the general solution is $y(x) = -3 + Ax^2$, where $A \in \mathbb{R}$ is an arbitrary constant.

Solution to activity 8.5

The given ODE can be left as it is or rearranged to give
$$\frac{1}{y + e^x}\frac{dy}{dx} = 1,$$
but, either way, it is not separable because we can’t ‘separate’ the variables.
Solution to activity 8.6

Here we have to solve the linear first-order ODE
$$\frac{dy}{dx} - \frac{y}{x} = 1,$$
with $P(x) = -1/x$ and $Q(x) = 1$. Using the method in Section 8.2.2, we start by finding the integrating factor, $\mu(x)$, by determining the integral
$$\int P(x)\,dx = -\int \frac{1}{x}\,dx = -\ln|x| + c,$$
and so we see that $-\ln x$ is an antiderivative of $-1/x$. This means that the integrating factor is
$$\mu(x) = e^{-\ln x} = e^{\ln x^{-1}} = x^{-1},$$
and so we have
$$\mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies x^{-1}y(x) = \int x^{-1}\,dx \implies x^{-1}y(x) = \ln|x| + c,$$
where $c$ is an arbitrary constant. As such, we have $y(x) = x(\ln|x| + c)$, which is the same general solution as before.
Solution to activity 8.7

If we try and write the ODE
$$x^4 + 5y^4 - 4xy^3\frac{dy}{dx} = 0 \quad\text{in the form}\quad \frac{dy}{dx} + P(x)y = Q(x),$$
the best we can do is
$$\frac{dy}{dx} - \frac{5}{4x}y = \frac{x^3}{4y^3},$$
and this is not linear due to the presence of the $1/y^3$ on the right-hand-side.

Solution to activity 8.8

In Example 8.17, we found that $q^D(p) = 2/p^r$ where $r$ is a positive constant. As such, we can see that $q^D(p) \to \infty$ as $p \to 0^+$ and $q^D(p) \to 0$ as $p \to \infty$.

Solution to activity 8.9

Using the method in Section 8.2.1, we write the ODE as
$$\int \frac{dp}{p - \frac{6}{5}} = \int (-15)\,dt \quad\text{and determine the integrals to get}\quad \ln\left|p - \frac{6}{5}\right| = -15t + c,$$
where $c$ is an arbitrary constant. Taking exponentials of both sides, this gives us
$$\left|p - \frac{6}{5}\right| = e^{-15t + c} = e^c e^{-15t}.$$
Now, we remove the modulus bars and compensate for this loss by replacing $e^c$ (which must be positive) with the constant $A$ (which can be negative), to get
$$p(t) = \frac{6}{5} + Ae^{-15t},$$
which is the general solution. Then, given that the initial price is $p(0)$, we see that
$$p(0) = \frac{6}{5} + Ae^0 \implies A = p(0) - \frac{6}{5},$$
and so, we have the particular solution
$$p(t) = \frac{6}{5} + \left(p(0) - \frac{6}{5}\right)e^{-15t},$$
which tells us how the market price changes over time if the initial price is $p(0)$. In particular, if we have a $p(0)$ such that:

$p(0) > 6/5$, since $e^{-15t} \to 0$ as $t \to \infty$, $p(t)$ will decrease to $6/5$.
$p(0) = 6/5$, we find that $p(t) = 6/5$ for all $t \geq 0$.
$p(0) < 6/5$, since $e^{-15t} \to 0$ as $t \to \infty$, $p(t)$ will increase to $6/5$.

Indeed, as you should be able to verify, $6/5$ is the equilibrium price for this market and so, in this case, regardless of the choice of $p(0)$, the market is either in equilibrium or tends to equilibrium in the long-term.
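The behaviour described in the solution to activity 8.9 can also be seen by stepping the ODE $dp/dt = 3(6 - 5p)$ forward in time. The sketch below uses the explicit Euler method, which is a numerical technique not covered in this guide, and compares it with the closed-form solution; the function names and the step size `dt` are illustrative assumptions.

```python
import math

def euler_price(p0, t_end, dt=1e-4):
    """Explicit Euler steps for dp/dt = 3(6 - 5p), starting from p(0) = p0."""
    p = p0
    for _ in range(round(t_end / dt)):
        p += dt * 3 * (6 - 5 * p)
    return p

def exact_price(p0, t):
    """Closed-form solution p(t) = 6/5 + (p(0) - 6/5) e^{-15t}."""
    return 6 / 5 + (p0 - 6 / 5) * math.exp(-15 * t)

if __name__ == "__main__":
    for p0 in (3.0, 1.2, 0.1):
        # The two values should be close, and both approach 6/5 as t grows.
        print(p0, euler_price(p0, 0.2), exact_price(p0, 0.2))
```

The step size matters here: explicit Euler is only stable for this ODE when `dt` is small compared with $1/15$.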
Solution to activity 8.10

To solve the separable first-order ODE
$$\frac{dB}{dt} = rB(t) \quad\text{we write it as}\quad \int \frac{dB}{B} = \int r\,dt,$$
and determine the integrals to get
$$\ln|B| = rt + c \implies B = e^{rt + c} = e^c e^{rt},$$
where we can remove the modulus sign since, economically, $B$ is positive. Then, using the fact that $B(0) = P$, we see that $e^c = P$ and so $B(t) = Pe^{rt}$, as we would expect.

Solution to activity 8.11

We have to solve the linear first-order ODE
$$\frac{dB_Y}{dt} - r_YB_Y(t) = r_XP_X,$$
subject to the condition that $B_Y(0) = P_Y$. The integrating factor is given by
$$e^{\int (-r_Y)\,dt} = e^{-r_Yt},$$
as $-r_Yt$ is an antiderivative of $-r_Y$ and this means that we have
$$e^{-r_Yt}B_Y = \int r_XP_Xe^{-r_Yt}\,dt \implies e^{-r_Yt}B_Y = -P_X\frac{r_X}{r_Y}e^{-r_Yt} + c,$$
where $c$ is an arbitrary constant. As such, our general solution is
$$B_Y(t) = -P_X\frac{r_X}{r_Y} + ce^{r_Yt}.$$
Then, as $B_Y(0) = P_Y$, we have
$$P_Y = -P_X\frac{r_X}{r_Y} + c \implies c = P_Y + P_X\frac{r_X}{r_Y},$$
and so the required particular solution is
$$B_Y(t) = -P_X\frac{r_X}{r_Y} + \left(P_Y + P_X\frac{r_X}{r_Y}\right)e^{r_Yt} = P_Ye^{r_Yt} + P_X\frac{r_X}{r_Y}\left(e^{r_Yt} - 1\right),$$
and this tells us the balance in account Y at any time $t \geq 0$.

Solution to activity 8.12

To solve the non-homogeneous second-order ODE with constant coefficients given by
$$\frac{d^2p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 12,$$
we note that:
The corresponding homogeneous second-order ODE is
$$\frac{d^2p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 0,$$
and so the auxiliary equation is
$$k^2 - 7k + 6 = 0 \implies (k - 1)(k - 6) = 0,$$
which has two real solutions given by $k = 1$ and $k = 6$. Consequently, the complementary function for $p(t)$ is $p(t) = Ae^t + Be^{6t}$, where $A$ and $B$ are arbitrary constants.

The right-hand-side is a constant and this suggests we try a particular integral of the form $p(t) = \alpha$. We differentiate this twice to get $p'(t) = 0$ and $p''(t) = 0$ so that, on substituting these into our equation, we get
$$6\alpha = 12 \implies \alpha = 2.$$
Consequently, we see that $p(t) = 2$ is the particular integral for $p(t)$.

The general solution is then given by the sum of its complementary function and its particular integral, i.e. we have
$$p(t) = Ae^t + Be^{6t} + 2,$$
where $A$ and $B$ are arbitrary constants.

Then given the initial condition $p(0) = 7$ we have
$$7 = A + B + 2 \implies A + B = 5,$$
and since $p'(t) = Ae^t + 6Be^{6t}$, the other initial condition, $p'(0) = 15$, gives us $15 = A + 6B$. Solving these equations, say by subtracting one from the other, we get $5B = 10$ which gives us $B = 2$ and so, from the first equation, $A = 3$. Consequently, the particular solution we seek is
$$p(t) = 3e^t + 2e^{6t} + 2,$$
and this describes how the equilibrium price changes with time. Indeed, in the long-term, as both $3e^t$ and $2e^{6t}$ tend to infinity as $t \to \infty$, we see that $p(t) \to \infty$ too.
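The last step of this solution, fitting $A$ and $B$ to the two initial conditions, amounts to solving $A + B = p(0) - 2$ and $A + 6B = p'(0)$. The sketch below packages that step as a small helper so that other initial conditions can be tried; the function names are hypothetical and the code is only an illustration of the algebra above.

```python
import math

def constants(p0, v0):
    """Solve A + B = p0 - 2 and A + 6B = v0 for the general solution
    p(t) = A e^t + B e^{6t} + 2, where p0 = p(0) and v0 = p'(0)."""
    B = (v0 - (p0 - 2)) / 5
    A = (p0 - 2) - B
    return A, B

def price(t, p0, v0):
    """Equilibrium price at time t for the given initial conditions."""
    A, B = constants(p0, v0)
    return A * math.exp(t) + B * math.exp(6 * t) + 2

if __name__ == "__main__":
    print(constants(7, 15))  # the case treated in activity 8.12
    print(constants(2, 0))   # both constants vanish, so p(t) stays at 2
    print([price(t, 7, 15) for t in (0.0, 0.5, 1.0)])
```

Since both exponents are positive, the price stays bounded in this model only when both constants are zero, i.e. when $p(0) = 2$ and $p'(0) = 0$.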
Exercises

Exercise 8.1

Find the general solution of the ODE
$$\frac{dy}{dx} + \frac{xy}{1 + x^2} = \sqrt{1 + x^2}.$$
What is the particular solution if $y(0) = 1$?

Exercise 8.2

Use the substitution $w(t) = y'(t)$ to show that the ODE
$$\frac{d^2y}{dt^2} - \frac{3}{t}\frac{dy}{dt} = -3$$
can be written as a linear ODE in terms of $w(t)$. Solve this linear ODE for $w(t)$ and hence find the general solution of the original ODE.

Exercise 8.3

Find the particular solution of the ODE
$$y''(t) - 5y'(t) + 6y(t) = 10\sin t,$$
given that $y(0) = 0$ and $y'(0) = 1$.

Exercise 8.4

The functions $f(t)$ and $g(t)$ are related by the first-order ODEs
$$f'(t) = 3f(t) - g(t) \quad\text{and}\quad g'(t) = 3g(t) - f(t).$$
If $f(0) = 2$ and $g(0) = 0$, find these functions.

Exercise 8.5

The elasticity of demand for a good is given by
$$\varepsilon(p) = \frac{2p^2}{p^2 + 1},$$
and $q = 4$ when $p = 1$. Find the demand function, $q^D(p)$.
Solutions to exercises

Solution to exercise 8.1

We solve this linear first-order ODE using the method in Section 8.2.2. Here $P(x) = x/(1 + x^2)$ and we start by seeing that the integral
$$\int P(x)\,dx = \int \frac{x}{1 + x^2}\,dx = \tfrac{1}{2}\ln|1 + x^2| + c,$$
where we have implicitly used the substitution $u = 1 + x^2$. So, as $\tfrac{1}{2}\ln(1 + x^2)$ is an antiderivative of $x/(1 + x^2)$, the integrating factor is
$$\mu(x) = e^{\frac{1}{2}\ln(1 + x^2)} = e^{\ln\sqrt{1 + x^2}} = \sqrt{1 + x^2}.$$
Then, as $Q(x) = \sqrt{1 + x^2}$, we have
$$\mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies y(x)\sqrt{1 + x^2} = \int (1 + x^2)\,dx \implies y(x)\sqrt{1 + x^2} = x + \frac{x^3}{3} + c,$$
where $c$ is an arbitrary constant. As such, we find that
$$y(x) = \frac{x}{\sqrt{1 + x^2}} + \frac{x^3}{3\sqrt{1 + x^2}} + \frac{c}{\sqrt{1 + x^2}},$$
is the general solution of the given ODE. If $y(0) = 1$, this gives us $c = 1$, and so
$$y(x) = \frac{3x + x^3 + 3}{3\sqrt{1 + x^2}},$$
is the sought after particular solution.
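The integrating factor method used here can be mirrored numerically: with $y(0)$ known, $\mu(x)y(x) = \mu(0)y(0) + \int_0^x \mu(s)Q(s)\,ds$, and the integral can be approximated by a quadrature rule. The sketch below uses the composite Simpson rule, which is an assumption of this illustration rather than something the guide prescribes, and compares the result with the closed-form particular solution; all function names are hypothetical.

```python
import math

def simpson(f, a, b, n=100):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mu(x):
    """Integrating factor sqrt(1 + x^2) for P(x) = x / (1 + x^2)."""
    return math.sqrt(1 + x * x)

def Q(x):
    """Right-hand side of the linear ODE in exercise 8.1."""
    return math.sqrt(1 + x * x)

def y_numeric(x, y0=1.0):
    """y(x) from mu(x) y(x) = mu(0) y(0) + integral of mu Q over [0, x]."""
    integral = simpson(lambda s: mu(s) * Q(s), 0.0, x)
    return (mu(0.0) * y0 + integral) / mu(x)

def y_closed(x):
    """Particular solution (3x + x^3 + 3) / (3 sqrt(1 + x^2))."""
    return (3 * x + x**3 + 3) / (3 * math.sqrt(1 + x * x))

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        # The two columns should agree closely.
        print(x, y_numeric(x), y_closed(x))
```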
Solution to exercise 8.2

Given that $w(t) = y'(t)$, we have $w'(t) = y''(t)$, and so the given ODE, i.e.
$$\frac{d^2y}{dt^2} - \frac{3}{t}\frac{dy}{dt} = -3 \quad\text{becomes}\quad \frac{dw}{dt} - \frac{3}{t}w(t) = -3,$$
which is the sought after linear ODE for $w(t)$. We solve this ODE using the method in Section 8.2.2. Here $P(t) = -3/t$ and we start by seeing that the integral
$$\int P(t)\,dt = -\int \frac{3}{t}\,dt = -3\ln|t| + c,$$
and so $-3\ln t$ is an antiderivative of $-3/t$ which means that the integrating factor, $\mu(t)$, is given by
$$\mu(t) = e^{-3\ln t} = e^{\ln(t^{-3})} = t^{-3}.$$
Then, as $Q(t) = -3$, we have
$$\mu(t)w(t) = \int \mu(t)Q(t)\,dt \implies t^{-3}w(t) = \int -3t^{-3}\,dt \implies t^{-3}w(t) = \frac{3}{2}t^{-2} + c,$$
where $c$ is an arbitrary constant. As such, we see that
$$w(t) = \frac{3t}{2} + ct^3,$$
is the general solution for $w(t)$. Then, as $w(t) = y'(t)$, we see that
$$y(t) = \int w(t)\,dt = \int \left(\frac{3t}{2} + ct^3\right)dt = \frac{3}{4}t^2 + \frac{c}{4}t^4 + d,$$
where $d$ is another arbitrary constant. This is the general solution of the original ODE.

Solution to exercise 8.3

The given ODE is a non-homogeneous second-order ODE with constant coefficients and we solve it using the method of Section 8.3.2. In particular:

The corresponding homogeneous second-order ODE is
$$y''(t) - 5y'(t) + 6y(t) = 0,$$
and so the auxiliary equation is
$$k^2 - 5k + 6 = 0 \implies (k - 2)(k - 3) = 0,$$
which has two real solutions given by $k = 2$ and $k = 3$. Consequently, the complementary function, $y_c(t)$, is $y_c(t) = Ae^{2t} + Be^{3t}$, for arbitrary constants $A$ and $B$.
The right-hand-side of the given ODE is $10\sin t$ and this suggests that we try a particular integral of the form $y_p(t) = \alpha\sin t + \beta\cos t$. We differentiate this twice to get
$$y_p'(t) = \alpha\cos t - \beta\sin t \quad\text{and}\quad y_p''(t) = -\alpha\sin t - \beta\cos t,$$
so that, on substituting these into the given ODE, we get
$$(-\alpha\sin t - \beta\cos t) - 5(\alpha\cos t - \beta\sin t) + 6(\alpha\sin t + \beta\cos t) = 10\sin t.$$
Then, equating the coefficients of the terms on both sides we see that, from the $\sin t$ term, we get
$$-\alpha + 5\beta + 6\alpha = 10 \implies \alpha + \beta = 2,$$
and, from the $\cos t$ term, we get
$$-\beta - 5\alpha + 6\beta = 0 \implies \alpha = \beta,$$
and so, solving these two equations simultaneously, we find that $\alpha = 1$ and $\beta = 1$. Consequently, we see that $y_p(t) = \sin t + \cos t$ is the particular integral.
The general solution is then given by the sum of its complementary function and its particular integral, i.e. we have
$$y(t) = Ae^{2t} + Be^{3t} + \sin t + \cos t,$$
where $A$ and $B$ are arbitrary constants.

We can now use the initial condition $y(0) = 0$ to see that
$$0 = A + B + 0 + 1 \implies A + B = -1,$$
and, as $y'(t) = 2Ae^{2t} + 3Be^{3t} + \cos t - \sin t$, the initial condition $y'(0) = 1$ gives us
$$1 = 2A + 3B + 1 - 0 \implies 2A + 3B = 0.$$
Solving these equations simultaneously then gives us $A = -3$ and $B = 2$ which means that
$$y(t) = -3e^{2t} + 2e^{3t} + \sin t + \cos t,$$
is the sought after particular solution.

Solution to exercise 8.4

We will solve the given system of first-order ODEs by rewriting it as a second-order ODE in $f(t)$. To do this we note that rearranging the first ODE gives us
$$g = 3f - \frac{df}{dt}, \tag{8.6}$$
and if we differentiate this with respect to $t$ we get
$$\frac{dg}{dt} = 3\frac{df}{dt} - \frac{d^2f}{dt^2}.$$
Consequently, if we substitute these two expressions into the second ODE, we get
$$3\frac{df}{dt} - \frac{d^2f}{dt^2} = 3\left(3f - \frac{df}{dt}\right) - f,$$
and this can be rearranged to get
$$\frac{d^2f}{dt^2} - 6\frac{df}{dt} + 8f = 0,$$
which is our sought after second-order ODE in $f(t)$. As it is an homogeneous second-order ODE with constant coefficients, this can be solved using the method of Section 8.3.1. The auxiliary equation is
$$k^2 - 6k + 8 = 0 \implies (k - 2)(k - 4) = 0,$$
which has two real solutions given by $k = 2$ and $k = 4$ which means that the general solution for $f(t)$ is
$$f(t) = Ae^{2t} + Be^{4t},$$
for arbitrary constants $A$ and $B$. To find the general solution for $g(t)$, we note that using (8.6) and the fact that $f'(t) = 2Ae^{2t} + 4Be^{4t}$, we get
$$g(t) = 3[Ae^{2t} + Be^{4t}] - [2Ae^{2t} + 4Be^{4t}] = Ae^{2t} - Be^{4t},$$
in terms of the same arbitrary constants $A$ and $B$ as before. Thus, the general solution to this system of first-order ODEs is
$$f(t) = Ae^{2t} + Be^{4t} \quad\text{and}\quad g(t) = Ae^{2t} - Be^{4t},$$
for arbitrary constants $A$ and $B$. However, we are also given the conditions $f(0) = 2$ and $g(0) = 0$ which imply that
$$2 = A + B \quad\text{and}\quad 0 = A - B.$$
Solving these two equations simultaneously then gives us $A = 1$ and $B = 1$ which means that $f(t) = e^{2t} + e^{4t}$ and $g(t) = e^{2t} - e^{4t}$ are the sought after functions.

Solution to exercise 8.5

Using the definition of elasticity with $q = q^D(p)$ and the given expression we have
$$-\frac{p}{q}\frac{dq}{dp} = \frac{2p^2}{p^2 + 1},$$
and this can be written as
$$\frac{1}{q}\frac{dq}{dp} = -\frac{2p}{p^2 + 1},$$
which is a separable ODE. So, using the method of Section 8.2.1, we write this as
$$\int \frac{dq}{q} = -\int \frac{2p}{p^2 + 1}\,dp \quad\text{and determine the integrals to get}\quad \ln|q| = -\ln|p^2 + 1| + c,$$
where $c$ is an arbitrary constant.$^{9}$ Taking exponentials on both sides, this gives us
$$q = e^{-\ln(p^2 + 1) + c} = e^c e^{\ln(p^2 + 1)^{-1}} = e^c(p^2 + 1)^{-1} = \frac{e^c}{p^2 + 1},$$
where we can remove the modulus signs since, economically, $q$ is positive and $p^2 + 1$ is always positive too. Then, using the fact that $q = 4$ when $p = 1$, we see that $e^c = 8$ and so
$$q = q^D(p) = \frac{8}{p^2 + 1},$$
is the sought after demand function.

$^{9}$ Here we have implicitly used the substitution $u = 1 + p^2$ to determine the integral on the right-hand-side.
Appendix A Sample examination paper

Important note: This sample examination paper reflects the intended examination and assessment arrangements for this course in the academic year 2011–2012. The format and structure of the examination may have changed since the publication of this subject guide. You can find the most recent examination papers on the VLE where all changes to the format of the examination are posted.

Calculus

Time allowed: THREE hours. Candidates should answer all FIVE questions. All questions carry equal marks (20 marks each). Calculators may not be used for this paper.

1. (a) (i) Find $\int t\cos t\,dt$.
(ii) Show that the differential equation dy x3 + xy 2 − x2 y = 0, cos(y/x) dx is homogeneous and find its degree of homogeneity.
(iii) Hence find the general solution of the differential equation in (ii) leaving your answer in terms of y/x. (b) A plane, P , in R3 contains the point (3, 4, −1) and has normal (−4, 8, −4)T . Find the Cartesian equation of this plane. It is known that the surface, S, with equation x2 + y 2 + z 2 = c, for some c ∈ R has P as a tangent plane. Find the value of c that makes this the case and find the point on this surface which has P as its tangent plane. Another surface with equation x2 + y 2 + αz 2 = β, for some α, β ∈ R intersects S orthogonally at the point (4, 3, 5). Find the values of α and β that make this the case.
2. A market has an equilibrium price of 14 and an equilibrium quantity of 6.

(a) If this market’s elasticity of demand is given by
$$\varepsilon(p) = \frac{p}{26 - p},$$
find its demand function.

(b) The market’s inverse supply function has the form $p^S(q) = aq + b$, for some numbers $a$ and $b$. Given that the producer surplus is 36, find the values of $a$ and $b$. Hence deduce the supply function, $q^S(p)$, for this market.

(c) An excise (or per-unit) tax of $T$ is imposed on the market. Find the new equilibrium price and quantity of the market. Hence find the value of $T$ that maximises the tax revenue.

3. (a) A function $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by
$$f(x, y) = x^2 - 2x - y^3 + y^2 + y.$$
Find and classify the stationary points of $f$.

Find the regions, if any, in the $(x, y)$-plane where $f$ is convex, concave or neither. Does $f$ have a global minimum or a global maximum? Justify your answer.

(b) Find the general solution of the differential equation
$$y''(t) - 2y'(t) + y(t) = e^t.$$
What is the particular solution if $y(0) = 1$ and $y(1) = 0$?

4. If a firm uses amounts $k$ and $l$ of capital and labour respectively, then it can produce an amount $q(k, l) = k^\alpha l^\alpha$ where $0 < \alpha < 1/2$. Supposing that the firm is producing an amount $Q$, use the method of Lagrange multipliers to show that the minimum amount it can spend on capital and labour is given by
$$2\sqrt{vw}\, Q^{\frac{1}{2\alpha}},$$
where each unit of capital costs $v$ and each unit of labour costs $w$. By sketching the constraint and some appropriate contours, you should justify your use of the method of Lagrange multipliers and explain why your answer is a minimum.

The product manufactured by the firm sells at a fixed price, $p$, and the raw materials required to produce each unit cost an amount, $c$, where $c < p$. If the firm acts in a way which minimises its capital and labour costs, use the result just obtained to determine the production level, $Q$, that will maximise its profit.

5. (a) Find the fifth-order Maclaurin series for $e^{\sin x}$.

(b) Determine the integral
$$\int \frac{\cos x}{(1 - \sin x)(2 + \sin x)}\,dx.$$

(c) Find and classify the stationary points of the function
$$f(x) = \frac{x}{12} - \sqrt[3]{x}.$$
Appendix B
Solutions to the sample examination paper

Question 1.

(a) For (i), we use integration by parts to see that, differentiating the $t$ and integrating the $\cos t$, we get
$$\int t \cos t \, dt = t \sin t - \int \sin t \, dt = t \sin t + \cos t + c,$$
where $c$ is an arbitrary constant.

For (ii), we compare the first-order differential equation with the standard form
$$M(x, y) + N(x, y)\frac{dy}{dx} = 0,$$
to see that
$$M(x, y) = \frac{x^3}{\cos(y/x)} + xy^2 \quad\text{and}\quad N(x, y) = -x^2 y.$$
In this case, this means that we have
$$M(\lambda x, \lambda y) = \frac{(\lambda x)^3}{\cos(\lambda y/\lambda x)} + (\lambda x)(\lambda y)^2 = \lambda^3 M(x, y),$$
and
$$N(\lambda x, \lambda y) = -(\lambda x)^2(\lambda y) = \lambda^3 N(x, y),$$
i.e. both $M(x, y)$ and $N(x, y)$ are homogeneous of degree 3. Consequently, this differential equation is homogeneous of degree 3.

For (iii), as the differential equation in (ii) is homogeneous, we make the substitution $y(x) = x v(x)$ so that, using the product rule, we have
$$\frac{dy}{dx} = v(x) + x\frac{dv}{dx},$$
and the differential equation becomes
$$\frac{x^3}{\cos v} + x^3 v^2 - x^3 v\left(v + x\frac{dv}{dx}\right) = 0,$$
which, when simplified, yields
$$v \cos v \,\frac{dv}{dx} = \frac{1}{x},$$
which is a separable differential equation. Rewriting this in the usual way then gives
$$\int v \cos v \, dv = \int \frac{dx}{x} \implies v \sin v + \cos v = \ln|x| + c,$$
where $c$ is an arbitrary constant and we have used (i) to find the integral on the left-hand side. So, using $y(x) = x v(x)$ again, we see that
$$\frac{y}{x}\sin\frac{y}{x} + \cos\frac{y}{x} = \ln|x| + c,$$
is the general solution in terms of $y/x$. (Obviously, this expression cannot be usefully simplified any further.)

(b) As the plane, $P$, contains the point $(3, 4, -1)$ and has normal $(-4, 8, -4)^T$, we have
$$\begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \cdot \begin{pmatrix} x - 3 \\ y - 4 \\ z + 1 \end{pmatrix} = 0 \implies -4(x-3) + 8(y-4) - 4(z+1) = 0 \implies x - 2y + z = -6,$$
as its Cartesian equation.
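The Cartesian equation of the plane can be reproduced with a few lines of integer arithmetic. This is a minimal sketch under the assumption that the plane is written as $\mathbf{n}\cdot\mathbf{r} = \mathbf{n}\cdot\mathbf{r}_0$; the helper `dot` and the variable names are illustrative only.

```python
def dot(u, v):
    """Dot product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


normal = (-4, 8, -4)   # normal to the plane P
point = (3, 4, -1)     # a point known to lie on P

# n . (r - r0) = 0 is the same as n . r = n . r0.
rhs = dot(normal, point)

# Divide through by -4 so that the coefficient of x is 1, as in the solution.
scale = -4
scaled_normal = tuple(c // scale for c in normal)
scaled_rhs = rhs // scale

print(scaled_normal, scaled_rhs)  # coefficients of x, y, z and the right-hand side
```

The scaled coefficients correspond to $x - 2y + z = -6$, and the given point satisfies this equation.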
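The surface, $S$, can be written as $f(x, y, z) = c$ with $f(x, y, z) = x^2 + y^2 + z^2$, for constant $c$. At any point, $(x, y, z)$, on the surface, its normal vector is given by
$$\nabla f = \begin{pmatrix} f_x \\ f_y \\ f_z \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix},$$
and in order for this to be in the same direction as the normal to $P$, there must be some $\lambda$ that makes
$$\nabla f = \lambda \begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \implies \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix} = \lambda \begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \implies \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -2\lambda \\ 4\lambda \\ -2\lambda \end{pmatrix}.$$
Of course, we also need the point, $(x, y, z)$, to lie on $P$ and so we also have
$$x - 2y + z = -6 \implies (-2\lambda) - 2(4\lambda) + (-2\lambda) = -6 \implies \lambda = \frac{1}{2},$$
i.e. this is the value of $\lambda$ that we need. Thus, the point on $S$ that we seek is $(-1, 2, -1)$ and, using the equation for $S$, we get
$$c = (-1)^2 + (2)^2 + (-1)^2 = 6,$$
as the required value of $c$.

The new surface can be written as $g(x, y, z) = \beta$ with
$$g(x, y, z) = x^2 + y^2 + \alpha z^2,$$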
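for constants $\alpha$ and $\beta$. At any point $(x, y, z)$ on the surface, its normal vector is given by
$$\nabla g = \begin{pmatrix} g_x \\ g_y \\ g_z \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ 2\alpha z \end{pmatrix} \implies \nabla g = \begin{pmatrix} 8 \\ 6 \\ 10\alpha \end{pmatrix},$$
at the point $(4, 3, 5)$. We also see that the normal to $S$ at the point $(4, 3, 5)$ is
$$\nabla f = \begin{pmatrix} 8 \\ 6 \\ 10 \end{pmatrix},$$
and, in order for these two surfaces to intersect orthogonally at this point, we must have
$$\nabla g \cdot \nabla f = 0 \implies \begin{pmatrix} 8 \\ 6 \\ 10\alpha \end{pmatrix} \cdot \begin{pmatrix} 8 \\ 6 \\ 10 \end{pmatrix} = 0 \implies 64 + 36 + 100\alpha = 0 \implies \alpha = -1,$$
as the value of $\alpha$ that we seek. Then, as the point $(4, 3, 5)$ must lie on the new surface, we also have
$$x^2 + y^2 - z^2 = \beta \implies 4^2 + 3^2 - (5^2) = \beta \implies \beta = 0,$$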
as the required value of $\beta$.

Question 2.

(a) The elasticity of demand is given by the formula
$$\varepsilon(p) = -\frac{p}{q} \cdot \frac{dq}{dp},$$
where $q = q^D(p)$ is the demand function. In this question, we are told that
$$\varepsilon = \frac{p}{26 - p},$$
and so putting this into the formula above we get
$$-\frac{p}{q} \cdot \frac{dq}{dp} = \frac{p}{26 - p} \implies \frac{1}{q} \cdot \frac{dq}{dp} = \frac{1}{p - 26},$$
which is a separable differential equation. As such, we solve this by ‘separating’ the variables and integrating both sides to get
$$\int \frac{1}{q}\,dq = \int \frac{1}{p - 26}\,dp \implies \ln|q| = \ln|p - 26| + c \implies q = A(p - 26),$$
for some arbitrary constant, $A$. Then, using the fact that the equilibrium price is 14 and the equilibrium quantity is 6, we can see that $A$ must satisfy the equation
$$6 = A(14 - 26) \implies A = -\frac{6}{12} = -\frac{1}{2}.$$
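Because the demand function here is linear, the result can be checked exactly with rational arithmetic. The sketch below is illustrative only: it recomputes $A$ from the equilibrium condition and confirms that $q = A(p - 26)$ reproduces the given elasticity $p/(26 - p)$ at a few arbitrarily chosen prices.

```python
from fractions import Fraction

# Equilibrium price and quantity given in the question.
p_star, q_star = Fraction(14), Fraction(6)

# From 6 = A(14 - 26).
A = q_star / (p_star - 26)


def demand(p):
    """Demand function q = A(p - 26) with the constant found above."""
    return A * (p - 26)


def elasticity(p):
    """-(p/q) dq/dp, where dq/dp = A because the demand function is linear."""
    return -p / demand(p) * A


# Arbitrary sample prices below 26, where the demand is positive.
checks = [elasticity(Fraction(p)) == Fraction(p, 26 - p) for p in (1, 5, 14, 20)]

print(A)                       # -1/2
print(demand(p_star), checks)  # demand at the equilibrium price, and the checks
```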
Putting this all together, we then see that we have $q = q^D(p)$ where
$$q^D(p) = 13 - \frac{p}{2},$$
is the sought after demand function.

(b) The producer surplus is given by
$$PS = p^* q^* - \int_0^{q^*} p^S(q)\,dq,$$
where $p^*$ and $q^*$ are the equilibrium price and quantity, and $p^S(q)$ is the inverse supply function. So, using the information given in the question, we have
$$36 = (14)(6) - \int_0^6 (aq + b)\,dq \implies 48 = \left[ a\frac{q^2}{2} + bq \right]_0^6 \implies 48 = 18a + 6b,$$
or, indeed, $8 = 3a + b$ as our first equation for $a$ and $b$. Another equation that needs to be satisfied is $14 = 6a + b$, as the equilibrium quantity must give the equilibrium price when we use the inverse supply function. We can easily solve these equations for the constants $a$ and $b$ by subtracting one from the other to get $a = 2$ and then, using the first equation again, we get $b = 2$. Consequently, we have $p^S(q) = 2q + 2$ so that
$$q^S(p) = \frac{p}{2} - 1,$$
is the supply function for this market.¹

(c) In the presence of an excise tax of $T$, the supply function becomes
$$q_T^S(p) = q^S(p - T) = \frac{1}{2}(p - T) - 1,$$
whereas the demand function is unchanged, i.e. $q_T^D(p) = q^D(p)$.

¹ Of course, an alternative method here would be to observe that the supply function is a straight line and so the producer surplus is the area of a triangular region whose height is $p^* - b$ and whose width is $q^*$. This means that, if we find the area of this triangle, we have
$$36 = \tfrac{1}{2}(14 - b)(6) \implies 14 - b = 12 \implies b = 2.$$
Then, again using the fact that equilibrium quantity must give the equilibrium price when we use the inverse supply function, we use $b = 2$ to see that
$$14 = 6a + b \implies a = \frac{14 - 2}{6} = 2,$$
so that, once again, we find that $p^S(q) = 2q + 2$ so that $q^S(p) = \frac{p}{2} - 1$ is the supply function for this market.
This means that, in the presence of the excise tax of $T$, the new equilibrium price is given by
$$q_T^S(p) = q_T^D(p) \implies \frac{1}{2}(p - T) - 1 = 13 - \frac{p}{2} \implies p = 14 + \frac{T}{2},$$
and, using $q_T^D(p)$ say, we see that the new equilibrium quantity is
$$q = 13 - \frac{1}{2}\left(14 + \frac{T}{2}\right) = 6 - \frac{T}{4}.$$
We can now find the tax revenue, $R(T)$, which is the tax per unit, $T$, multiplied by $q$, the amount being sold in the presence of the tax, i.e. we have
$$R(T) = Tq = T\left(6 - \frac{T}{4}\right) = 6T - \frac{T^2}{4}.$$
To see where this is maximised, we start by noting that $R(T)$ has a stationary point when $R'(T) = 0$, i.e. when
$$6 - \frac{T}{2} = 0 \implies T = 12,$$
and since $R''(T) = -1/2 < 0$ this turning point is indeed a maximum. Thus, the tax revenue is maximised when $T = 12$.

Question 3.

(a) The first-order partial derivatives of $f(x, y)$ are
$$f_x(x, y) = 2x - 2 \quad\text{and}\quad f_y(x, y) = -3y^2 + 2y + 1.$$
At a stationary point, both of these first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous equations
$$2x - 2 = 0 \quad\text{and}\quad -3y^2 + 2y + 1 = 0.$$
But, the first equation gives us $x = 1$ and the second equation gives us
$$3y^2 - 2y - 1 = 0 \implies (3y + 1)(y - 1) = 0 \implies y = -\frac{1}{3} \text{ or } 1.$$
Consequently, the points $(1, -1/3)$ and $(1, 1)$ are the stationary points of this function. The second-order partial derivatives of this function are
$$f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = -6y + 2,$$
and, as such, the Hessian is given by
$$H(x, y) = (2)(-6y + 2) - (0)^2 = 4(1 - 3y).$$
Evaluating this at each of the stationary points we then find that:
At $(1, -1/3)$, the Hessian is $H(1, -1/3) = 4(2) > 0$ and $f_{xx}(1, -1/3) = 2 > 0$, so this is a local minimum.

At $(1, 1)$, the Hessian is $H(1, 1) = 4(-2) < 0$, and so this is a saddle point.

Thus, the stationary points $(1, -1/3)$ and $(1, 1)$ are a local minimum and a saddle point respectively.

To see where the function is convex, concave or neither we note that the Hessian is given by $H(x, y) = 4(1 - 3y)$ and $f_{xx}(x, y) = 2$, and so we see that:

When $y > 1/3$, $H(x, y) < 0$ and so the function is neither convex nor concave.

When $y \le 1/3$, $H(x, y) \ge 0$ and $f_{xx}(x, y) \ge 0$ and so the function is convex.
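The classification above can be reproduced exactly with rational arithmetic. The following sketch is an illustration only: it solves the quadratic $3y^2 - 2y - 1 = 0$ with the quadratic formula (the discriminant happens to be a perfect square) and applies the same Hessian test; the function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def f_x(x, y):
    """Partial derivative of f with respect to x."""
    return 2 * x - 2


def f_y(x, y):
    """Partial derivative of f with respect to y."""
    return -3 * y * y + 2 * y + 1


def hessian(x, y):
    """H = f_xx * f_yy - (f_xy)^2 with f_xx = 2, f_yy = -6y + 2, f_xy = 0."""
    return 2 * (-6 * y + 2) - 0 ** 2


# Roots of 3y^2 - 2y - 1 = 0 via the quadratic formula.
a, b, c = 3, -2, -1
disc = b * b - 4 * a * c
root = isqrt(disc)
assert root * root == disc  # the discriminant is a perfect square here
ys = sorted(Fraction(-b + sign * root, 2 * a) for sign in (-1, 1))
points = [(Fraction(1), y) for y in ys]  # x = 1 from 2x - 2 = 0


def classify(x, y):
    """Second-derivative test; f_xx = 2 > 0 everywhere for this function."""
    h = hessian(x, y)
    if h < 0:
        return "saddle point"
    if h > 0:
        return "local minimum"
    return "inconclusive"


for x, y in points:
    print((str(x), str(y)), classify(x, y))
```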
The function doesn’t have a global minimum or a global maximum because, if we consider the behaviour of the function when $x = 0$ we have
$$f(0, y) = -y^3 + y^2 + y,$$
and as such, we see that:

As $y \to \infty$, $f(0, y) \to -\infty$ and so $f(x, y)$ cannot have a global minimum.

As $y \to -\infty$, $f(0, y) \to \infty$ and so $f(x, y)$ cannot have a global maximum.

(b) To solve the given non-homogeneous second-order differential equation, we follow the method in Section 8.3.2. In particular:

The corresponding homogeneous second-order ODE is
$$y_c'' - 2y_c' + y_c = 0,$$
and so the auxiliary equation is
$$k^2 - 2k + 1 = 0 \implies (k - 1)^2 = 0,$$
which has one real solution given by $k = 1$ (twice). Consequently, the complementary function, $y_c(t)$, is
$$y_c(t) = (At + B)e^t,$$
for arbitrary constants $A$ and $B$.
The right-hand-side of the given ODE is $e^t$ and our first reaction in this case would be to take $y_p(t) = \alpha e^t$ where $\alpha$ is a constant that has to be determined. But, this won’t work as, taking $A = 0$ and $B = \alpha$, we see that this is ‘part’ of the complementary function. As such, we ‘multiply by $t$’ and try $y_p(t) = \alpha t e^t$ which won’t work either because, taking $A = \alpha$ and $B = 0$, we see that this is ‘part’ of the complementary function as well. Consequently, we multiply by $t$ again and try $y_p(t) = \alpha t^2 e^t$ which, thankfully, will work because it is not ‘part’ of the complementary function. So, differentiating this using the product rule, we have
$$y_p'(t) = (2\alpha t)e^t + (\alpha t^2)e^t = \alpha(2t + t^2)e^t,$$
and
$$y_p''(t) = \alpha(2 + 2t)e^t + \alpha(2t + t^2)e^t = \alpha(2 + 4t + t^2)e^t,$$
which means that, substituting these into our ODE, we get
$$\alpha(2 + 4t + t^2)e^t - 2\alpha(2t + t^2)e^t + \alpha t^2 e^t = e^t \implies 2\alpha e^t = e^t \implies \alpha = \frac{1}{2}.$$
Consequently, we see that
$$y_p(t) = \frac{t^2}{2}e^t,$$
is the particular integral we seek.

The general solution to our ODE is then given by the sum of its complementary function and its particular integral, i.e. we have
$$y(t) = (At + B)e^t + \frac{t^2}{2}e^t,$$
where $A$ and $B$ are arbitrary constants.

Then given the conditions $y(0) = 1$ and $y(1) = 0$, we have the equations
$$1 = B e^0 \quad\text{and}\quad 0 = (A + B)e^1 + \frac{e^1}{2},$$
respectively. The first of these gives $B = 1$ and then the second gives
$$0 = A + B + \frac{1}{2} \implies 0 = A + 1 + \frac{1}{2} \implies A = -\frac{3}{2}.$$
Thus, we find that
$$y(t) = \left(-\frac{3}{2}t + 1\right)e^t + \frac{t^2}{2}e^t = \frac{t^2 - 3t + 2}{2}e^t,$$
is the particular solution we seek.
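As a sanity check on the particular solution, the sketch below substitutes it back into $y'' - 2y' + y = e^t$ using central finite differences. It is a numerical illustration only; the step size `h` and the sample points are arbitrary assumptions, and the residual is only expected to be small, not exactly zero.

```python
import math


def y(t):
    """Particular solution y(t) = (t^2 - 3t + 2) e^t / 2."""
    return 0.5 * (t * t - 3.0 * t + 2.0) * math.exp(t)


def residual(t, h=1e-4):
    """y'' - 2y' + y - e^t, with derivatives from central differences."""
    d1 = (y(t + h) - y(t - h)) / (2.0 * h)
    d2 = (y(t + h) - 2.0 * y(t) + y(t - h)) / (h * h)
    return d2 - 2.0 * d1 + y(t) - math.exp(t)


# Arbitrary sample points, including the two points where conditions are given.
sample_points = (-1.0, 0.0, 0.5, 1.0, 2.0)
max_residual = max(abs(residual(t)) for t in sample_points)

print("y(0) =", y(0.0))  # the condition y(0) = 1
print("y(1) =", y(1.0))  # the condition y(1) = 0
print("largest residual:", max_residual)
```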
Question 4.
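Here the cost function is $C(k, l) = vk + wl$, and we want to minimise this subject to the constraint $q(k, l) = Q$ where $k, l > 0$. So, writing the constraint in the form $q(k, l) - Q = 0$, we get the Lagrangean
$$L(k, l, \lambda) = vk + wl - \lambda(q(k, l) - Q) = vk + wl - \lambda(k^\alpha l^\alpha - Q),$$
and we seek the points which simultaneously satisfy the equations $L_k(k, l, \lambda) = 0$, $L_l(k, l, \lambda) = 0$ and $L_\lambda(k, l, \lambda) = 0$. As such, we find the first-order partial derivatives of $L(k, l, \lambda)$, i.e.
$$L_k(k, l, \lambda) = v - \lambda\alpha k^{\alpha-1} l^\alpha, \quad L_l(k, l, \lambda) = w - \lambda\alpha k^\alpha l^{\alpha-1} \quad\text{and}\quad L_\lambda(k, l, \lambda) = -(k^\alpha l^\alpha - Q),$$
and set these equal to zero to yield the equations
$$v - \lambda\alpha k^{\alpha-1} l^\alpha = 0, \quad w - \lambda\alpha k^\alpha l^{\alpha-1} = 0 \quad\text{and}\quad k^\alpha l^\alpha - Q = 0.$$
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$v - \lambda\alpha k^{\alpha-1} l^\alpha = 0 \implies \lambda = \frac{v}{\alpha k^{\alpha-1} l^\alpha} = \frac{vk}{\alpha k^\alpha l^\alpha},$$
from the first equation, and
$$w - \lambda\alpha k^\alpha l^{\alpha-1} = 0 \implies \lambda = \frac{w}{\alpha k^\alpha l^{\alpha-1}} = \frac{wl}{\alpha k^\alpha l^\alpha},$$
from the second equation. As such, we can equate these expressions for $\lambda$ to get
$$\frac{vk}{\alpha k^\alpha l^\alpha} = \frac{wl}{\alpha k^\alpha l^\alpha} \implies l = \frac{v}{w}k.$$
We then use this new relationship between $k$ and $l$ in the third equation, which is just the constraint $k^\alpha l^\alpha = Q$, to get
$$Q = k^\alpha \left(\frac{v}{w}k\right)^\alpha \implies Q = \left(\frac{v}{w}\right)^\alpha k^{2\alpha} \implies k^{2\alpha} = \left(\frac{w}{v}\right)^\alpha Q \implies k = \sqrt{\frac{w}{v}}\, Q^{\frac{1}{2\alpha}},$$
and then, using this in the equation $l = vk/w$, we get
$$l = \frac{v}{w}\sqrt{\frac{w}{v}}\, Q^{\frac{1}{2\alpha}} = \sqrt{\frac{v}{w}}\, Q^{\frac{1}{2\alpha}}.$$
Thus, these values of $k$ and $l$ minimise the cost of producing $Q$ units. The minimum cost is then given by
$$\hat{C}(Q) = C\left(\sqrt{\frac{w}{v}}\, Q^{\frac{1}{2\alpha}}, \sqrt{\frac{v}{w}}\, Q^{\frac{1}{2\alpha}}\right) = v\sqrt{\frac{w}{v}}\, Q^{\frac{1}{2\alpha}} + w\sqrt{\frac{v}{w}}\, Q^{\frac{1}{2\alpha}} = 2\sqrt{vw}\, Q^{\frac{1}{2\alpha}},$$
as required.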
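The closed-form minimum can be compared against a brute-force search along the constraint. The sketch below is illustrative only: the parameter values $v = 4$, $w = 9$, $\alpha = 1/4$ and $Q = 2$ are assumptions chosen for convenience (they are not from the question), and the grid of capital values is an arbitrary choice.

```python
import math

# Illustrative values only (not taken from the question); note 0 < alpha < 1/2.
v, w, alpha, Q = 4.0, 9.0, 0.25, 2.0


def labour_on_constraint(k):
    """Solve k^alpha * l^alpha = Q for l, i.e. l = Q^(1/alpha) / k."""
    return Q ** (1.0 / alpha) / k


def cost(k):
    """Cost vk + wl evaluated along the constraint."""
    return v * k + w * labour_on_constraint(k)


# Values predicted by the Lagrange multiplier calculation.
k_star = math.sqrt(w / v) * Q ** (1.0 / (2.0 * alpha))
l_star = math.sqrt(v / w) * Q ** (1.0 / (2.0 * alpha))
min_cost_formula = 2.0 * math.sqrt(v * w) * Q ** (1.0 / (2.0 * alpha))

# Brute-force search over capital levels k = 0.01, 0.02, ..., 50.
grid = [0.01 * i for i in range(1, 5001)]
k_grid = min(grid, key=cost)

print("formula: k* =", k_star, " l* =", l_star, " cost =", min_cost_formula)
print("grid search: k =", k_grid, " cost =", cost(k_grid))
```

With these values the grid minimum should sit next to the predicted $k$, and no grid point should give a cost below the formula's value.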
To justify this, we note that the constraint k α lα = Q looks a bit like a rectangular hyperbola and, for k, l > 0, this is illustrated in Figure B.1(a). The objective function, C(k, l) = vk + wl has contours C(k, l) = c, where c is a constant, that are straight lines as illustrated in Figure B.1(b). The direction in which C(k, l) is decreasing is indicated in this figure along with the point we found above using the Lagrange multiplier method — i.e. a point where we have a contour of C(k, l) which is both tangential to the constraint and touching the constraint. Having seen this, it should be clear that this point will minimise C subject to the constraint.
[Figure B.1: (a) The constraint $q(k, l) = Q$. (b) Adding three contours, $C(k, l) = c$, where the direction in which $C(k, l)$ is decreasing is as indicated. Clearly, we are interested in the point which is indicated in the figure.]

Using the given information, we can see that if $Q$ is produced then the revenue generated will be $R(Q) = pQ$ and the costs incurred will be
$$C(Q) = cQ + \hat{C}(Q) + FC = cQ + 2\sqrt{vw}\, Q^{\frac{1}{2\alpha}} + FC,$$
which is the cost of the raw materials plus the costs of capital and labour plus any fixed costs the firm may have. As such, the profit function for the firm is
$$\pi(Q) = R(Q) - C(Q) = pQ - cQ - 2\sqrt{vw}\, Q^{\frac{1}{2\alpha}} - FC,$$
and we want to find the value of $Q$ that maximises this. As such, we find that
$$\pi'(Q) = p - c - 2\sqrt{vw}\,\frac{1}{2\alpha}\, Q^{\frac{1}{2\alpha} - 1} = p - c - \frac{\sqrt{vw}}{\alpha}\, Q^{\frac{1-2\alpha}{2\alpha}},$$
as the fixed costs, $FC$, are a constant and, setting this equal to zero, we find that
$$\pi'(Q) = 0 \implies Q^{\frac{1-2\alpha}{2\alpha}} = \alpha\frac{p - c}{\sqrt{vw}} \implies Q = \left[\alpha\frac{p - c}{\sqrt{vw}}\right]^{\frac{2\alpha}{1-2\alpha}},$$
is the only stationary point. Indeed, notice that this value of $Q$ is positive as $p > c$ and $\alpha > 0$. Furthermore, differentiating $\pi'(Q)$ once more, we have
$$\pi''(Q) = -\frac{\sqrt{vw}}{\alpha}\cdot\frac{1 - 2\alpha}{2\alpha}\, Q^{\frac{1-2\alpha}{2\alpha} - 1},$$
and as this is negative at the stationary point (since $0 < \alpha < 1/2$ implies that $\alpha > 0$ and $1 - 2\alpha > 0$) we see that our stationary point is a local maximum. Thus,
$$Q = \left[\alpha\frac{p - c}{\sqrt{vw}}\right]^{\frac{2\alpha}{1-2\alpha}},$$
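is the value of $Q$ that maximises the firm’s profit.

Question 5.

(a) Using the facts that
$$e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \frac{y^4}{4!} + \cdots,$$
and
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots,$$
we find that, letting $y = \sin x$, we have
$$e^{\sin x} = 1 + \left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right) + \frac{1}{2!}\left(x - \frac{x^3}{3!} + \cdots\right)^2 + \frac{1}{3!}\left(x - \frac{x^3}{3!} + \cdots\right)^3 + \frac{1}{4!}(x - \cdots)^4 + \frac{1}{5!}(x - \cdots)^5 + \cdots,$$
if we keep the relevant terms of the $\sin x$ series when we put them into the series for $e^y$. Then, multiplying out the brackets and, again, keeping the relevant terms we get
$$e^{\sin x} = 1 + \left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right) + \frac{1}{2!}\left(x^2 + 2(x)\left(-\frac{x^3}{3!}\right) + \cdots\right) + \frac{1}{3!}\left(x^3 + 3(x)(x)\left(-\frac{x^3}{3!}\right) + \cdots\right) + \frac{x^4}{4!} + \frac{x^5}{5!} + \cdots,$$
so that, tidying up, this gives us
$$e^{\sin x} = 1 + \left(x - \frac{x^3}{6} + \frac{x^5}{120} - \cdots\right) + \frac{1}{2}\left(x^2 - \frac{x^4}{3} + \cdots\right) + \frac{1}{6}\left(x^3 - \frac{x^5}{2} + \cdots\right) + \frac{x^4}{24} + \frac{x^5}{120} + \cdots,$$
which means that we have
$$e^{\sin x} = 1 + x + \frac{x^2}{2} + 0x^3 - \frac{x^4}{8} - \frac{x^5}{15} + \cdots,$$
in terms up to $x^5$.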
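The coefficients of this series can be recomputed exactly by composing truncated power series with rational arithmetic. The sketch below is one possible way to do this (it is not the guide's method): polynomials are stored as coefficient lists, and the helper `multiply` and the constant `ORDER` are names introduced here for illustration.

```python
from fractions import Fraction
from math import factorial

ORDER = 5  # keep terms up to x^5


def multiply(a, b):
    """Product of two coefficient lists, discarding powers above x^ORDER."""
    out = [Fraction(0)] * (ORDER + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= ORDER:
                out[i + j] += ai * bj
    return out


# sin x = x - x^3/3! + x^5/5! - ... (index n holds the coefficient of x^n)
sin_series = [Fraction(0)] * (ORDER + 1)
for n in range(1, ORDER + 1, 2):
    sin_series[n] = Fraction((-1) ** (n // 2), factorial(n))

# e^y = sum of y^n / n!, with y replaced by the truncated sin series.
# Since sin x has no constant term, (sin x)^n only contributes from x^n on,
# so stopping at n = ORDER loses nothing up to x^ORDER.
exp_sin = [Fraction(0)] * (ORDER + 1)
power = [Fraction(1)] + [Fraction(0)] * ORDER  # (sin x)^0
for n in range(ORDER + 1):
    for i in range(ORDER + 1):
        exp_sin[i] += power[i] / factorial(n)
    power = multiply(power, sin_series)

print([str(coefficient) for coefficient in exp_sin])
# ['1', '1', '1/2', '0', '-1/8', '-1/15']
```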
(b) We make the substitution $g = \sin x$ so that
$$\frac{dg}{dx} = \cos x \implies \cos x \, dx = dg,$$
and so we have
$$\int \frac{\cos x}{(1 - \sin x)(2 + \sin x)}\,dx = \int \frac{1}{(1 - g)(2 + g)}\,dg.$$
Thus, using partial fractions, we have
$$\frac{1}{(1 - g)(2 + g)} = \frac{A}{1 - g} + \frac{B}{2 + g} \implies 1 = A(2 + g) + B(1 - g),$$
so that, setting $g = 1$ we get $A = 1/3$ and setting $g = -2$ we get $B = 1/3$. Consequently, we have
$$\int \frac{1}{(1 - g)(2 + g)}\,dg = \int \left(\frac{1/3}{1 - g} + \frac{1/3}{2 + g}\right) dg = \frac{1}{3}\Big(-\ln|1 - g| + \ln|2 + g|\Big) + c = \frac{1}{3}\ln\left|\frac{2 + \sin x}{1 - \sin x}\right| + c,$$
as the answer.
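One way to gain confidence in an antiderivative is to differentiate it numerically and compare with the integrand. The sketch below does this with a central difference; the step size and the sample points (chosen away from $x = \pi/2$, where $1 - \sin x = 0$) are arbitrary assumptions.

```python
import math


def integrand(x):
    """cos x / ((1 - sin x)(2 + sin x))."""
    s = math.sin(x)
    return math.cos(x) / ((1.0 - s) * (2.0 + s))


def antiderivative(x):
    """(1/3) ln |(2 + sin x) / (1 - sin x)|, omitting the constant c."""
    s = math.sin(x)
    return math.log(abs((2.0 + s) / (1.0 - s))) / 3.0


def derivative(func, x, h=1e-6):
    """Central-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


# Sample points avoid x = pi/2 (+ 2n pi), where the integrand is undefined.
sample_points = (-1.0, 0.0, 0.5, 1.0, 2.5)
max_error = max(
    abs(derivative(antiderivative, x) - integrand(x)) for x in sample_points
)

print("largest mismatch:", max_error)
```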
(c) To find the stationary points of the function $f(x)$ we write it as
$$f(x) = \frac{x}{12} - x^{1/3},$$
and so we have
$$f'(x) = \frac{1}{12} - \frac{1}{3}x^{-2/3}.$$
The stationary points occur when $f'(x) = 0$ and so we need to solve the equation
$$\frac{1}{12} - \frac{1}{3x^{2/3}} = 0 \implies \frac{x^{2/3} - 4}{x^{2/3}} = 0,$$
and this is satisfied when
$$x^{2/3} = 4 \implies x^2 = 64 \implies x = \pm 8.$$
To determine their nature, we find the second derivative of $f(x)$, i.e.
$$f''(x) = -\frac{1}{3}\left(-\frac{2}{3}\right)x^{-5/3} = \frac{2}{9x^{5/3}},$$
and we can see that

If $x = 8$, we have $f''(8) > 0$ and so this is a local minimum.

If $x = -8$, we have $f''(-8) < 0$ and so this is a local maximum.

Thus, the stationary points when $x = 8$ and $x = -8$ are a local minimum and a local maximum respectively.
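These two stationary points can be checked in floating-point arithmetic, with one caveat: raising a negative float to the power $1/3$ in Python does not give the real cube root, so the sketch below uses a small helper, `real_cbrt`, introduced here as an assumption-free workaround. Everything else simply evaluates the formulas for $f'$ and $f''$ derived above.

```python
import math


def real_cbrt(x):
    """Real cube root, valid for negative x as well as positive x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def f(x):
    """f(x) = x/12 - x^(1/3)."""
    return x / 12.0 - real_cbrt(x)


def f_prime(x):
    """f'(x) = 1/12 - 1 / (3 x^(2/3))."""
    return 1.0 / 12.0 - 1.0 / (3.0 * real_cbrt(x) ** 2)


def f_second(x):
    """f''(x) = 2 / (9 x^(5/3))."""
    return 2.0 / (9.0 * real_cbrt(x) ** 5)


stationary = {}
for x in (8.0, -8.0):
    kind = "local minimum" if f_second(x) > 0 else "local maximum"
    stationary[x] = kind
    print(x, "f'(x) =", f_prime(x), "->", kind)
```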