6 Things About Programming That Every Computer Programmer Should Know by Vincent P. North
© 2015
Introduction: Just Call It “A Career’s Experience, Condensed Into Six Key Topics”

By way of introduction, I have been a professional computer programmer – and software project manager – for about thirty-five years. I was fortunate to have begun that career when the “personal” computer had not quite yet arrived on the scene. (They were still, for the most part, a tantalizing future curiosity, then found only in the pages of Radio-Electronics and Popular Electronics magazines.) “Computers” either filled air-conditioned rooms or, more recently, were about the size of a breadbox. Although integrated circuits, including microprocessors, were a standard part of electronic design, computers at that time were neither small nor fast. “Computers,” also, were devices that other people owned: companies who could afford to buy the boxes and the air-conditioned rooms to put them in. It was in this context that I first got my start.

As you well know, the subsequent years have witnessed dramatic changes. Moore’s Law1 continues to hold true, as semiconductor manufacturers continue to astonish us all with how much more computing power they can somehow cram into a single tiny piece of sand. Builders of flat-screen and touch-sensitive displays also continue to astound us with what they can do, as have the builders of sound-chips and nearly-microscopic digital cameras. “We live in interesting times” that certainly show no sign of slacking off.

Into this whirling dervish of a technological world, then, come you, Gentle Reader.2 You are tasked not only with keeping up with hardware technology that will never allow itself to be fully kept up with, but also with the craft of computer programming itself. It has been said, by some, that computer programming is an “innate ability.” That you either “have it” (whatever “it” is …), or you don’t. But I don’t agree. I never thought of myself as being “a natural” in what became my career. I was merely “naturally interested” in it, as I still am. I was interested enough, and found it to be engaging enough, that I persevered to learn how to do it. But I made a lot of dumb mistakes along the way, which I would like to try to help you avoid.

Most of all, my experience has shown me a list of “fundamental technical skills of a professional programmer,” which I have now sought to reduce to a succinct list of six items. The actual list is longer – maybe much longer – but after some reflection, here are the six that, in my humble opinion, are key. These are the things – all of them technical abilities, procedures, and perspectives – which you will use on the job most every day. These are the six things for which, in my opinion at least, your command of these skills will make the greatest difference in your success and longevity in the craft.

“Computer programming,” by the way, still is “a craft.”3 At this writing, it still requires “skilled work” and the product of experience. But we should recognize that this aspect, too, is changing. Thanks to the influence of open-source programs and cooperative development, computer programming is tilting, in part, toward the assembly of new solutions based substantially on pre-existing components that the programmer(s) in question did not themselves develop. This, itself, constitutes a new “fundamental technical skill” that is one of the six that you will now find in this book.
And So, Without Further Ado: “Here Are The Six”

As I said in the Introduction, the following is a list of what I consider to be “six fundamental technical skills” that every computer programmer needs to know. “Technical” means that these are things which you need to know and which you will apply when crafting (and troubleshooting) the computer software that you write, and/or that you (and your team) maintain, for your client or employer. These skills are not particular to any single size, type, or brand of computer hardware, and for the most part are not limited to any one computer programming language or tool. These are also not social or organizational skills. (That would be a separate list, entirely.)
The List

1. Internal memory management and data structures.
2. Objects.
3. SQL database queries and concepts.
4. Precise specification, strategy, and implementation.
5. “Front-End, Back-End.” User interfaces and frameworks.
6. Pragmatic debugging skills.

This list is not “in any particular order,” although I will choose to address it in the sequence given. My treatment of each topic will also not be extremely detailed. Please understand that I am seeking to provide you with a 30,000-foot view, and to point you in specific directions from which you can pursue additional research on your own. This book is also “not a primer,” and by using this phrase I emphatically do not intend any negative slight to you. The topics that I will present here might well, section by section, require re-reading. (And, they might require clarification. Since this is an e-book, “we can do that.”) David Intersimone, the original director of development at (the now long-defunct …) Borland International, referred to this sort of experience as “a sip from the fire-hose.” I must acknowledge that your first experience of the forthcoming material might well be the same. However, as your Gentle Author, I hope that you will not in fact expect anything less from the text that you are about to consume. And so, with all that now said: “Let us begin.”
One: Internal Memory Management and Data Structures

Every digital computer … room-sized or pocket-sized … consists of the same three functional parts:

1. CPU = Central Processing Unit (the microprocessor, GPU, etc.).
2. I/O = Input/Output.
3. Main Memory.

“Memory,” of course, consists of (today …) billions of individual storage compartments, each one character (byte …) wide, each with an individual “address.” The CPU retrieves both instructions and data from memory, which is the only part of the computer system which is (must be …) [nearly …] as fast as the CPU itself.

All modern operating systems – through interesting devisings that need not concern us here – are magically able to provide each executing program with the functional illusion that “they have some certain amount of ‘memory’ all to themselves.” They never have to worry about stumbling into “anyone else’s ‘memory,’” because no one else’s “‘memory’” is ever visible or accessible to them, unless both programs make special arrangements to “share” a certain part of it … which very-interesting topic I hereby simply declare to be “out of scope” for the purposes of our present conversation.4 Thus we have, for each process5, “a play-pen all their own,” which they do not have to share with anyone unless they want to.

However, as it turns out, this allocation of memory is not “pristine and undisturbed.” Every process runs under the auspices of an operating system6 which completely defines how each process actually perceives [its private view of …] “Main Memory.” This view, as it turns out, consists of exactly two things:

1. “The Stack”: Every process, from the operating system’s point of view, consists of “one subroutine.” The operating system launches the process by “calling” that one subroutine, and, when that one subroutine finally “returns to its caller,” the entire process ends. This subroutine, directly or indirectly, launches many other subroutines, each one of which “is called” and then, finally, “returns to its caller.” Each subroutine, during its finite lifetime, possesses some certain set of “local variables” which are peculiar to itself, such that, if the subroutine (by whatever means) happens to “call itself,” the local variables owned by each instance will be distinct. The entire portion of memory which is used to accomplish this feat is called “the Stack,” because it has the effective functional organization of the stack of dishes stored at the start of any cafeteria line. The “call-and-return” flow of control, and the storage of all local variables, is managed using this single area of storage.

2. “The Heap”: This is, quite simply, “everything else.” Storage in this area is not allocated automatically: it is obtained only “on request,” and it is likewise made available for re-allocation only “on request.” (It is rather rudely called “the heap” because this area of storage has no inherent structure, unlike “the (clearly, ‘push-down’) stack.”) Almost all of the storage that your process will actually use is taken from the heap, not from the stack. Furthermore, all references to this storage are actually indirect, accomplished through the use of “pointers.” A “pointer” is simply a variable whose value is understood to be a memory address.

“The heap,” you see, is merely an amorphous pool of available storage. Therefore, in order to obtain the use of a chunk of however-many bytes, for whatever purposes you may devise, it will be necessary for your process to request it. The operating system will provide you with the memory address of a suitable area. In order to make use of the area, you must refer to it indirectly, using the address that you were given. “Pointers” are the base mechanism by which this trick is done.

Most programming languages, however, clearly recognize that this “riding the pony bareback” strategy is fraught with quite-unnecessary danger. (If programs fail to comply with the operating system’s expectations, at any point or for any reason, the entire program will crash.) Hence, most languages implement a much more sophisticated memory-management strategy, and hereafter I will presume that your situation is more-or-less like this one.

Typically, modern programming languages blur the distinction between “a pointer” and “a value.” When you refer to “a variable” in your program, the language will automatically (and transparently) deduce whether this variable’s “value” is the actual value, or a pointer to it, and in either case will respond accordingly. The language system will also transparently keep track of how many variables contain the addresses of (“references to …”) any particular piece of storage. It will, by some very clever mechanism, use this to discover when a given piece of storage is no longer being referenced by anyone. In this way, programs no longer have to be explicitly concerned with “cleaning up their own mess.” Blocks of storage will be allocated automatically and transparently, and, when they are no longer being used, they will be harvested and re-used.

These, then, become the two basic “ground rules” upon which all processes (or, threads7) may, in most programming languages, depend:

• All programs directly or indirectly consist of “subroutines,” each subroutine [instance …] of which has its own private copies of “local variables,” which come into existence when it does, and disappear when it does.

• For all other purposes of necessary memory allocation, programs may simply “ask” for new storage to be allocated – and may blissfully ignore where, exactly, any particular piece of storage actually is. They may use the storage until they no longer have use for it, and then simply abandon it. When everyone has abandoned it, the storage will be silently and reliably re-used.
Upon the foundations of this entire “to be assumed” memory infrastructure, provided for them gratis by the programming-language system which they use, all computer programs must construct whatever arrangement of data is sufficient for their own purposes (whatever those are …). This means, essentially, two things:

1. Any incoming (or computed) data used by the program must be stored in such a way that the program, during its execution, can (of course …) obtain it again.

2. The program must be capable of handling a variable and unpredictable quantity of data. There must be no “pre-conceived limits” as to just how many copies of data might be stored (or, storable).

Ordinarily, these chores are delegated to pre-existing storage strategies which are an intrinsic part of whatever language system is being used. There are usually two types: those which store a value under a particular “key” (requiring that exact key to be provided in order to retrieve it again), and those which store an arbitrarily-sized list of (zero or more) values. These two are frequently used in combination, to allow “zero or more” values to be referenced by any unique key: each element of the keyed data-store (such as a “hash” or “tree”) refers to a separate list.

Let us now consider what sort of things can go wrong with these arrangements. What sort of things can cause a program to misbehave, or fail? Here are the most common culprits (two of which are sketched in code after this list):

1. Stack Overrun, caused by “endless recursion”: As I said earlier, “the stack” is the portion of memory that’s used to manage subroutine calls. When a subroutine is called, information is stored in the stack to facilitate returning from the subroutine, and the subroutine instance’s local variables are also stored there. The stack is of a limited size. Therefore, it is possible to overrun the boundaries of the stack if too many nested subroutine calls are made. Pragmatically, this usually means that subroutines are calling themselves, so-called “recursively,” without ever returning from any of those calls. The effect of this sort of bug on a program is instantly fatal, but relatively easy to debug.

2. Heap Corruption (“the infamous ‘Double Free’”): This always-fatal problem is caused by corruption of the internal data structures which manage the allocation and return of memory in the Heap. There are two routines: a malloc() routine, which requests a block of memory of a specified size (returning an address), and a free() routine, which releases a block of memory at a specified address (which must have previously been obtained from malloc()). Programs are required to free() only addresses that they obtained from malloc(), and to free() any particular address only once. They are also required to constrain their memory-modifications only to the range of addresses given, never modifying any adjacent bytes. Most modern programming languages protect you from these types of problems by managing the low-level malloc() and free() calls themselves.

3. Heap Exhaustion (“Memory Leaks”): Programs are required to release, in a timely way, any storage that they are no longer using. Most modern programming languages take care of this chore through some mechanism which detects automatically when a particular storage block is no longer being referenced, but so-called “leaks” can still occur when, for instance, a series of storage blocks all contain references to one another, but there remain no other references elsewhere to any of those blocks. (Since all of the blocks are still “referenced,” they never get released.) Heap exhaustion can also be caused by inefficient program design.

4. Failure to detect when a storage-allocation request could not be satisfied: When a malloc() request cannot, for whatever reason, obtain the amount of storage requested, it will typically return “zero” – a special value also known as NULL. Programs should detect if this occurs, and respond accordingly, but they rarely do.

5. Exhaustion of fixed-size storage arrays: Some early programming languages do not allow storage to be dynamically allocated from the heap. Instead, the programmer must specify a fixed size for each structure. Programs are supposed to determine if the space within these fixed structures has been exhausted, but they rarely do. Such languages also usually do not detect that a reference has been attempted which lies outside of the prescribed boundaries of a structure. The usual consequence is stack or heap corruption.

Memory issues are a common source of problems in software that is in the process of being developed, but they are much less common in programs that are in production.
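To make two of these culprits concrete, here is a minimal sketch, written in Python purely for illustration (a language that manages the stack and the heap for you in just the manner described above), of culprit #1 (a stack overrun caused by endless recursion) and culprit #3 (a “leak” caused by storage blocks that refer only to one another). The function and class names are, of course, merely hypothetical:

    import gc
    import sys

    # Culprit #1 -- stack overrun via endless recursion: each call pushes a new
    # frame (with its own local variables) onto the stack, until the limit is hit.
    def countdown(n):
        return countdown(n - 1)   # no base case: "calls itself without ever returning"

    try:
        countdown(10)
    except RecursionError:
        print("stack overrun caught; recursion limit is", sys.getrecursionlimit())

    # Culprit #3 -- a reference cycle: two heap objects that refer to one another.
    # A purely reference-counting scheme would never reclaim them, because each
    # block is still "referenced" ... just not by anyone else.
    class Node:
        def __init__(self):
            self.other = None

    a, b = Node(), Node()
    a.other, b.other = b, a   # the cycle
    del a, b                  # no outside references remain

    # Python supplements reference counting with a cycle detector, so the two
    # abandoned Nodes are found and harvested here:
    print("unreachable objects collected:", gc.collect())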
Two: Objects

Many of the influential books in the computer-science literature share a common characteristic: they are small. Certainly one of the most important of these was entitled Algorithms + Data Structures = Programs, by Dr. Niklaus Wirth.8 Truly, the title of this book says it all. Any computer program consists of “algorithms” (the step-by-step execution of instructions that is called for), applied to “data structures” such as the ones alluded to in the preceding section of this book.

In early programming languages, these two concerns (“algorithm” and “data”) were addressed separately, in different parts of the program. (The COBOL language was a particular example, defining all (fixed …) data structures in the so-called “DATA DIVISION,” and all algorithms in the “PROCEDURE DIVISION.”) This is not so much of a problem with regard to the local variables that might be associated with a particular procedure or function, but it is a very vexing concern with regard to the “global” storage that is used by multiple procedures and functions throughout a program. In a word, the problem is that the two things are separated: the data structures which are manipulated by algorithms are separate from the algorithms which manipulate the data structures. If decisions need to be made as to which algorithm should be applied to which data, these decisions wind up being redundantly scattered throughout the entire program.

To address this concern, the notion of “objects” was invented. An “object,” for our purposes, is a self-describing piece of storage, allocated from the heap. It contains not only space for the individual values (“properties”) which might need to be stored there, but also additional descriptive data (“metadata”) which serves to directly associate the object with the procedural code (“methods”) that is designed to operate in conjunction with it. Significantly, given a particular object and a request to apply a particular function against it (a so-called “method call”), the computer is able to determine which function is the correct one to call … based only on the metadata contained within the object itself. The exact mechanisms by which this determination is made are concealed from the programmer, but they are very efficient.

The paradigm that is usually quoted is: “Hey, you! Do this!” Whereas, in a conventional programming language, a specified subroutine would be called and a reference to the data would be supplied to it as a parameter, in an object-oriented programming language the primary reference is to the object (“Hey, you!”), which is then instructed to call one of its methods (“Do this!”). The actual sequence of events that subsequently takes place may vary from object to object, and from one method-call to the next, because the decision is made literally on-the-fly.9

Late binding is the fundamental characteristic of any object-oriented programming system, and there are many approaches that existing languages use to obtain it. Some languages are designed “strictly from the ground up” to use an object-oriented approach, whereas other languages permit object-oriented and conventional (procedural) techniques to be used in the same program at the same time. Languages also differ – sometimes quite markedly – in exactly what features they do and do not offer.

A key notion of any object-oriented system is concealment. Objects are said to “expose” certain methods (“functionality that can be ‘called’”) and properties (“values associated with the object which can be examined or set”). (The act of examining or setting the value of a property might cause procedural code to be executed – so-called “side effects.”) The objects, in the act of “exposing” what they do and what can be done to them, also “conceal” the details of how they do it. This is done so that different and derivative definitions of an object can be devised, all of which appear (and appear to act) “the same” to their clients – to the other parts of the program which may use and reference them – while perhaps having entirely different implementations. This “what, not how” approach avoids unnecessary dependencies – “coupling” – between different flavors of objects, and also between the objects and their clients.

Object-oriented languages are designed to allow you to define a taxonomy of related object definitions. For example, a so-called “base class” of vehicle could be defined, from which are derived “subclasses” such as automobile, motorcycle, and truck. Each subclass “inherits” the properties and methods of its base class (e.g. the method drive() and properties such as color and number_of_wheels), but implements them in different ways. Subclasses can completely “override” (replace …) the properties and methods of their ancestors, or can augment them. Some languages support “multiple inheritance,” where a subclass can inherit characteristics from more than one base class. (And sometimes, when the language allows these multiple base classes to define the same methods and properties, the results can be quite interesting. The language will always define some kind of “bright-line rule” to determine exactly what will actually happen.)

Every object-oriented language also defines a pair of special methods that apply to any object, called constructors and destructors. A “constructor” is an initialization subroutine, called by the language immediately after memory has been carved out for a new object instance. A “destructor” is a cleanup subroutine, and a companion to the constructor, being called by the language immediately before the object’s memory is freed. Constructors and destructors of subclasses usually are obliged to call the constructors and destructors of their ancestors, and to do so at particular points.

Object-oriented languages openly encourage the use of the phrase “… is a.” A car object “is a” car, and a dune_buggy object “is a” dune buggy as well as a car. The same is true of a sedan. If the client of a particular object only wishes to do something that both a sedan and a dune-buggy can do equally well (how boring …), then that client doesn’t need to know or care exactly what variety of car he is using. (The thirty-nine-cent word for this idea is “polymorphism.”) If the client wants to ask the object to do something that a sedan and a dune-buggy would carry out in different ways, then the client also doesn’t need to be concerned with these differences of implementation.
There are many fundamental advantages to object-oriented languages, and very little runtime cost. But such programs are subject to certain design difficulties once they have been in service for a number of years, mostly due to the aforementioned “inheritance” schemes. As long as the business requirements do not change in any way that is not perfectly reflected in the object-inheritance stratagem originally devised for the program, such programs can have a very long service life, indeed. However, if requirements do change fundamentally, inheritance can become an intractable form of “coupling” between the various subclasses which are derived from a common ancestor.
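Here is a minimal sketch of the vehicle taxonomy just described, written in Python purely for illustration. The names (Vehicle, drive(), number_of_wheels, and so on) simply mirror the examples in the text:

    # A base class, two subclasses, a constructor, an override, and polymorphism.
    class Vehicle:
        def __init__(self, color):          # the "constructor"
            self.color = color              # a "property"
            self.number_of_wheels = 4

        def drive(self):                    # a "method"
            return "rolling along on {} wheels".format(self.number_of_wheels)

    class Motorcycle(Vehicle):
        def __init__(self, color):
            super().__init__(color)         # subclasses call their ancestor's constructor
            self.number_of_wheels = 2       # ... and then augment or override it

    class Truck(Vehicle):
        def drive(self):                    # a complete "override" of drive()
            return "hauling freight: " + super().drive()

    # "Hey, you! Do this!" -- the correct drive() is chosen at run time, from
    # the object itself, not by the calling code: polymorphism in action.
    for v in (Vehicle("red"), Motorcycle("black"), Truck("white")):
        print(type(v).__name__, "->", v.drive())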
Three: SQL Queries and Concepts

SQL (Structured Query Language) is the now-ubiquitous lingua franca of the database world. With few (legacy …) exceptions, all database systems in use today are engineered to use it. Although the implementations are not the same, they are at this point sufficiently similar that I can now make several important and general statements about them.

The SQL language was invented as part of the SEQUEL project at IBM, based in part on a pioneering paper by Dr. E. F. Codd, which was published in the June 1970 issue of the Communications of the ACM. A key feature of this language was that it was essentially declarative. SQL allows you to specify what data you wish to obtain. It is up to the database engine to devise, on the fly, a plan for obtaining these answers, and then to do so. The database engine may exploit characteristics of the database – such as indexes that have been created on certain fields – in order to produce its answers more efficiently, but an SQL query does not specify how the work is to be carried out.

Information in an SQL database is organized into tables, which contain an unlimited number of rows. Each row consists of an identical set of columns, each of which (usually …) contains either a single value (of a single specified data type), or “no value at all.” (When this is the case, we say that the column IS NULL.) The rows in the table are in an unspecified order, although you can request that query results be returned to you sorted in any order you wish.

Conceptually, every SQL query (the SELECT statement) consists of a specification of the following (a worked sketch follows this list):

1. A list of the columns whose values you want to see. (If the same column name appears in more than one table, you must be specific.)
2. The tables from whence the data is to come, and how these tables are related to one another for the purposes of this query. (See below …)
3. The selection criteria (WHERE clause …) that are to be used.
4. If you’d like to receive summary statistics, specify what the data should be GROUPed BY.
5. If you’d like to receive only certain summary rows, specify what characteristics the groups-of-interest should be “HAVING”.
6. If you’d like to receive the results sorted in a particular way, specify what the rows should be ORDERed BY.
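Here is a minimal sketch of that six-part anatomy, using Python’s built-in sqlite3 module purely for illustration. The Orders table and its columns are hypothetical:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Orders (customer_id INTEGER, amount REAL)")
    db.executemany("INSERT INTO Orders VALUES (?, ?)",
                   [(1, 10.0), (1, 25.0), (2, 5.0), (3, 40.0), (3, 2.0)])

    rows = db.execute("""
        SELECT customer_id,            -- 1. the columns you want to see
               SUM(amount) AS total    --    ... including summary statistics
        FROM   Orders                  -- 2. the table(s) the data comes from
        WHERE  amount > 1.0            -- 3. the selection criteria
        GROUP  BY customer_id          -- 4. what the summaries are grouped by
        HAVING SUM(amount) > 20.0      -- 5. which summary rows to keep
        ORDER  BY total DESC           -- 6. how the results are to be sorted
    """).fetchall()

    print(rows)   # [(3, 42.0), (1, 35.0)]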
SQL queries can obtain results from any number of tables at a time. The query specifies how the rows in the various tables should be considered to be related to one another. This relationship is expressed by the presence of identical values in one or more specified columns in any pair of tables. For example, a customer_id field in an Orders table is sufficient to enable any sort of information about that customer to be retrieved from a Customers table in the same query. This is referred to as a join.

There are three types of joins that can be used: “inner joins,” which return only rows which have identical values in both tables, and “left” or “right” “outer joins,” which always return all of the rows from one of the tables or the other. (An “inner” join against Orders and Customers would return only Orders that are associated with Customers, while an “outer” join might return [left outer join] “all Orders, with or without Customers,” or [right …] “all Customers, with or without Orders.”)

Since SQL queries do not specify how the database engine is to obtain the specified results, it is very important to understand how your queries will be interpreted. It is certainly possible to write two different queries that will produce the same results, but that will do so in dramatically more- or less-efficient ways. Most database systems provide an EXPLAIN command which will tell you (in rather arcane, system-specific terms) exactly how the database engine would go about carrying out a particular query.

A very significant problem with SQL queries, in too-typical deployed applications, is that “the web server” (or whoever is issuing a particular query …) “can do anything and everything.” Every SQL server has some kind of permissions system which specifies exactly what any user is and is not permitted to do. If some web-site hacker is, by whatever means, able to persuade your web server to issue the DROP TABLE (or even the DROP DATABASE(!!)) command, and your web server is authorized to issue such a command, then … (at least a very significant part of) your database just disappeared. ‘Nuff said.

When you deal with SQL databases, you must also deal with the issue of concurrency. On a typical database server, hundreds of queries might be executing at the same time, and these queries may or may not be specifically concerned with what the other queries are doing. (For instance, if a user is merely “browsing a product catalog,” it’s essentially a certainty that the catalog’s contents won’t be changing at the time. Accounting data, however, is a different matter.) SQL database systems have a specific strategy for dealing with this issue: transactions.10 A “transaction” is defined as a single unit of work … it could be a set of modifications, deletions, and/or updates, or it could simply be a set of queries … which is considered to be atomic. That is to say, “a single, indivisible group.” In the case of modifications, either the entire set of modifications “happens,” or “none of them do.” In any case, a transaction has some specified degree of isolation from every other transaction that is occurring at the same time. For example, an accounting report might need to secure a “snapshot” view of a very busy database as it existed at a particular instant in time.
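Continuing the hypothetical Customers-and-Orders schema, here is a sketch (again using Python’s built-in sqlite3, purely for illustration) of an inner join versus a left outer join, and of a set of modifications treated as one atomic transaction:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Orders    (order_id INTEGER PRIMARY KEY,
                                customer_id INTEGER, amount REAL);
        INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO Orders    VALUES (10, 1, 25.0), (11, 99, 5.0);
    """)

    # Inner join: only Orders whose customer_id matches a Customer.
    print(db.execute("""SELECT o.order_id, c.name
                        FROM Orders o JOIN Customers c
                          ON o.customer_id = c.customer_id""").fetchall())

    # Left outer join: ALL Orders, with or without a matching Customer.
    print(db.execute("""SELECT o.order_id, c.name
                        FROM Orders o LEFT JOIN Customers c
                          ON o.customer_id = c.customer_id""").fetchall())

    # A transaction: either every modification "happens," or none of them do.
    try:
        with db:   # sqlite3 commits on success, rolls back on an exception
            db.execute("INSERT INTO Orders VALUES (12, 2, 7.0)")
            raise RuntimeError("something went wrong mid-transaction")
    except RuntimeError:
        pass
    print(db.execute("SELECT COUNT(*) FROM Orders").fetchall())  # still [(2,)]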
Four: Precise Specification, Strategy, and Implementation

As a professional computer programmer, you will always be confronted with the task of “turning user requirements into 1’s and 0’s.” But first, you will be confronted with the realization that the user requirements which you have been given are not yet suitable to be treated in that way! People describe their requirements in human and business terms. To them, “it is an Invoice,” and everything in their existing experience concerning “invoices” is: “assumed,” “implied,” “well, of course,” and, perhaps worst of all, “I forgot to mention.”

As I discuss in my e-book, Managing the Mechanism,11 when you build computer software, you are building a self-directing machine: a machine composed of if/then/else decisions, variables, and internal and external state. A machine composed of literally billions of moving parts, all potentially interconnected to one another and so influencing one another. In true binary fashion, such a mechanism is either “( ) correct,” or “( ) not.” There is nothing in-between.

Ordinarily, specifications will come to your team from “business analysts,” whose job it is to flesh out the description of the project to fully encompass all of the relevant business implications – including the implications that the stakeholders themselves probably didn’t think about. But even these will be business requirements, not yet expressed in terms of new source-code that is to be written, nor changes that are to be made to existing code.

The worst thing that a programmer can do at this point is what is ordinarily done: to “just start ‘writing code.’” Effectively, making it up as they go along, addressing each problem or issue as it is encountered. The key problem with this “‘lack-of-strategy’ strategy” is simply that the pieces of source-code that you are writing now must interact with other pieces that have not yet been developed or even designed! Likewise, if a thorough analysis of the existing source-code base has not first been done, there is an overwhelming probability that the new code won’t mesh properly with it. The incompatibility will be discovered at the worst possible time: when the previously-written material is being put into service.

Software-writing is not – must not be – “a voyage of discovery.” No one in their right mind sets sail from a harbor or takes off from an airport without a plan; a plan that specifically includes contingencies. Sure, the overall plan usually consists of a number of shorter “hops,” but careful attention is paid to anticipating the course of the project and looking for anything that might interfere with its success. This form of project planning, in a very real sense, is “code-writing.” Figuring out what code needs to be written, what it needs to do, what inputs and pre-conditions it will encounter, and exactly what outputs and responses it must produce in all cases, truly is “the hard part” of developing computer software. By comparison, “writing source-code and getting it to compile” is entirely secondary.
Five: “Front-End / Back-End.” User Interfaces and Frameworks

All real-world production applications will be found to have a “multi-tier” architecture. They will involve the interaction of “the machine in the customer’s hands” (or, on her desk …), connecting to some server(s) which are responsible for performing all or part of the work. Each of these servers, in turn, may communicate with other servers.

“AJAX” is the term that is most commonly used to describe the interaction between the JavaScript on a typical web page (which is executing on the user’s own computer) and remote HTTP servers. This is a specific case of a technique called “IPC = Inter-Process Communication,” “RPC = Remote Procedure Calls,” and/or “Client/Server.” In the case of AJAX, the JavaScript “front-end” code is making specific requests to the “back-end” web server. Although these requests are being made using the HTTP (or HTTPS) protocol, they are not requests for HTML content. Instead, the front-end creates a packet of information – similar to filling out a standard form – and sends it via HTTP to the server, which carries out the request and returns another packet of information describing the outcome. Hence the term, “remote procedure call”: the “front-end” “client” effectively asks the server to call a subroutine (a “procedure”), and to return to the client the results thereof. Since the protocol that is being used is HTTP, both the request and the response must be encoded in such a way as to be compatible with the requirements of HTTP. This is typically done using formats such as “JSON” and “YAML.” Other characteristics of HTTP, such as “cookies,” are used to maintain state and to identify the source of the requests.

In larger systems, a format called “XML” is used, and the process is altogether much more formalized. In this arrangement, called “SOAP,” the communicating computers might not “know” each other, but might “discover” one another. Formal methods have been devised (“WSDL,” etc.) by which the computers can broadcast their capabilities to one another as part of this process.

In all of these cases, the work that is being done is both very complex and very familiar. “It’s been done a thousand times before.” Therefore, developers have concocted and perfected many frameworks which can be used as foundations for building substantial parts of the application. Every time you work on an application such as these, you will encounter a framework – most likely, a different one than you have ever used before. So it goes.

Frameworks are also used to construct front-end user interfaces. Some toolkits are used to gloss over the differences between web browsers. Others gloss over the differences between different types (and brands) of mobile devices. Each framework makes it very easy to do certain things – to create certain visual effects on a device, for example. Be warned, however, that the results are very beguiling. It’s easy to confuse “apparent progress” (which appears to happen very fast indeed) with the not-so-visible supporting software infrastructure which must be built under the surface. It is also very easy to “fall in love with what you’ve done,” only to discover that a different approach or presentation might work better. Once again, these discoveries are often made at inopportune moments, and they require sometimes deep-seated and far-reaching changes to the system, which quickly de-stabilize it.
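As a concrete (and, again, merely illustrative) sketch of the request/response pattern described above, here is what a “remote procedure call” might look like in Python, using only the standard library. The URL, the packet layout, and the procedure names are entirely hypothetical: every real back-end defines its own.

    import json
    from urllib import request

    def call_remote_procedure(url, procedure, **params):
        # The "front-end" fills out a small packet of information (here, JSON) ...
        packet = json.dumps({"procedure": procedure, "params": params}).encode()
        req = request.Request(url, data=packet,
                              headers={"Content-Type": "application/json"})
        # ... POSTs it over HTTP; the "back-end" carries out the request ...
        with request.urlopen(req) as response:
            # ... and returns another packet describing the outcome.
            return json.loads(response.read())

    # Hypothetical usage -- a JavaScript front-end would do the same thing with
    # its own HTTP machinery (the "AJAX" described above):
    # result = call_remote_procedure("https://example.com/api", "get_order",
    #                                order_id=12345)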
Six: Pragmatic Debugging Skills

Debugging seems like an arcane art. (Done well, it seems like voodoo.) However, in its entirety, “debugging” is only one facet of a much larger concern: keeping defects from getting into the code in the first place, and making them “OIL = Obvious, Identifiable, and Locatable” when they do. This is a concern that should permeate your software-writing at many levels. It will cause you to add things to your code that your colleagues might call “unnecessary” or “inefficient.” (But let the record show that you spend far less time “debugging” than they do!) Done properly, this will result in robust software that keeps your pocket-pager silent all night long.

The first principle that I will now offer is that “the computer software, itself, is actually the only party that is truly in the position to detect a defect within itself.” The first and hardest step in troubleshooting any piece of software is literally to become aware of the defect’s existence. Therefore, “code suspiciously.” Subroutines should check all of their inputs and their assumptions. In any chain of if/then/else logic where “one of these cases must match,” always remember to add one more case: “the case that can’t possibly happen, but just did.” If you don’t do this, it may be impossible for anyone to know that it happened.

So, when your “suspicious” code now finds that “something which can’t possibly happen, just did,” what’s it supposed to do about it? The best strategy is to “throw an exception.” This is similar to dividing by zero, or yelling “Fire!” Execution stops in its tracks, and the programming language looks for an “exception handler” (which you’ve set up in advance). When it finds one, control is immediately transferred to that handler (never to return …), and an “exception object” describing the incident is handed to it. Among this information is the exact location from whence the exception was “thrown.” Your suspicious programming, having discovered a fire, has just notified the fire department.

Another excellent debugging technique is to log informative progress messages to some kind of file, or to an event-log mechanism provided by the operating system. Generation of debugging messages is often an option that can be turned on or off.

Many programming languages provide a feature called assertions. These take the form of a subroutine, usually named assert(), which is given a parameter that “must be true.” If the parameter is not true, an exception is automatically thrown. Furthermore, this feature is implemented in such a way that it can be turned on or turned off; included in the program or omitted from it. Sprinkle your source-code liberally with these assertions.

A final useful debugging technique is a trace table. This is simply an array or buffer, of some certain size, into which in-progress messages are recorded. The trace table is circular: new messages replace the oldest ones. (Each entry is typically timestamped.) The advantage of this technique is that it is very fast, because it doesn’t involve input/output. Facilities must be provided to extract the present content of the trace table.
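Here is a minimal sketch, in Python, combining three of these techniques: a “suspicious” chain of cases that throws an exception for “the case that can’t possibly happen,” an assertion, and a circular, timestamped trace table. The routine and its statuses are hypothetical:

    from collections import deque
    from datetime import datetime

    TRACE = deque(maxlen=1000)    # the circular trace table: new entries
                                  # silently replace the oldest ones
    def trace(msg):
        TRACE.append((datetime.now(), msg))   # fast: no input/output involved

    def ship_order(status):
        assert status is not None, "status must be supplied"   # an assertion
        trace("ship_order called with status=%r" % status)
        if status == "paid":
            return "shipping"
        elif status == "pending":
            return "waiting"
        else:
            # The case that "can't possibly happen" ... but just did.
            raise ValueError("impossible order status: %r" % status)

    print(ship_order("paid"))
    try:
        ship_order("mangled")     # the "impossible" case: fire department notified
    except ValueError:
        for stamp, msg in TRACE:  # extract the recent chronology from the table
            print(stamp, msg)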
Debugging, unabashedly, “is detective work.” But the techniques that I have now discussed will greatly improve the effectiveness of this process. By making the program suspicious of its own behavior, you improve the odds that defects will be discovered and corrected early. By building a chronology of what happened “recently,” you make it easier to discover the internal state of the system which enabled the defective behavior to occur.

All production systems should also be accompanied by a comprehensive “test suite,” the purpose of which is to exercise and re-exercise components of the system at various levels. The test suite is run and re-run constantly, and both successful and unsuccessful outcomes are logged. If a source-code change is introduced (or is about to be introduced) which causes a test case to fail … well, “forewarned is forearmed.”
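A test suite can be as elaborate as the system demands, but even a small one pays for itself. Here is a minimal sketch using Python’s built-in unittest framework, exercising the hypothetical ship_order() routine from the previous sketch (re-stated here so that the example stands alone):

    import unittest

    def ship_order(status):          # the hypothetical routine from the previous sketch
        if status == "paid":
            return "shipping"
        if status == "pending":
            return "waiting"
        raise ValueError("impossible order status: %r" % status)

    class ShipOrderTests(unittest.TestCase):
        def test_paid_orders_ship(self):
            self.assertEqual(ship_order("paid"), "shipping")

        def test_impossible_status_is_rejected(self):
            # A source-code change that breaks this contract fails the suite:
            # "forewarned is forearmed."
            with self.assertRaises(ValueError):
                ship_order("garbled")

    if __name__ == "__main__":
        unittest.main()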
In Closing

Well then, there you have it. “My six.” And, along the rambling way, my pragmatic recommendations. Most if not all of the topics that I have quickly described in this little book will call for further exploration on your part, and I hope that I have succeeded in setting the stage of understanding for you to do so. Computer programming has changed enormously over the past sixty years and counting, but in many ways it has changed not-at-all. We’re still writing instructions for electronic machines to (unthinkingly) carry out, and the process requires a lot of human thought. “Capturing the big picture, in spite of the myriad details,” is something that can easily become lost in the shuffle. I hope that these words have helped you in some small way, and I welcome your comments, reviews, and feedback.

Vince North
1 The observation, attributed to Dr. Gordon Moore, that semiconductors would double in speed and density every two years.
2 A term most-likely originally coined by Charlotte Brontë in her book, Jane Eyre: “Gentle Reader, may you never feel
what I then felt! May your eyes never shed such stormy, scalding, heart-wrung tears as poured from mine …” Oh yes, Ms. Brontë had quite the way with words.
3 Wikipedia defines “craft” as: “a pastime or a profession that requires particular skills and knowledge of skilled work.”
4 Hey, I’m the Author here. I can do that. Please, read on …
5 All right, all right. I have no pragmatic choice, at this point, but to impose an important technical term: “process,” where previously I said, “executing program.” On almost any computer today, you can run more than one copy of ‘the same’ program at the same time, just as easily as you can run ‘different programs.’ Operating systems routinely call each distinct instance of ‘this program, running here’ … “a ‘process.’”
6 The “operating system” is the foundational layer of software which governs the operation of the entire computer
system. Unix®, Linux®, OS X®, z/OS®, etc. are all examples of this. These create and manage the operating environment under which all processes ultimately operate, and define and implement the entire world that is available to them.
7 A “thread” is an independent thread of execution running within the auspices of a single process. For our purposes
now, simply think of it as having “its own ‘stack.’”
8 ISBN: 978-0-13-022418-7.
9 This characteristic is sometimes called, “late binding.”
10 Not every SQL database system supports transactions. Most do, but some “have strings attached.” For instance, the
ever-popular MySQL database only supports transactions if the “InnoDB” storage engine is used.
11 North, Vincent P., Managing the Mechanism: Why Software Projects Aren’t Like Any Other Project You’ve Ever Tried
to Manage (And How To Do It Successfully). Published 2012. ISBN: 978-0-715-74383-7.