Chapter 7 shifts focus from classes (Chapter 6) to the characteristics that define a good routine. (For design principles, see Chapter 5; for defensive programming, Chapter 8; for routine creation steps, Chapter 9.)
A “routine” is defined as an individual method or procedure invoked for a single purpose (e.g., C++ function, Java method, Visual Basic function/sub procedure). Many quality techniques apply to these variants.
What is a high-quality routine? It’s easier to show what it is not. Here’s an example of a low-quality routine:

    void HandleStuff( CORP_DATA & inputRec, int crntQtr, EMP_DATA empRec,
    double & estimRevenue, double ytdRevenue, int screenX, int screenY,
    COLOR_TYPE & newColor, COLOR_TYPE & prevColor, StatusType & status,
    int expenseType )
    {
    int i;
    for ( i = 0; i < 100; i++ ) {
    inputRec.revenue[i] = 0;
    inputRec.expense[i] = corpExpense[ crntQtr ][ i ];
    }
    UpdateCorpDatabase( empRec );
    estimRevenue = ytdRevenue * 4.0 / (double) crntQtr;
    newColor = prevColor;
    status = SUCCESS;
    if ( expenseType == 1 ) {
    for ( i = 0; i < 12; i++ )
    profit[i] = revenue[i] - expense.type1[i];
    }
    else if ( expenseType == 2 ) {
    profit[i] = revenue[i] - expense.type2[i];
    }
    else if ( expenseType == 3 )
    profit[i] = revenue[i] - expense.type3[i];
    }

This routine has at least 10 problems:
- Bad name: `HandleStuff()` is uninformative.
- Undocumented: Lacks explanation.
- Bad layout: Poor physical organization, inconsistent styles.
- Input variable changed: `inputRec` is modified despite being an input; if it is truly an input, it should be declared `const` in C++.
- Reads/writes global variables: Reads `corpExpense` and writes `profit` instead of communicating through parameters.
- Multiple purposes: Initializes variables, updates a database, and performs calculations, none of which are related to each other. A routine needs a single, clear purpose.
- No error defense: Fails on `crntQtr = 0` (divide-by-zero).
- Magic numbers: Uses literal values like `100`, `4.0`, `12`, `2`, `3` without explanation.
- Unused parameters: `screenX` and `screenY` are not used.
- Incorrect parameter passing: `prevColor` is a reference parameter (`&`) but is never assigned, which wrongly suggests it is an output.
- Too many parameters: Has 11 parameters, exceeding the understandable limit of about 7, and they are poorly ordered and undocumented.
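As a hedged illustration of how a few of these problems might be addressed, the sketch below pulls the revenue estimate out of `HandleStuff()` into a routine of its own. The routine name `TryEstimateAnnualRevenue`, the named constants, and the choice to signal an invalid quarter through a `bool` return are assumptions made for this example; none of them come from the original code.

```cpp
// Hypothetical named constants replacing the magic number 4.0 and the
// implicit valid range of crntQtr.
const double QUARTERS_PER_YEAR = 4.0;
const int FIRST_QUARTER = 1;
const int LAST_QUARTER = 4;

// Estimates full-year revenue from year-to-date revenue.
// Inputs come first and the single output comes last.
// Returns false, leaving estimatedRevenue untouched, when currentQuarter
// is outside 1..4; this also rules out the divide-by-zero.
bool TryEstimateAnnualRevenue( double ytdRevenue, int currentQuarter,
                               double & estimatedRevenue ) {
    if ( currentQuarter < FIRST_QUARTER || currentQuarter > LAST_QUARTER ) {
        return false;
    }
    estimatedRevenue = ytdRevenue * QUARTERS_PER_YEAR / currentQuarter;
    return true;
}
```

The extracted routine has one purpose, a descriptive name, three parameters, no global access, and an explicit defense against a zero quarter.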
The routine is the single greatest invention in computer science (aside from the computer itself). It makes programs easier to read and understand, saves space by avoiding code repetition, and improves performance by giving each optimization a single place to live. Abusing this powerful tool with low-quality code is a “crime.”
Many programmers, especially early in their careers, might only see routines as a way to avoid duplicate code, making development, debugging, and maintenance easier. However, this is an incomplete explanation of the theory and practice of routines; there are many more valid reasons and correct approaches to creating them.
Valid Reasons to Create a Routine
While reasons overlap, here are solid justifications for creating routines:
- Reduce Complexity: The paramount reason. Routines hide implementation details, so you can use them as abstractions without thinking about their internal workings. This intellectual simplification is vital for managing complex programs; it also helps minimize code size and improve maintainability and correctness. Deeply nested loops or conditionals within a routine are a sign that part of it should be extracted into a new routine.
- Introduce an Intermediate, Understandable Abstraction: A well-named routine effectively documents its purpose, introducing a higher level of abstraction than raw code. This makes code more readable, easier to understand, and reduces complexity in the calling routine. Example: `leafName = GetLeafName(node)` is clearer than eight lines of nested logic.
- Avoid Duplicate Code: The most common reason. Duplication often signals a design error. Consolidating duplicate code into a single routine (possibly in a base class for specialized versions) saves space, makes modifications easier (one place to change), and improves reliability (one place to verify correctness).
- Support Subclassing: Short, well-factored routines are easier to override in subclasses than long, complex ones, reducing error potential in derived implementations.
- Hide Sequences: Encapsulate the necessary order of operations within a routine. For instance, if you have two lines of code that read the top of a stack and decrement a `stackTop` variable, put those two lines of code into a `PopStack()` routine to hide the assumption about the order in which the two operations must be performed.
- Hide Pointer Operations: Pointer manipulations are often hard to read and error-prone. Isolating them in routines clarifies intent, ensures correctness in one place, and allows for easier future changes to data types without impacting widespread code.
- Improve Portability: Routines can isolate non-portable code (e.g., nonstandard language features, hardware/OS dependencies), explicitly identifying and confining future portability work.
- Simplify Complicated Boolean Tests: Embed complex boolean logic into a well-named function. This moves detail out of the main code flow, provides self-documentation via the function name, highlights its significance, and encourages making the test’s internals readable.
- Improve Performance: Centralizing code into a routine allows optimization in one place. This makes profiling easier and ensures a single optimization benefits all users of the routine. It also makes it practical to rewrite the routine with a more efficient algorithm or in a faster language. (Note: The goal isn’t necessarily to make all routines small).
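Two of the reasons above, hiding sequences and simplifying complicated boolean tests, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `IntStack` type, its fixed capacity, the `InitStack()` and `PushStack()` helpers, and the leap-year test are all hypothetical and chosen only to make the example self-contained.

```cpp
#include <cassert>

const int MAX_STACK_SIZE = 100;   // hypothetical capacity

// Hypothetical array-based stack; stackTop is the index of the top
// element, or -1 when the stack is empty.
struct IntStack {
    int items[ MAX_STACK_SIZE ];
    int stackTop;
};

void InitStack( IntStack & stack ) { stack.stackTop = -1; }

void PushStack( IntStack & stack, int value ) {
    assert( stack.stackTop < MAX_STACK_SIZE - 1 );   // stack must not be full
    stack.stackTop++;
    stack.items[ stack.stackTop ] = value;
}

// Hides the sequence: read the top element first, then decrement stackTop.
int PopStack( IntStack & stack ) {
    assert( stack.stackTop >= 0 );                   // stack must not be empty
    int value = stack.items[ stack.stackTop ];
    stack.stackTop--;
    return value;
}

// Wraps a compound boolean test in a name that states its intent.
bool IsLeapYear( int year ) {
    return ( year % 4 == 0 && year % 100 != 0 ) || ( year % 400 == 0 );
}
```

Callers of `PopStack()` never see the ordering assumption, and a call such as `if ( IsLeapYear( year ) )` reads as a statement of intent rather than as arithmetic.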
Operations That Seem Too Simple to Put Into Routines
A common mental block is reluctance to create routines for very simple purposes (e.g., 2-3 lines of code). However, small routines offer significant advantages:
- Improve Readability: Even a simple calculation (e.g., `points = deviceUnits * (POINTS_PER_INCH / DeviceUnitsPerInch())`) becomes much more readable and self-documenting when encapsulated in a well-named function, as in `points = DeviceUnitsToPoints(deviceUnits)`.
- Ease Maintenance and Extension: Small operations often grow more complex during maintenance. If the simple calculation above were duplicated in a dozen places, handling a potential division by zero (e.g., `DeviceUnitsPerInch()` returning 0) at three lines per site would require 36 new lines of code. Encapsulated in one routine, it requires only 3, which shows how small routines drastically simplify future modifications and error handling.
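The conversion routine with its zero check might look like the sketch below. The value of `POINTS_PER_INCH`, the use of `double`, the variable-backed stub for `DeviceUnitsPerInch()`, and the fallback result of 0 are assumptions made so the example compiles on its own.

```cpp
const double POINTS_PER_INCH = 72.0;   // assumed value for illustration

// Hypothetical stand-in for a real device query; a variable is used
// here only so the zero case can be demonstrated.
double g_deviceUnitsPerInch = 144.0;
double DeviceUnitsPerInch() { return g_deviceUnitsPerInch; }

// Converts device units to points. The division-by-zero defense lives
// here once instead of at every call site.
double DeviceUnitsToPoints( double deviceUnits ) {
    if ( DeviceUnitsPerInch() != 0.0 ) {
        return deviceUnits * ( POINTS_PER_INCH / DeviceUnitsPerInch() );
    }
    return 0.0;   // assumed fallback when the device reports zero
}
```

Every call site stays a single line, `points = DeviceUnitsToPoints( deviceUnits )`, no matter how the handling inside the routine evolves.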
Design at the Routine Level
While abstraction and encapsulation are key at the class level, cohesion remains the primary design heuristic for individual routines.
Cohesion refers to how closely related the operations within a routine are (also called “strength”). The goal is for each routine to do one thing, and do it well, nothing else. Highly cohesive routines lead to higher reliability; studies show them to be significantly more fault-free and less costly to fix than low-cohesion routines.
Understanding different levels of cohesion helps improve routine design, though the concepts are more important than memorizing terms:
- Functional Cohesion (Strongest/Best): Occurs when a routine performs one and only one complete operation (e.g., `sin()`, `GetCustomerName()`, `CalculateLoanPayment()`). This assumes the name accurately describes its sole function.
- Less Ideal Kinds of Cohesion:
  - Sequential Cohesion: Operations must be performed in a specific order, share data from step to step, but don’t form a complete function together. Example: A routine that calculates age and then uses that age to calculate time to retirement. To improve, split into separate, functionally cohesive routines (e.g., `ComputeAge()` and `ComputeTimeToRetirement()`, where the latter might call the former).
  - Communicational Cohesion: Operations in a routine simply use the same data but aren’t related otherwise. Example: A routine that prints a report and then reinitializes the same data. To improve, split these unrelated operations into separate routines.
  - Temporal Cohesion: Operations are grouped because they all happen at the same time (e.g., `Startup()`, `Shutdown()`). While sometimes seen as problematic, they can be effective as orchestrators. To improve, have the temporally cohesive routine call other specific routines rather than performing the activities directly. Naming (e.g., `Startup()` vs. `ReadConfigFileInitScratchFileEtc()`) clarifies its single purpose as an orchestrator.
- Generally Unacceptable Kinds of Cohesion (Result in Poor Code):
  - Procedural Cohesion: Operations are done in a specified order, but only because it matches an external sequence (e.g., input screen order), not because they form a complete functional unit. Example: `GetEmployeeName()`, then `GetEmployeeAddress()`, then `GetEmployeePhoneNumber()` in one routine. To improve, extract separate operations into their own routines, ensuring the calling routine has a single, complete job (e.g., `GetEmployee()`).
  - Logical Cohesion: Multiple operations are grouped, with one selected by a control flag (e.g., `InputAll()` takes a flag to input customer, employee, or inventory data). The “logic” of the `if`/`case` statement is the only connection. Better to have separate routines for each distinct operation. However, it’s acceptable if the routine’s sole purpose is to dispatch commands (an “event handler”) and it performs no direct processing itself.
  - Coincidental Cohesion: Operations have no discernible relationship to each other (“no cohesion” or “chaotic cohesion”). The low-quality C++ routine example at the chapter’s start exemplified this. This typically requires a deeper redesign.
The key takeaway is to understand the concepts of cohesion and strive for functional cohesion in routines whenever possible, as it yields maximum benefit.
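The suggested fix for sequential cohesion, splitting the age-and-retirement routine in two, can be sketched as follows. The year-only arithmetic and the `RETIREMENT_AGE` constant are simplifying assumptions for illustration.

```cpp
// Hypothetical retirement age used only for this sketch.
const int RETIREMENT_AGE = 65;

// Functionally cohesive: computes an age in whole years and nothing else.
// Simplification: ignores month and day.
int ComputeAge( int birthYear, int currentYear ) {
    return currentYear - birthYear;
}

// Functionally cohesive: computes years until retirement, calling
// ComputeAge() rather than repeating its logic. Never returns a
// negative number of years.
int ComputeTimeToRetirement( int birthYear, int currentYear ) {
    int yearsLeft = RETIREMENT_AGE - ComputeAge( birthYear, currentYear );
    return yearsLeft > 0 ? yearsLeft : 0;
}
```

Each routine now does one complete job that its name describes, and a caller that needs only the age is not forced to compute the retirement figure.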
Good Routine Names
Effective routine names are crucial for code clarity.
- Describe Everything the Routine Does: A routine’s name should clearly state all its outputs and side effects. If `ComputeReportTotals()` also opens a file, its name should reflect that (`ComputeReportTotalsAndOpenFile()`). However, such long names indicate a deeper problem: routines should achieve results directly, avoiding side effects that aren’t clear from their primary purpose. The solution is to prevent side effects, not use less descriptive names.
- Avoid Meaningless, Vague, or Wishy-Washy Verbs: Shun generic verbs like `Handle`, `Perform`, `Process`, `DealWith`, `Output` (unless `handle` refers to event handling). These names provide little information. A vague name is often symptomatic of a routine with a weak purpose; the best solution is to restructure the routine for a stronger, more focused purpose and a corresponding stronger name (e.g., `FormatAndPrintOutput()` instead of `HandleOutput()`).
- Don’t Differentiate Routine Names Solely by Number: Avoid using numerical suffixes (e.g., `OutputUser`, `OutputUser1`, `OutputUser2`). Numbers offer no semantic meaning or indication of different abstractions, leading to poorly named routines.
- Make Names of Routines as Long as Necessary: Prioritize clarity. While variable names might have an optimal length, routine names should be as long or short as needed to be fully understandable. When routines are methods, the object name often provides context, shortening the needed routine name.
- To Name a Function, Use a Description of the Return Value: Functions return a value, so their names should describe what they return (e.g., `cos()`, `customerId.Next()`, `printer.IsReady()`).
- To Name a Procedure, Use a Strong Verb Followed by an Object: Procedures (which perform operations) should have verb-plus-object names that reflect their action (e.g., `PrintDocument()`, `CalcMonthlyRevenues()`). In object-oriented languages, the object is implicit in the call (e.g., `document.Print()`), so including the object name in the routine itself (`document.PrintDocument()`) is redundant and can become misleading in derived classes.
- Use Opposites Precisely: Employ consistent naming conventions for opposing actions (e.g., `Open`/`Close`, `Add`/`Remove`). Asymmetrical pairs (e.g., `FileOpen()`/`_lclose()`) are confusing.
- Establish Conventions for Common Operations: For recurring operations across different objects (e.g., retrieving a unique ID), establish a consistent naming convention. Lack of such conventions (e.g., `employee.id.Get()`, `dependent.GetId()`, `supervisor()`, `candidate.id()`) forces programmers to waste mental effort remembering inconsistent syntaxes, leading to annoyance and errors.
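A few of these naming guidelines can be seen together in a short sketch. The `Printer`, `Employee`, and `Supervisor` classes and the choice of `Id()` as the shared accessor name are hypothetical; the point is only that functions are named for what they return, procedures use a strong verb, and a common operation is spelled the same way everywhere.

```cpp
class Printer {
public:
    Printer() : ready_( false ) {}
    // Function: named for the value it returns.
    bool IsReady() const { return ready_; }
    // Procedure: strong verb; the object is implicit in the call,
    // so the name is WarmUp() rather than WarmUpPrinter().
    void WarmUp() { ready_ = true; }
private:
    bool ready_;
};

// One convention for retrieving a unique ID: every class uses Id().
class Employee {
public:
    explicit Employee( int id ) : id_( id ) {}
    int Id() const { return id_; }
private:
    int id_;
};

class Supervisor {
public:
    explicit Supervisor( int id ) : id_( id ) {}
    int Id() const { return id_; }
private:
    int id_;
};
```

With one convention in place, a reader who has seen `employee.Id()` can guess `supervisor.Id()` without looking it up.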
How Long Can a Routine Be?
The ideal maximum length for a routine has been a long-standing debate. Theoretically, it’s often suggested to be one screen or one to two pages (approximately 50-150 lines), with some past limits set at 50 lines (IBM) or two pages (TRW). While modern programs feature many very short routines, extremely long ones (e.g., 4,000 or 12,000 lines) still exist in practice.
Research on routine length offers mixed findings:
- Some studies suggest that routine size is inversely correlated with errors: as routine size increased (up to 200 lines), errors _per line_ decreased (Basili & Perricone).
- Other studies found no correlation between size and errors, but rather with structural complexity and data amount (Shen et al.).
- Another indicated small routines (under 32 lines) weren’t necessarily cheaper or less fault-prone, and larger routines (65+ lines) were cheaper to develop per line (Card, Church, & Agresti; Card & Glass).
- A study of 450 routines found smaller ones (under 143 statements) had more errors per line but were 2.4 times less expensive to fix (Selby & Basili).
- Code needed to be changed least often when routines averaged 100 to 150 lines (Lind & Vairavan).
- IBM found routines larger than 500 lines were most error-prone, with error rates proportional to size beyond that point (Jones).
For object-oriented programs, many routines will naturally be very short accessors. For more complex algorithms, routines should be allowed to grow organically up to 100-200 lines (non-comment, non-blank lines). Decades of evidence suggest routines in this range are no more error-prone than shorter ones. Instead of strict length limits, let factors like cohesion, nesting depth, number of variables, decision points, and necessary comments dictate a routine’s length.
However, if routines consistently exceed 200 lines, proceed with caution. Studies reporting benefits for larger routines typically didn’t distinguish beyond this length, and understandability will inevitably become an issue past this point.
How to Use Routine Parameters
Routine interfaces are error-prone, with communication errors between routines accounting for a significant portion of faults. Minimize these issues with the following guidelines:
- Put Parameters in Input-Modify-Output Order: Arrange parameters systematically: input-only first, then input-and-output, then output-only. This reflects the typical operational flow. The ordering conflicts with the C-library convention of putting the modified parameter first, but any ordering helps as long as it is applied consistently.
  - Example (Ada-like): `procedure InvertMatrix(originalMatrix: in Matrix; resultMatrix: out Matrix);`
  - Example (C++ with custom keywords for documentation): `void ChangeSentenceCase(IN StringCase desiredCase, IN OUT Sentence *sentenceToEdit);`
  - Caution with custom keywords: They extend the language in an unfamiliar way, require project-wide consistency, and aren’t compiler-enforceable. C++’s `const` keyword is usually preferable for input-only parameters.
- If Several Routines Use Similar Parameters, Put Them in a Consistent Order: Consistent parameter ordering across related routines (e.g., `strncpy()` and `memcpy()`) serves as a mnemonic, making them easier to remember. Inconsistent ordering (e.g., `fprintf()`, which takes the file as its first argument, vs. `fputs()`, which takes it as its last) is confusing.
- Use All the Parameters: If a parameter is passed, it should be used. Unused parameters correlate with increased error rates. Remove them unless there’s a strong, specific reason (e.g., conditional compilation of code that uses it).
- Put Status or Error Variables Last: By convention, status and error-indicating output parameters belong at the end of the parameter list, as they are secondary to the routine’s main purpose.
- Don’t Use Routine Parameters as Working Variables: Avoid reassigning input parameters to store intermediate results. Use local working variables instead. Modifying an input parameter creates misleading naming (it no longer holds the original input) and can lead to future errors if the original value is later needed.
  - Bad Example: `int sample(int inputVal) { inputVal = inputVal * CurrentMultiplier(inputVal); inputVal = inputVal + CurrentAdder(inputVal); ... return inputVal; }`
  - Good Example: `int sample(int inputVal) { int workingVal = inputVal; workingVal = workingVal * CurrentMultiplier(workingVal); workingVal = workingVal + CurrentAdder(workingVal); ... return workingVal; }`
  - Using a working variable clarifies roles and prevents accidental modification. In C++, `const` can enforce this for input-only parameters.
- Document Interface Assumptions About Parameters: Document any assumptions about parameter characteristics (e.g., input/output nature, units, status code meanings, expected ranges, disallowed values) both in the routine and at its call sites. Using assertions in code is even better than just comments.
- Limit the Number of a Routine’s Parameters to About Seven: Psychological research suggests people can typically track around seven “chunks” of information. Exceeding this limit for routine parameters makes them difficult to comprehend. If you find yourself consistently passing many arguments, the routines are too tightly coupled; consider redesigning them or grouping them into a class where frequently used data becomes class data.
- Consider an Input, Modify, and Output Naming Convention for Parameters: If distinguishing these types is important, establish a naming prefix convention (e.g., `i_`, `m_`, `o_`, or `Input_`, `Modify_`, `Output_`).
- Pass the Variables or Objects That the Routine Needs to Maintain Its Interface Abstraction:
- Pass specific elements: If the routine’s abstraction only requires certain data elements, and their origin from a single object is coincidental, pass those specific elements individually. This reduces coupling and aids understanding/reuse.
- Pass the whole object: If the routine’s abstraction inherently involves operating on a specific object as a whole, pass the entire object. This promotes interface stability if the routine later needs other object members and avoids exposing internal data usage.
- Rule of thumb: If you find yourself setting up an object with data just for a call, or frequently changing parameter lists from the same object, it indicates which approach is better.
- Use Named Parameters: In languages that support it (like Visual Basic), explicitly associating formal and actual parameters (e.g., `Distance3d(xDistance := latitude, yDistance := longitude)`) improves self-documentation and prevents errors, especially with long lists of identically typed arguments. This is particularly valuable in safety-critical environments.
- Make Sure Actual Parameters Match Formal Parameters: Always verify that the types of variables, constants, or expressions passed in a call (actual parameters) correctly match the types declared in the routine definition (formal parameters). While strongly typed languages (C++, Java) help, in weakly typed languages (C without full warnings), or for input/output arguments, mismatches can lead to subtle bugs. Heed compiler warnings.
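Several of the parameter guidelines above can be combined in one small routine. The sketch below is illustrative only: `TruncateSentence`, its parameters, and the `StatusType` codes are hypothetical names. It shows input-modify-output ordering with the status last, `const` on the input, an assertion documenting an interface assumption, and a local variable used in place of a reassigned parameter.

```cpp
#include <cassert>
#include <string>

enum StatusType { Status_Success, Status_EmptyInput };   // hypothetical status codes

// Parameter order: input-only, then modify, then output, with status last.
void TruncateSentence(
    const int maxLength,            // input: const prevents reuse as a working variable
    std::string & sentenceToEdit,   // modify
    int & charsRemoved,             // output
    StatusType & status             // output: status goes last
) {
    assert( maxLength > 0 );        // documented assumption: limit is positive
    charsRemoved = 0;
    if ( sentenceToEdit.empty() ) {
        status = Status_EmptyInput;
        return;
    }
    // Local working value instead of overwriting a parameter.
    const int originalLength = static_cast<int>( sentenceToEdit.size() );
    if ( originalLength > maxLength ) {
        sentenceToEdit.resize( static_cast<std::string::size_type>( maxLength ) );
        charsRemoved = originalLength - maxLength;
    }
    status = Status_Success;
}
```

Because `maxLength` is `const`, an attempt to reuse it for intermediate results fails to compile, which is the enforcement that custom `IN`/`OUT` keywords cannot provide.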
Special Considerations in the Use of Functions
Modern languages support both functions (routines that return a value) and procedures (routines that do not). In C++, void functions are semantically procedures. The distinction is primarily semantic, not just syntactic.
When to Use a Function and When to Use a Procedure
Purists argue functions should return only one value (like mathematical functions), taking only input parameters and being named for their return value (e.g., sin(), CustomerID(), ScreenHeight()). Procedures, on the other hand, can take input, modify, and output parameters (an output parameter is one the routine modifies to pass a result back to the caller).
A common practice is a “function” that acts as a procedure but returns a status value:

    if (report.FormatOutput(formattedReport) = Success) then ...

While technically a function, its primary purpose is the procedure-like operation. This is acceptable if the status return is used consistently and doesn’t confuse the routine’s main purpose.
The alternative (and author’s preference) is to use a procedure with an explicit status output parameter:

    report.FormatOutput(formattedReport, outputStatus)
    if (outputStatus = Success) then ...

This separates the call from the status test, reducing statement complexity.
Another acceptable style is:

    outputStatus = report.FormatOutput(formattedReport)
    if (outputStatus = Success) then ...

In short: Use a function if its primary purpose is to return the value indicated by its name; otherwise, use a procedure.
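For readers working in C++, the status-returning and status-parameter styles might be sketched as two overloads of the same routine. The `Report` class, its bracket-wrapping placeholder formatting, and the `StatusType` values are assumptions for illustration.

```cpp
#include <string>

enum StatusType { Success, Failure };   // hypothetical status codes

class Report {
public:
    explicit Report( const std::string & text ) : text_( text ) {}

    // Style 1: procedure-like routine that returns its status.
    StatusType FormatOutput( std::string & formattedReport ) const {
        if ( text_.empty() ) {
            return Failure;
        }
        formattedReport = "[" + text_ + "]";   // placeholder formatting
        return Success;
    }

    // Style 2: procedure with an explicit status output parameter, placed last.
    void FormatOutput( std::string & formattedReport, StatusType & outputStatus ) const {
        outputStatus = FormatOutput( formattedReport );
    }

private:
    std::string text_;
};
```

With the second overload, the call `report.FormatOutput( formattedReport, outputStatus );` and the test `if ( outputStatus == Success )` sit on separate lines, which is the separation the preferred style is after.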
Setting the Function’s Return Value
Using functions introduces the risk of returning an incorrect or unset value, especially with multiple execution paths.
- Check All Possible Return Paths: Mentally trace every execution path to ensure a value is returned in all circumstances. Initialize the return value to a default at the function’s start as a safety net.
- Don’t Return References or Pointers to Local Data: Local data goes out of scope and becomes invalid when the routine ends, making any returned references or pointers to it dangerous. Instead, an object should store such information as class member data and provide accessor functions to return their values.
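Both guidelines can be illustrated briefly. In this sketch, `SignOf` and the `Customer` class are hypothetical examples: the first initializes its return value so every path yields a defined result, and the second returns a reference to member data, which stays valid for the lifetime of the object, rather than to a local variable.

```cpp
#include <string>

// Every path returns a defined value: result starts at a safe default,
// which also covers the value == 0 case.
int SignOf( int value ) {
    int result = 0;
    if ( value > 0 ) {
        result = 1;
    } else if ( value < 0 ) {
        result = -1;
    }
    return result;
}

class Customer {
public:
    explicit Customer( const std::string & name ) : name_( name ) {}

    // Safe: refers to member data that lives as long as the object does.
    const std::string & Name() const { return name_; }

private:
    std::string name_;
};

// Unsafe pattern, shown only as a comment because the reference would
// dangle as soon as the routine returns:
//   const std::string & BadName() { std::string local = "x"; return local; }
```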