Class Foundations: Abstract Data Types (ADTs)
At its core, an Abstract Data Type (ADT) is a fundamental concept in software programming. You can think of an ADT as a special kind of package. This package contains two main things:
- Data: This is the actual information or values that the ADT holds.
- Operations: These are the specific actions or functions that can be performed on that data.
The purpose of these operations is twofold: they describe what the data represents to other parts of your program, and they also provide the only allowed ways for the rest of the program to change or interact with that data. The term “data” in ADT is used quite broadly. It doesn’t just mean simple numbers or text. For example, an ADT could represent:
- A graphics window, along with all the operations you can do to it, like resizing or moving.
- A file, and all the actions you can perform on it, such as opening, reading, or saving.
- A table of insurance rates, and the operations to look up or update those rates.
Understanding ADTs is extremely important for grasping object-oriented programming. If programmers try to create classes (which are like blueprints for objects in object-oriented programming) without really understanding ADTs, their “classes” often end up being just convenient containers for unrelated data and functions. They won’t truly benefit from the power of object-oriented design. However, with a good understanding of ADTs, programmers can build classes that are much easier to create in the first place and also simpler to modify and update over time.
Many traditional programming books might explain ADTs in a very dry, mathematical way, making them seem theoretical and not very practical. But that misses the main point! ADTs are exciting because they allow you to work with real-world ideas and entities directly within your program, rather than getting stuck in the low-level technical details of how things are implemented. Instead of thinking about complex actions like “inserting a node into a linked list,” you can use an ADT to think about more practical, problem-solving actions like “adding a cell to a spreadsheet,” “adding a new type of window to a list of window types,” or “adding another passenger car to a train simulation.” This powerful ability lets you focus on the problem you’re solving (the “problem domain”) instead of the detailed, technical implementation (the “low-level implementation domain”).
Example of the Need for an ADT
Let’s look at a concrete example to see why ADTs are so useful. Imagine you’re writing a program that controls how text appears on a screen. This program needs to handle different typefaces (like Arial or Times New Roman), point sizes (like 10-point or 12-point), and font attributes (like bold or italic). A part of your program will need to manage these font settings.
- Without using an ADT (an “ad hoc” approach): If you don’t use an ADT, you might handle font settings in a very direct, disorganized way. For instance, if you need to set the font size to 12 points (which might translate to 16 pixels), your code might look like this:
    currentFont.size = 16
This code directly changes a data member called size within a currentFont variable. It is ambiguous because 16 could mean pixels, points, or something else. If you needed to change the font to bold, your code might look even more confusing:
    currentFont.attribute = currentFont.attribute or 0x02
Here, 0x02 is a hexadecimal constant that represents “bold.” You might make the line a bit clearer by using a named constant like BOLD:
    currentFont.attribute = currentFont.attribute or BOLD
But the core problem remains: your program is directly changing internal data members (size, attribute). If you decide later that “bold” should be represented differently (for example, as a simple True/False value like currentFont.bold = True), you would have to go through your entire program, find every place where you set “bold” using 0x02, and change it. This is tedious, error-prone, and makes your code hard to update.
- Using an ADT: With an ADT, you would define a set of specific operations (functions or routines) that control the font, and these operations would be bundled with the font’s data. So, instead of directly manipulating currentFont.size or currentFont.attribute, you would use clear, descriptive functions like:
    currentFont.SetSizeInPoints( sizeInPoints )
    currentFont.SetSizeInPixels( sizeInPixels )
    currentFont.SetBoldOn()
    currentFont.SetBoldOff()
    currentFont.SetItalicOn()
    currentFont.SetItalicOff()
    currentFont.SetTypeFace( faceName )
The actual code inside these functions would likely be similar to the “ad hoc” examples (currentFont.size = 16), but the key difference is that these details are now isolated within the ADT’s operations. The rest of your program only interacts with these clear, higher-level functions, not the messy internal data.
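The font operations listed above could be packaged roughly as follows. This is only a minimal C++ sketch: the query routines (SizeInPixels, IsBold, IsItalic, TypeFace), the bit value chosen for italic, the default values, and the 96-DPI points-to-pixels conversion are assumptions made for illustration and do not come from the text.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical font ADT. The bit values, defaults, and the 96-DPI
// conversion factor are assumptions made for this sketch.
class Font {
public:
    void SetSizeInPoints(double sizeInPoints) {
        assert(sizeInPoints > 0.0);
        m_sizeInPixels = sizeInPoints * kPixelsPerPoint;
    }
    void SetSizeInPixels(double sizeInPixels) {
        assert(sizeInPixels > 0.0);
        m_sizeInPixels = sizeInPixels;
    }
    void SetBoldOn()    { m_attributes |= kBold; }
    void SetBoldOff()   { m_attributes &= ~kBold; }
    void SetItalicOn()  { m_attributes |= kItalic; }
    void SetItalicOff() { m_attributes &= ~kItalic; }
    void SetTypeFace(std::string faceName) { m_faceName = std::move(faceName); }

    // Query routines (assumed additions so callers never read raw members).
    double SizeInPixels() const { return m_sizeInPixels; }
    bool IsBold() const   { return (m_attributes & kBold) != 0; }
    bool IsItalic() const { return (m_attributes & kItalic) != 0; }
    const std::string& TypeFace() const { return m_faceName; }

private:
    static constexpr double kPixelsPerPoint = 96.0 / 72.0;  // assumes 96 DPI
    static constexpr unsigned kBold = 0x02;
    static constexpr unsigned kItalic = 0x04;                // assumed value

    double m_sizeInPixels = 16.0;
    unsigned m_attributes = 0;
    std::string m_faceName = "Arial";
};
```

A caller writes currentFont.SetBoldOn() and never touches the bit mask. If the attribute storage were later changed to two bool members, only the bodies of these routines would need to change.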
Benefits of Using ADTs
Using ADTs offers many significant advantages over a less organized approach:
- You can hide implementation details: This is a major benefit. By wrapping the data and its operations within an ADT, you hide how the data is actually stored or managed internally. This means if you decide to change how bold is represented (e.g., from 0x02 to a simple True/False flag), you only need to change it in one place: inside the ADT’s routines. The rest of your program, which calls currentFont.SetBoldOn(), doesn’t need to know or care about this change. This also protects your program if you decide to store data in a different way (like in a file instead of in memory) or rewrite parts of the ADT in another programming language.
- Changes don’t affect the whole program: If you need to make your fonts more complex and support new operations (like small caps, superscripts, or strikethrough), you can add these new operations within the ADT. Because the rest of your program only interacts with the ADT’s defined operations, adding new features to the ADT will not force you to change unrelated parts of your program.
- You can make the interface more informative: Code like currentFont.size = 16 is unclear because 16 could mean 16 pixels or 16 points. This ambiguity can lead to errors. By collecting all similar operations into an ADT, you can clearly define functions like SetSizeInPoints() and SetSizeInPixels(), making it obvious what kind of size you are setting and helping to avoid confusion.
- It’s easier to improve performance: If you find that font operations are making your program slow, you don’t have to search through your entire program to optimize them. Instead, you know exactly where to look: the few, well-defined routines within your font ADT. You can recode and optimize these specific routines without affecting the rest of the application.
- The program is more obviously correct: Comparing a confusing line like currentFont.attribute = currentFont.attribute or 0x02 with a clear call like currentFont.SetBoldOn() highlights how much easier it is to spot errors with ADTs. In the first case, you could accidentally use the wrong variable name, the wrong operation (e.g., and instead of or), or the wrong numerical value (0x20 instead of 0x02). With currentFont.SetBoldOn(), the only likely mistake is calling the wrong routine name, which is much simpler to verify.
- The program becomes more self-documenting: While you can improve code by replacing a cryptic number like 0x02 with a named constant like BOLD, nothing beats the clarity and readability of a routine call such as currentFont.SetBoldOn(). This makes it much easier for other programmers (or even yourself in the future) to understand what’s happening just by reading the function calls. In fact, a study showed that students working with programs divided into ADT routines scored over 30 percent higher on understanding questions compared to those working with functionally divided programs.
- You don’t have to pass data all over your program: In the old, ad hoc way, you might have to directly change a currentFont variable or pass it as an argument to every function that needed to work with fonts. This can make your program messy and hard to manage, or it might tempt you to make currentFont a “global” variable (which has its own problems). When you use an ADT, the ADT itself has a structure that contains all its data (like currentFont’s information). Only the routines that are part of the ADT can directly access this data. Routines outside the ADT don’t need to worry about the data’s internal details, which simplifies how you structure your program.
- You’re able to work with real-world entities rather than with low-level implementation structures: This is a crucial philosophical benefit. ADTs allow you to define operations (like SetSizeInPoints, SetBoldOn) that relate directly to the concepts in your problem domain (like “fonts”). This means most of your program can operate purely in terms of these meaningful font operations, instead of dealing with the nitty-gritty, low-level details of how fonts are stored or manipulated internally (like “array accesses,” “structure definitions,” or using raw True and False values for attributes).
In conclusion, by defining an ADT for fonts and creating routines like currentFont.SetSizeInPoints(), you effectively isolate all font operations into a dedicated set of functions. This provides a much better level of abstraction for the rest of your program when dealing with fonts, and it gives you a crucial layer of protection against future changes in how font operations are handled.
Good Class Interfaces
The most crucial step in building a top-quality class is designing a good interface. A good interface involves two main things: creating a clear, simple representation (an abstraction) of what the class does, and making sure that all the internal details are hidden behind this simple view.
Good Abstraction
As discussed before, abstraction means being able to look at something complicated in a simplified way. A class interface provides this simplified view of the complex code that runs behind it. The routines (functions) that a class offers through its interface should all clearly belong together and serve a consistent purpose.
Let’s consider an example of a class that represents an Employee. This class would store data like the employee’s name, address, and phone number. It would also provide services (operations) to create and manage an employee.
    class Employee {
    public:
        // public constructors and destructors
        Employee();
        Employee(
            FullName name,
            String address,
            String workPhone,
            String homePhone,
            TaxId taxIdNumber,
            JobClassification jobClass
        );
        virtual ~Employee();
        // public routines
        FullName GetName() const;
        String GetAddress() const;
        String GetWorkPhone() const;
        String GetHomePhone() const;
        TaxId GetTaxIdNumber() const;
        JobClassification GetJobClassification() const;
        // ... other employee-related routines
    private:
        // ... internal data and routines
    };
This Employee class provides a good abstraction because every routine it exposes (GetName, GetAddress, etc.) works towards a single, consistent goal: managing an employee. Internally, the class might have many more routines and data to make these services work, but users of the class don’t need to know any of those hidden details.
Now, let’s look at an example of a class with a poor abstraction:
    class Program {
    public:
        // public routines
        void InitializeCommandStack();
        void PushCommand( Command command );
        Command PopCommand();
        void ShutdownCommandStack();
        void InitializeReportFormatting();
        void FormatReport( Report report );
        void PrintReport( Report report );
        void InitializeGlobalData();
        void ShutdownGlobalData();
        // ...
    private:
        // ... internal data and routines
    };
This Program class includes routines for managing a command stack, for formatting and printing reports, and for initializing global data. It’s hard to see any clear connection between these different sets of routines. The class interface doesn’t offer a consistent abstraction; it’s just a mix of unrelated functions. This means the class has poor cohesion (its parts don’t stick together logically). These routines should really be separated into more focused classes, with each class providing a better, clearer abstraction for its specific purpose.
The example above could be improved by reorganizing the routines into a Program class that presents a more consistent abstraction, perhaps like this:
    class Program {
    public:
        // public routines
        void InitializeUserInterface();
        void ShutDownUserInterface();
        void InitializeReports();
        void ShutDownReports();
        // ...
    private:
        // ... internal data and routines
    };
This improved Program class is cleaner because some of the original routines (like those for the command stack or detailed report formatting) would have been moved to other, more appropriate classes, or they might have been made into private routines used internally by InitializeUserInterface() or InitializeReports().
The evaluation of a class’s abstraction is mainly based on its public routines (its interface). While the overall class interface should be a good abstraction, the routines inside the class also need to be designed well individually.
Designing good, abstract interfaces leads to several important rules for creating class interfaces:
- Present a consistent level of abstraction in the class interface: A helpful way to think about a class is that it should implement one and only one Abstract Data Type (ADT). If you find a class trying to do too many different things, or if you can’t figure out what single ADT it’s supposed to represent, it’s a sign that you need to reorganize it into more focused classes. Here’s an example of a class where the level of abstraction is inconsistent:
    class EmployeeCensus: public ListContainer { // Inherits from ListContainer
    public:
        // public routines
        void AddEmployee( Employee employee );    // Abstraction at "employee" level
        void RemoveEmployee( Employee employee ); // Abstraction at "employee" level
        Employee NextItemInList();                // Abstraction at "list" level
        Employee FirstItem();                     // Abstraction at "list" level
        Employee LastItem();                      // Abstraction at "list" level
        // ...
    private:
        // ...
    };
  This EmployeeCensus class is actually trying to represent two different ADTs: an Employee collection (like a census) and a generic ListContainer. This often happens when a programmer directly uses a library class (like ListContainer) to build their class and doesn’t hide that fact. Usually, whether a container class is used is an implementation detail that should be hidden from the rest of the program. Here’s a better way to design the EmployeeCensus class with a consistent level of abstraction:
    class EmployeeCensus {
    public:
        // public routines
        void AddEmployee( Employee employee );
        void RemoveEmployee( Employee employee );
        Employee NextEmployee();  // Now "employee" level
        Employee FirstEmployee(); // Now "employee" level
        Employee LastEmployee();  // Now "employee" level
        // ...
    private:
        ListContainer m_EmployeeList; // ListContainer is now hidden inside
        // ...
    };
  In the improved example, the ListContainer is an internal part of EmployeeCensus, not something exposed directly through the interface. The methods are now consistently about “employees” (NextEmployee, FirstEmployee, LastEmployee). Some programmers might argue that inheriting from ListContainer is convenient for features like searching or sorting. However, inheritance should typically only be used for “is a” relationships (e.g., a “Dog is an Animal”). An EmployeeCensus is not a ListContainer; it uses a ListContainer. When you mix levels of abstraction like this, it makes the program harder to understand as it grows and changes, eventually making it difficult to maintain. Think of inconsistent public routines as “leaky panels” in your class, which, over time, can “sink the boat.”
- Be sure you understand what abstraction the class is implementing: Sometimes, classes can be very similar, and you need to be careful to choose the exact right abstraction for your class interface. For example, if you need a simple grid control but only have a more complex spreadsheet control available, you might create a “wrapper” class. This wrapper should expose only the functionality of a grid control (e.g., 15 routines) plus any specific additions you need (like cell coloring), not all 150 routines of the underlying spreadsheet control. Exposing too much detail (like all 150 routines) means you fail to hide the implementation and create more work for yourself if you ever switch the underlying control. You must choose the right abstraction based on what the class is truly meant to represent.
- Provide services in pairs with their opposites: Many operations have a matching “opposite” operation. If you have a function to TurnLightOn(), you’ll likely need TurnLightOff(). If you AddItem() to a list, you’ll probably need to DeleteItem(). When designing a class, check if each public routine needs a complementary opposite. Don’t add opposites just for the sake of it, but consider if they are truly necessary for complete functionality.
- Move unrelated information to another class: If you notice that roughly half of a class’s routines work with one set of data, and the other half work with completely different data, it’s a strong sign that you actually have two classes pretending to be one. In such cases, you should break them apart into two more focused classes.
- Make interfaces programmatic rather than semantic when possible: An interface has two parts:
  - The programmatic part includes things like data types and function parameters that the compiler can automatically check for errors.
  - The semantic part consists of rules about how the interface should be used that the compiler cannot check (e.g., “RoutineA must be called before RoutineB”).
  These semantic rules should be documented in comments. However, try to design your interfaces so they rely as little as possible on documentation alone. Any part of an interface that the compiler can’t enforce is prone to being misused. Look for ways to convert these semantic rules into programmatic ones, perhaps by using checks within the code (like Asserts) to ensure correct usage.
- Beware of erosion of the interface’s abstraction under modification: As a class is changed and expanded over time, new functionality might be needed that doesn’t quite fit the original, clean interface. It can be tempting to just add these new, slightly unrelated routines to the existing class because it seems easier. For example, our clean Employee class might evolve into something messy like this:
    class Employee {
    public:
        // public routines
        FullName GetName() const;
        Address GetAddress() const;
        PhoneNumber GetWorkPhone() const;
        // ... (original clean routines)
        bool IsJobClassificationValid( JobClassification jobClass ); // Added later
        bool IsZipCodeValid( Address address );                      // Added later
        bool IsPhoneNumberValid( PhoneNumber phoneNumber );          // Added later
        SqlQuery GetQueryToCreateNewEmployee() const;                // Added later
        SqlQuery GetQueryToModifyEmployee() const;                   // Added later
        SqlQuery GetQueryToRetrieveEmployee() const;                 // Added later
        // ...
    private:
        // ...
    };
  This Employee class, which started with a clear purpose, has now become a jumble of functions that are only loosely related. There’s no logical reason for an Employee class to contain routines that check if ZIP Codes or phone numbers are valid; those are general validation checks. Similarly, routines that expose internal SQL database query details (GetQueryToCreateNewEmployee) are at a much lower level of abstraction and break the idea of what an Employee class should represent. Don’t add public members that are inconsistent with the interface’s abstraction. Each time you add a new routine to a class interface, ask yourself: “Does this routine fit consistently with the overall concept (abstraction) that this class represents?” If it doesn’t, you need to find a different way to make that modification and protect the clear purpose of your class.
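As an illustration of the “programmatic rather than semantic” guideline in the list above, the sketch below shows two ways of handling a rule such as “Open() must be called before Write().” All class and routine names are hypothetical. The first version keeps the rule semantic but backs it with an assertion. The second removes the rule by letting the constructor do the opening, so there is no un-opened state to misuse.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// All names here are hypothetical and exist only for this sketch.

// Version 1: the ordering rule is semantic. The compiler cannot check that
// Open() precedes Write(), so an assertion checks it at run time instead.
class CheckedLog {
public:
    void Open() { m_isOpen = true; }
    void Write(const std::string& entry) {
        assert(m_isOpen && "Open() must be called before Write()");
        m_entries.push_back(entry);
    }
    std::size_t EntryCount() const { return m_entries.size(); }

private:
    bool m_isOpen = false;
    std::vector<std::string> m_entries;
};

// Version 2: the rule is programmatic. Constructing the object is the only
// "open" step, so callers have no un-opened object to write to.
class AlwaysOpenLog {
public:
    AlwaysOpenLog() = default;
    void Write(const std::string& entry) { m_entries.push_back(entry); }
    std::size_t EntryCount() const { return m_entries.size(); }

private:
    std::vector<std::string> m_entries;
};
```

The second form is usually preferable when it is available, because the compiler enforces it. The assertion in the first form only catches misuse in builds where assertions are enabled.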
Finally, remember that the ideas of abstraction and cohesion are very closely connected. A class interface that provides a good, clear abstraction usually has strong cohesion (all its parts work together towards one goal). While a class with strong cohesion also tends to have a good abstraction, focusing on the abstraction presented by the class interface often gives more helpful insights when you’re designing classes. If you’re struggling to fix a class with weak cohesion, try asking yourself what consistent abstraction the class should be representing.
Good Encapsulation
Encapsulation is a concept in programming that goes even further than abstraction. While abstraction helps you simplify complex ideas by letting you ignore their inner workings, encapsulation acts as a strict guard, preventing you from seeing or changing those internal details even if you wanted to. The two are very closely linked: if you don’t have encapsulation, your abstraction often falls apart. In practice, you usually have both or neither – there’s no in-between. The main thing that separates a well-designed piece of code from a poorly designed one is how much it hides its internal data and implementation details from other parts of the program.
Here are some guidelines for good encapsulation:
- Minimize accessibility of classes and members: This means making your routines and data as private as possible. When you’re deciding whether a specific routine should be public (visible to everyone), private (visible only within the class), or protected (visible within the class and its subclasses), it’s generally best to choose the strictest level of privacy that still allows the code to work. A more important question to ask is: “What best protects the main idea (abstraction) of this interface?” If exposing a routine fits with the class’s purpose, it’s probably okay. If you’re unsure, it’s usually safer to hide more rather than less.
- Don’t expose member data in public: Making a class’s internal data (like variables) public is a direct violation of encapsulation. It takes away your control over how that data is used and changed. For example, if you have a Point class with public variables float x; float y; float z;, other parts of the program can change these values freely without the Point class even knowing about it. This breaks encapsulation. However, if the Point class provides functions like GetX(), GetY(), SetX(float x), SetY(float y), it maintains perfect encapsulation. The code using the Point class doesn’t know how x, y, and z are stored (e.g., as floats, doubles, or even in a very complex way). All it knows is that it can get and set the values using these defined functions, which gives the class control over its own data.
- Avoid putting private implementation details into a class’s interface: Ideally, programmers shouldn’t be able to see a class’s internal details at all. However, in some popular programming languages like C++, the language structure sometimes forces you to declare private internal details (like String m_Name; or int m_jobClass;) in the public header file where the class interface is defined.
    class Employee {
    public:
        // ... public methods
    private:
        String m_Name;    // Exposed implementation detail
        String m_Address; // Exposed implementation detail
        int m_jobClass;   // Exposed implementation detail
        // ...
    };
  Even though these are marked private, including them in the header file encourages other programmers to look at these details. This can lead them to make assumptions about how the class works internally, which is bad for encapsulation. A common way to fix this is to separate the class interface from its implementation. You can do this by using a “pointer to implementation” (often called the “pimpl” idiom). This means the public class only holds a pointer to another, completely separate internal class that contains all the private details. This way, the users of the class never see those internal details in the header file.
- Don’t make assumptions about the class’s users: A class should be designed to follow the rules set by its public interface, and nothing more. It should not assume how its users will or won’t use it, beyond what is clearly stated in the interface documentation. A comment like
    -- initialize x, y, and z to 1.0 because DerivedClass blows up if they're initialized to 0.0
  is a sign that the class knows too much about how it’s being used by others, which is a problem.
- Avoid friend classes: In general, friend classes (a feature in some languages that allows one class to access the private parts of another) break encapsulation. They increase how much code you need to think about at once, making your program more complex. While there are a few very specific cases where they might be used carefully, it’s best to avoid them in most situations.
- Don’t put a routine into the public interface just because it uses only public routines: The fact that a routine only uses other public routines of the class is not a good enough reason to make it public itself. The real question should always be: “Is making this routine public consistent with the overall purpose and abstraction of the class’s interface?”
- Favor read-time convenience to write-time convenience: Code is read many, many more times than it is written, even during its first development. Prioritizing quick writing over easy reading is a mistake. This is especially true when creating class interfaces. It might be tempting to add a routine to an interface because it’s convenient for a specific client you’re currently working on, even if it doesn’t quite fit the class’s main abstraction. But this is the start of a “slippery slope” that can damage the interface over time, so it’s best not to take that first step.
- Be very, very wary of semantic violations of encapsulation: It’s relatively easy to avoid syntactic (code structure) violations by simply declaring internal data and routines as private. However, semantic encapsulation (how you use the class based on your knowledge of its internals, not just the code structure) is much harder to maintain and much more dangerous when broken. Examples of semantic violations:
  - You don’t call ClassA.InitializeOperations() because you know ClassA.PerformFirstOperation() will call it automatically.
  - You don’t call database.Connect() before employee.Retrieve(database) because you know employee.Retrieve() connects if needed.
  - You use a MAXIMUM_ELEMENTS constant from ClassB instead of ClassA.MAXIMUM_ELEMENTS because you know they have the same value.
  The problem with all these examples is that your code becomes dependent not on the class’s clear public interface, but on its hidden, private internal details. If you find yourself looking at a class’s internal code to figure out how to use it, you’re not programming to the interface; you’re programming through the interface to the implementation. When encapsulation is broken in this semantic way, abstraction will soon follow. If you can’t figure out how to use a class just by looking at its official interface documentation, the correct action is not to look at its source code. Instead, you should contact the class’s author and tell them you can’t understand how to use it. The author’s proper response should be to update the class’s interface documentation so it’s clearer, not to just explain it to you verbally. This way, the improved understanding is written directly into the code itself for future programmers.
- Watch for coupling that’s too tight: Coupling refers to how strongly connected two classes are. Generally, looser connections are better. Tight coupling happens when a class’s abstraction is “leaky” or when encapsulation is broken. If a class doesn’t offer all the necessary services, other parts of the program might be forced to directly read or change its internal data. This turns the class from a “black box” (whose internals you don’t need to know) into a “glass box” (where you can see everything inside), effectively destroying its encapsulation. Guidelines to minimize tight coupling (and promote encapsulation):
  - Minimize the visibility of classes and their members (make them as private as possible).
  - Avoid friend classes, as they create tight connections.
  - Keep data private in a base class instead of protected to reduce how tightly derived (child) classes are connected to the base class.
  - Never expose internal data directly in a class’s public interface.
  - Be very careful about semantic violations of encapsulation.
  - Follow the “Law of Demeter” (a rule about how objects should interact, covered elsewhere).
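The “pointer to implementation” approach mentioned in the guidelines above might look roughly like this in C++. It is a single-file sketch under stated assumptions: the split into a header part and a source-file part is shown only through comments, std::string stands in for the String type used earlier, and the non-empty-name precondition is an assumption added for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- What would live in the header (for example, Employee.h) ----
class Employee {
public:
    explicit Employee(std::string name);
    ~Employee();  // defined below, where Impl is a complete type
    Employee(const Employee&) = delete;
    Employee& operator=(const Employee&) = delete;

    const std::string& GetName() const;
    void SetName(std::string name);

private:
    struct Impl;                   // declared only; no data members shown here
    std::unique_ptr<Impl> m_impl;  // the one detail that clients can see
};

// ---- What would live in the source file (for example, Employee.cpp) ----
struct Employee::Impl {
    std::string name;
};

Employee::Employee(std::string name) : m_impl(new Impl{std::move(name)}) {}

Employee::~Employee() = default;

const std::string& Employee::GetName() const { return m_impl->name; }

void Employee::SetName(std::string name) {
    assert(!name.empty());  // assumed precondition, for illustration only
    m_impl->name = std::move(name);
}
```

Clients that include only the header part see a single opaque pointer. Adding or changing members of Impl later would not change the class declaration they compile against.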
In summary, encapsulation is the protection that keeps the internal workings of a class hidden, ensuring that other parts of your program only interact with it through its clearly defined public interface. This hiding is crucial for managing complexity, making code easier to change, and preventing subtle, hard-to-find errors.
Design and Implementation Issues
While creating good class interfaces is a big step towards writing a high-quality program, the internal design and implementation of a class are also very important. This section talks about several key topics related to how a class is built on the inside, including containment, inheritance, how member functions and data are used, how classes connect to each other (coupling), constructors, and the difference between value and reference objects.
Containment (“has a” Relationships)
Containment is a straightforward idea: it means that one class simply contains or holds another piece of data or another object. You might read more about inheritance in programming books than about containment, but this is usually because inheritance is more complicated and prone to errors, not because it’s a better technique. In fact, containment is the everyday workhorse in object-oriented programming.
- Implement “has a” through containment: The most common way to think about and implement a “has a” relationship is by using containment. For example, an Employee “has a” name, “has a” phone number, “has a” tax ID, and so on. You achieve this in code by making name, phone number, and tax ID member data (variables that belong to the class) within the Employee class. This is the simplest and most direct way to show that one object contains another.
- Implement “has a” through private inheritance as a last resort: In very rare cases, you might find it difficult to use simple containment (making one object a member of another). In such situations, some experts suggest using private inheritance from the contained object. The main reason for this approach would be to allow the “containing” class to access protected member functions or protected member data of the class it contains. However, in practice, this method often creates a relationship that is too close (overly cozy) with the parent class and violates encapsulation (the idea of hiding internal details). It usually points to design mistakes that should be fixed in a better way, rather than resorting to private inheritance. It’s generally something to avoid.
- Be critical of classes that contain more than about seven data members: Research has shown that people can typically remember about “7 plus or minus 2” (so, 5 to 9) separate items at a time while doing other tasks. Applying this to class design, if a class holds more than about seven data members, you should consider whether that class is trying to do too much. It might be better to break it down into multiple smaller, more focused classes. You might lean towards the higher end of this “7±2” range if the data members are simple types like numbers or text (integers and strings). However, if the data members are complex objects themselves, you should lean towards the lower end of that range, aiming for fewer than seven.
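A minimal sketch of “has a” through containment, following the Employee example above. The PhoneNumber and TaxId types and their routines are hypothetical stand-ins; only the structure (objects held as private member data) is the point.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical value types, invented for this sketch.
class PhoneNumber {
public:
    explicit PhoneNumber(std::string digits) : m_digits(std::move(digits)) {
        assert(!m_digits.empty());  // assumed precondition
    }
    const std::string& Digits() const { return m_digits; }

private:
    std::string m_digits;
};

class TaxId {
public:
    explicit TaxId(std::string value) : m_value(std::move(value)) {}
    const std::string& Value() const { return m_value; }

private:
    std::string m_value;
};

// Employee "has a" name, "has a" phone number, and "has a" tax ID:
// each is simply a private data member.
class Employee {
public:
    Employee(std::string name, PhoneNumber workPhone, TaxId taxId)
        : m_name(std::move(name)),
          m_workPhone(std::move(workPhone)),
          m_taxId(std::move(taxId)) {}

    const std::string& GetName() const { return m_name; }
    const PhoneNumber& GetWorkPhone() const { return m_workPhone; }
    const TaxId& GetTaxId() const { return m_taxId; }

private:
    std::string m_name;
    PhoneNumber m_workPhone;
    TaxId m_taxId;
};
```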
Inheritance (“is a” Relationships)
Inheritance allows one class (derived/subclass) to be a specialized version of another (base/superclass). Its purpose is to simplify code and avoid duplication by defining common routine interfaces, implementations, data members, or data types in a base class, which derived classes then inherit.
When using inheritance, key decisions involve the visibility and overridability of each member routine and data member for derived classes.
- Implement “is a” through public inheritance: Public inheritance strictly means the new class “is a” more specialized version of the old one. The derived class must fully adhere to the base class’s interface contract. If it doesn’t, inheritance is the wrong technique; consider containment or re-designing the hierarchy.
- Design and document for inheritance or prohibit it: Inheritance adds complexity. Either design and document classes specifically for inheritance, or prevent it entirely (e.g., non-virtual in C++, final in Java, non-overridable in VB) to avoid unintended inheritance.
- Adhere to the Liskov Substitution Principle (LSP): This principle states that “Subclasses must be usable through the base class interface without the need for the user to know the difference.” All base class routines should retain the same meaning in derived classes. If, for example, InterestRate() means different things for CheckingAccount and AutoLoanAccount, then AutoLoanAccount should not inherit from Account, as this increases complexity instead of reducing it.
- Be sure to inherit only what you want to inherit: Derived classes can inherit a routine’s interface, implementation, or both.
- Abstract overridable: Inherits interface, no default implementation (derived class must provide).
- Overridable: Inherits interface and default implementation (derived class can override).
- Non-overridable: Inherits interface and default implementation (derived class cannot override).
- Don’t “override” a non-overridable member function: Avoid creating a function with the same name as a private or non-overridable base class routine in a derived class. This leads to confusion as it appears polymorphic but isn’t.
- Move common interfaces, data, and behavior as high as possible in the inheritance tree: Place shared elements higher up to maximize reuse by derived classes. However, don’t move them so high that it breaks the higher object’s consistent abstraction.
- Be suspicious of classes of which there is only one instance: A single instance might indicate a confusion between objects and classes; consider just creating an object. Also, assess if variations could be handled by data rather than distinct classes (except for patterns like Singleton).
- Be suspicious of base classes of which there is only one derived class: This often suggests premature “designing ahead.” The best preparation for future work is not to design extra layers of base classes that “might be needed someday.” Instead, keep current work as simple and clear as possible, and avoid unnecessary inheritance hierarchies.
- Be suspicious of classes that override a routine and do nothing inside the derived routine: This indicates a design flaw in the base class. For example, creating a ScratchlessCat that does nothing for Scratch() violates the Cat’s abstraction and leads to complex, confusing hierarchies (ScratchlessTaillessMicelessMilklessCat). The issue should be fixed at the source, e.g., by modeling Claws as a contained object within Cat.
- Avoid deep inheritance trees: While inheritance manages complexity, overly deep hierarchies (more than 2-3 levels in practice, though some suggest 6) can increase it. Deep trees are associated with higher fault rates due to increased debugging difficulty. Use inheritance primarily for code duplication avoidance and complexity reduction.
- Prefer polymorphism to extensive type checking: Frequent switch or case statements on object types often signal that polymorphism is a better choice. Instead of switch (shape.type) { case Circle: shape.DrawCircle(); ... }, prefer shape.Draw(), letting each shape draw itself. However, case statements are appropriate for genuinely distinct command types where polymorphism would dilute meaning.
- Make all data private, not protected: Inheritance, by granting access to protected members, can weaken encapsulation. If derived classes need access to base class attributes, provide protected accessor functions instead of direct access to protected data.
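To make the three kinds of inherited routines and the preference for polymorphism over type checks more concrete, here is a minimal Java sketch. The Shape, Circle, and Square classes and their routines (area(), name(), describe()) are assumptions made for illustration, not code from the source.

```java
// Hypothetical shape hierarchy, for illustration only.
abstract class Shape {
    // Abstract overridable: interface only; every derived class must implement it.
    abstract double area();

    // Overridable: interface plus a default implementation.
    String name() { return "shape"; }

    // Non-overridable: final, so derived classes inherit it unchanged.
    final String describe() {
        return name() + " with area " + Math.round(area());
    }
}

// Declared final: this class is not designed for further inheritance.
final class Circle extends Shape {
    private final double radius;   // private data, not protected

    Circle(double radius) { this.radius = radius; }

    @Override double area() { return Math.PI * radius * radius; }
    @Override String name() { return "circle"; }
}

final class Square extends Shape {
    private final double side;

    Square(double side) { this.side = side; }

    @Override double area() { return side * side; }
    // name() is not overridden, so the default "shape" is used.
}
```

Client code can call describe() on any Shape without switching on the object's type; each derived class supplies its own area().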
Multiple Inheritance
If single inheritance is a powerful but potentially dangerous tool like a chainsaw, multiple inheritance is a very old chainsaw with no safety guard: even more dangerous and harder to control. While some experts advocate for its broad use, in the author’s experience, multiple inheritance is primarily useful for creating “mixins.”
Mixins are simple classes designed to add a set of properties or behaviors (like Displayable, Persistent, Serializable, or Sortable) to other objects. They are “mixed in” to derived classes to extend their capabilities. Mixins are almost always abstract, meaning they aren’t meant to be instantiated on their own.
Mixins require multiple inheritance, but they generally avoid the “diamond inheritance problem” (a common source of complexity in multiple inheritance) as long as each mixin is truly independent of the others. They also improve design clarity by grouping attributes. It’s easier to understand an object that uses Displayable and Persistent mixins than to track 11 separate routines that achieve the same properties.
Languages like Java and Visual Basic acknowledge the value of mixins by allowing multiple inheritance of interfaces only, but restrict classes to single inheritance. C++, on the other hand, allows multiple inheritance of both interface and implementation. Programmers should only use multiple inheritance after very carefully considering other options and weighing its impact on the system’s overall complexity and how easy it is to understand.
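One way to approximate mixins in Java is with interfaces, which since Java 8 may carry default method implementations. The sketch below is illustrative only: Displayable and Persistent borrow their names from the text, while the Invoice class and every routine name are hypothetical.

```java
// Hypothetical mixin-style interfaces; each is independent of the other.
interface Displayable {
    String displayText();              // supplied by the implementing class

    default String display() {         // behavior the mixin adds
        return "[" + displayText() + "]";
    }
}

interface Persistent {
    String storageKey();
    String storedValue();

    default String toRecord() {        // behavior the mixin adds
        return storageKey() + "=" + storedValue();
    }
}

// Invoice mixes in both capabilities without extending a concrete base class.
final class Invoice implements Displayable, Persistent {
    private final int number;
    private final String customer;

    Invoice(int number, String customer) {
        this.number = number;
        this.customer = customer;
    }

    @Override public String displayText() { return "Invoice " + number + " for " + customer; }
    @Override public String storageKey() { return "invoice:" + number; }
    @Override public String storedValue() { return customer; }
}
```

Because the two interfaces share no routine names, there is no conflict for Invoice to resolve, which mirrors the point that independent mixins sidestep the diamond problem.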
Why Are There So Many Rules for Inheritance?
The many rules surrounding inheritance exist because inheritance often works against the primary goal of managing complexity in software. To effectively control complexity, programmers should have a strong bias against using inheritance unless it’s clearly necessary and beneficial.
Here’s a summary of when to choose between inheritance (is-a) and containment (has-a):
- If multiple classes share common data but not behavior: Create a common object that these classes can contain (i.e., make it a member of those classes).
- If multiple classes share common behavior but not data: Derive them from a common base class that defines those shared routines.
- If multiple classes share both common data and behavior: Inherit from a common base class that defines both the shared data and routines.
- Inherit when you want the base class to control your interface (meaning the derived class must conform to the base’s contract).
- Contain when you want to control your interface (meaning your class dictates how it uses the contained object).
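The last guideline, containing an object so that your class controls its own interface, can be sketched as follows. UndoHistory is a hypothetical class: it holds a java.util.ArrayDeque as member data instead of extending it, so clients see only the three routines it chooses to expose.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical example: UndoHistory contains a Deque instead of extending one,
// so it decides which operations clients may use.
final class UndoHistory {
    private final Deque<String> actions = new ArrayDeque<>();

    // Records an action as the most recent one.
    void record(String action) { actions.push(action); }

    // Returns and removes the most recent action, or null if there is none.
    String undo() { return actions.poll(); }

    int size() { return actions.size(); }
}
```

Had UndoHistory inherited from ArrayDeque, every public routine of the collection would have become part of its interface.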
Member Functions and Data
Here are guidelines for effectively implementing a class’s internal functions and data:
- Keep the number of routines in a class as small as possible: While a lower number of routines per class is linked to fewer errors, other factors like deep inheritance, many routines called by a class (high “fan-out”), and strong coupling between classes are often more significant. Balance minimizing routines with these other considerations.
- Disallow implicitly generated member functions and operators you don’t want: Compilers automatically generate some functions (like assignment or default constructors). If you don’t want these, you can prevent client code from using them by declaring them private. For example, making a constructor private is a standard way to create a singleton class (a class that only allows one object to be created).
- Minimize the number of different routines called by a class: Studies show that classes calling many different routines from other classes, or using many different classes, tend to have more faults (errors). This concept is sometimes called “fan-out.”
- Minimize indirect routine calls to other classes (Law of Demeter): Direct connections between objects are risky, but indirect ones (like account.ContactPerson().DaytimeContactInfo().PhoneNumber()) are even riskier. The “Law of Demeter” states that an object (Object A) can call its own routines, and routines on objects it directly creates (Object B), but it should avoid calling routines on objects returned by Object B. For example, account.ContactPerson() is fine, but account.ContactPerson().DaytimeContactInfo() is not. In general, a class should minimize its collaborations with other classes.
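A hedged sketch of how the chained call in the Law of Demeter example might be avoided: each class answers the question itself instead of handing out the object it holds. The class and routine names below are hypothetical adaptations of the account.ContactPerson() example, written in Java naming style.

```java
// Hypothetical classes mirroring the account.ContactPerson() example in the text.
final class ContactInfo {
    private final String phoneNumber;

    ContactInfo(String phoneNumber) { this.phoneNumber = phoneNumber; }

    String phoneNumber() { return phoneNumber; }
}

final class Person {
    private final ContactInfo daytimeContactInfo;

    Person(ContactInfo daytimeContactInfo) { this.daytimeContactInfo = daytimeContactInfo; }

    // Person answers the question itself instead of exposing its ContactInfo.
    String daytimePhoneNumber() { return daytimeContactInfo.phoneNumber(); }
}

final class Account {
    private final Person contactPerson;

    Account(Person contactPerson) { this.contactPerson = contactPerson; }

    // Clients call one routine on Account rather than chaining through two other classes.
    String contactDaytimePhoneNumber() { return contactPerson.daytimePhoneNumber(); }
}
```

Client code now depends only on Account, so a change to how Person stores contact details does not ripple outward.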
Constructors
These guidelines apply specifically to constructors, which are special routines used to create and initialize objects.
- Initialize all member data in all constructors, if possible: As a defensive programming practice, it’s good to initialize all internal data members within every constructor. This helps prevent unexpected behavior.
- Enforce the singleton property by using a private constructor: To ensure that a class can only have one single object (instance), you can make all its constructors private. Then, provide a public static routine (like GetInstance()) that returns this single instance. The private constructor is called only once when this single instance is first created.

    public class MaxId {
        // private constructor prevents direct creation
        private MaxId() { /* ... initialization ... */ }

        // public static method to get the single instance
        public static MaxId GetInstance() { return m_instance; }

        // The single instance itself, created once
        private static final MaxId m_instance = new MaxId();
    }

- Prefer deep copies to shallow copies until proven otherwise: When dealing with complex objects, you’ll decide between deep copies and shallow copies.
- A deep copy creates a completely new, separate copy of all the object’s data, including any objects it contains.
- A shallow copy typically just copies a reference or pointer, meaning both the original and the copy point to the same underlying data.
While shallow copies might seem better for performance, they rarely offer a measurable speedup for most objects. Deep copies are generally simpler to code and maintain, as shallow copies add complexity with reference counting, safe comparisons, and deletion, which can be error-prone. Unless there’s clear evidence of a performance problem demanding a shallow copy, prefer deep copies.
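The difference between the two kinds of copy can be seen in a small Java sketch. Playlist and its routines are hypothetical. Because Java strings are immutable, copying the list is enough to make the copy fully independent here; a class holding mutable objects would need to copy those as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class used only to contrast the two kinds of copy.
final class Playlist {
    private final List<String> tracks;

    Playlist() { this.tracks = new ArrayList<>(); }

    // Private constructor used by the copy routines below.
    private Playlist(List<String> tracks) { this.tracks = tracks; }

    void add(String track) { tracks.add(track); }

    int size() { return tracks.size(); }

    // Deep copy: the new Playlist gets its own list, so later changes stay separate.
    Playlist deepCopy() { return new Playlist(new ArrayList<>(tracks)); }

    // Shallow copy: the new Playlist shares the same underlying list.
    Playlist shallowCopy() { return new Playlist(tracks); }
}
```

After the original is modified, the deep copy keeps its old contents, while the shallow copy changes along with the original.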
Reasons to Create a Class
Modeling real-world objects is the best-known reason to create a class, but classes serve many other crucial purposes:
- Model Real-World Objects: A primary reason. Use classes for tangible entities in your program, encapsulating their data and behavior.
- Model Abstract Objects: Create classes for non-concrete concepts (like a Shape abstracting Circle and Square). Identifying effective abstractions is a key design challenge.
- Reduce Complexity: The most vital reason. Classes hide information, simplifying the intellectual management of complex programs and improving code size, maintainability, and correctness.
- Isolate Complexity: Centralize complex algorithms, data, or protocols within a class. This localizes errors, limits change impact, and eases algorithm replacement.
- Hide Implementation Details: An excellent reason to use classes, whether the details are intricate (like database access) or simple (like data storage type).
- Limit Effects of Changes: Isolate volatile areas (hardware, I/O, data types, business rules) into classes to contain the impact of modifications.
- Hide Global Data: Encapsulate global data behind a class interface. This allows structural changes without program-wide ripple effects, enables access monitoring, and often reveals that “global” data belongs to an object.
- Streamline Parameter Passing: If a parameter is passed through many routines, those routines and the parameter might be better organized into a class, sharing the parameter as object data.
- Make Central Points of Control: Design classes to be single points of control for specific tasks (e.g., managing device access, database operations). This simplifies maintenance if underlying mechanisms change.
- Facilitate Reusable Code: Well-designed classes promote reuse in other programs. NASA studies show object-oriented approaches yield significantly higher code reuse (70%+) than functional ones. Their strategy involves identifying reuse candidates post-project and making them reusable, avoiding premature “designing for reuse.”
- Plan for a Family of Programs: Isolate anticipated changing parts of a program into separate classes. This enables easy modification or replacement of specific components, supporting a whole family of related programs.
- Package Related Operations: Group logically connected operations (e.g., math functions, string routines) into classes, even when information hiding isn’t the main goal.
- Accomplish a Specific Refactoring: Many refactoring techniques, like splitting a class or hiding a delegate, result in new classes, often driven by the desire to achieve the benefits above.
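As one illustration of the “Hide Global Data” reason above, a formerly global counter can be placed behind a small class interface. SessionRegistry and its routines are hypothetical; the point is only that callers never touch the variable directly, so its representation can change later without program-wide edits.

```java
// Hypothetical example: a formerly global counter hidden behind a class interface.
final class SessionRegistry {
    // What used to be a global variable is now private to this class.
    private static int activeSessions = 0;

    // No instances are needed; all access goes through the static routines.
    private SessionRegistry() { }

    static synchronized void open() { activeSessions++; }

    // Never lets the count drop below zero.
    static synchronized void close() {
        if (activeSessions > 0) {
            activeSessions--;
        }
    }

    static synchronized int activeSessions() { return activeSessions; }
}
```

Because every read and write passes through these routines, they are also a natural place to add monitoring or validation later.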
Classes to Avoid
Some class types can introduce problems:
- Avoid Creating God Classes: Don’t create all-knowing, all-powerful classes that excessively query and control other classes via Get()/Set() methods. Their functionality often belongs in the classes they operate on.
- Eliminate Irrelevant Classes: If a class contains only data and no behavior, reconsider if it’s truly a class. Its data might be better integrated as attributes of other classes.
- Avoid Classes Named After Verbs: A class with only behavior and no data isn’t typically a true class. Consider making it a routine within an existing class (e.g., DatabaseInitialization() as a routine on a Database class).