The TRUE hardest programming problem is tight vs. weak coupling
A few months ago, I claimed that naming is the hardest programming problem. I was wrong. The true hardest problem is one that impacts every developer at every skill level, across all programming languages, regardless of experience. It appears on multiple levels, from language details to large scale distributed computing. It is equally applicable across all programming disciplines. And its impacts are monetary, hedonic, and cognitive.
The hardest problem in programming is this:
Should X and Y be tightly coupled or weakly coupled?
I was trained as a philosopher. Philosophers are a motley bunch, but if there is any generalization that is fair to level at all philosophers, it's this: Philosophers revel in questions that appear to have easy answers, but which, upon reflection, might just be intractable. This appears to be one of those.
What Do We Mean by Tight and Weak Coupling?
When we talk about coupling, we're talking about establishing a relationship between two things. While programming, we might couple two functions like this:
function x() {
  // do something
}

function y() {
  x();
}
By making y() dependent on x(), we've coupled them. But coupling is a more generic term. We could reverse the dependency (have x() call y()). We could have a mutual dependency (x() calls y(), which calls x()). We could add a layer of indirection, such as having y() call an implementation of interface X, and having x() be an implementation of X that is available to y(), and so on.
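Here is a rough sketch of the indirection variant next to the direct one. JavaScript has no formal interfaces, so "interface X" is assumed here to be a convention: any zero-argument function that returns a string. The names makeY and otherX are hypothetical and exist only for illustration.

```javascript
// Direct coupling: y() names x() explicitly.
function x() {
  return "x did something";
}

function y() {
  return x();
}

// Indirection: makeY() accepts any implementation of the informal
// "interface X" (assumed here: a zero-argument function returning a string)
// and returns a y-like function that calls whatever it was given.
function makeY(implementationOfX) {
  return () => implementationOfX();
}

// A second, hypothetical implementation of the same informal interface.
function otherX() {
  return "another implementation did something";
}

const yUsingX = makeY(x);
const yUsingOtherX = makeY(otherX);

console.log(y());
console.log(yUsingX());
console.log(yUsingOtherX());
```

In the second form, y no longer mentions x by name. Whether that extra freedom is worth the extra moving part is exactly the kind of question this post is about.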
This particular domain is especially interesting because it admits of multiple levels and abstractions. We can talk not just about functions being coupled, but other common things:
- Libraries can be coupled
- Programs can be coupled (program x calls program y...)
- Services can be coupled (which is the basis for microservice architecture)
- Data formats can be coupled (such as SGML, HTML, XML, SVG, and so on)
- Protocols can be coupled
- User interface elements can be coupled
- Hardware and software can be coupled
- Network stacks can be coupled
- We can even get into high level techs like websites, phones, relations between datacenters, and so on.
And to really drive the point home, the new FaaS (Functions as a Service) paradigm is all about coupling "functions" that are each stand-alone services. The hype generated around this technology revolves around this idea that FaaS provides a desirable boundary across which "functions" couple.
So part of what makes this problem interesting is that it is likely to be hit by everyone from CS101 students to datacenter operators to architects to data scientists. Everyone in our space deals with coupling, and anyone making design decisions will have to make decisions about coupling.
Next, we need to distinguish between two different kinds of coupling, which many computer scientists refer to as tight and weak coupling.
Tight Coupling
It is easier to start with the more restrictive case, and that is tight coupling.
When establishing a relationship from X to Y, only the mutual needs of X and Y are considered.
What this says is that tight coupling consists in reducing a coupling problem to only two questions: What does X need to be successfully related to Y? And what does Y need to be successfully related to X?
In programming, this might play out as deciding how to break up programming logic into different functions, and then make calls from one function to another. At a higher level, two microservices are tightly coupled when one service's sole role is to fulfill the needs of one other service.
There's actually a very interesting question to ask at this point: Is coupling about intention or about reality? That is, when we say X is tightly coupled to Y, do we mean "X was designed in such a way as to be related to Y in an exclusive way?" Or do we mean, "X happens to be related to Y in an exclusive way"? It might be easiest to explain that in light of the microservice example above:
Option A: My microservice Y was designed only to interoperate with microservice X.
Option B: As it happens, microservice Y only interoperates with microservice X.
Clearly, there are many cases where Option B arises in the wild. But those cases are not particularly... how should we say it... philosophically interesting. There are no engaging problems to solve.
But when it comes to intentions, then we are onto something interesting. For in this case we can ask what ought to be done, and how we should make decisions.
At this point, we are talking about making design decisions that involve coupling X to Y by considering only the needs of X and Y.
Loose Coupling
Defining loose coupling now appears to be a boring exercise:
It is not the case that when establishing a relationship from X to Y, only the mutual needs of X and Y are considered.
But we can really ignore a bunch of logical cases we don't care about (those in which we're focusing on the antecedent), do a little rewording, and give an account of loose coupling that is not a mere negation of strong coupling, but does provide a relevant alternative to strong coupling:
When establishing a relationship between X and Y, the mutual needs of X and Y are considered along with additional needs.
But "additional needs" is frustratingly vague.
Given a system of abstraction S, composed of individual component parts, when establishing the relationship between X (a component of S) and Y (a component of S), the needs of each component in S are considered as they pertain to the relationship between X and Y.
Note that since X and Y are both components of S, their needs are each considered. But have we just kicked the ambiguity up a level? Because now we are talking about components as they pertain to.... To get rid of this is going to be tedious.
Given a system of abstraction S, composed of individual component parts, and where X and Y are each component parts, when establishing a relationship from X to Y, the needs of each component's relationship to Y, and the needs of X's relationship to each component must be considered.
The problem with this definition is that we don't need to consider every possible relationship. That is, when looking at how function X calls function Y to sum a few numbers, we shouldn't also have to look at how X calls Z to check whether a string contains a substring. We just care about the particular relationship that is under scrutiny. (That is, what we are really asking is whether Y should or could be used by other components rather than just X, and, vice versa, whether X should be able to use components other than Y to perform the required summing task.)
So now we're onto a seriously frustrating definition:
Given a system of abstraction S, composed of individual component parts, and where X and Y are each component parts, when establishing a particular relationship R from X to Y, the needs of each component's relationship R to Y, and the needs of X's relationship R to each component must be considered.
Limiting the definition to just a particular relationship makes things less generic.
At this point, one might argue that we've gotten overzealous. Do we really need to consider each component in the system? The short answer is yes. Really, it's a yes, but.... It is perfectly legitimate to apply broad heuristics and say, "When considering the relationship R, I simply ruled out a whole bunch of components because they weren't directly relatable."
But wait! There's more! We need to pull off a very dangerous philosophical move and leave the realm of the existing system, entering the realm of the possible. Because we also need to say, "what if at some point I write new code that does Z... will it, too, need a relationship R with X?"
(If you're keeping track... we started with predicate logic, worked our way into set logic, and are now in modal logic. This problem is a massive pain in the butt.) So we need to revise our statement somewhat, thus:
Given a system of abstraction S, composed of all possible individual component parts, and where X and Y are each possible component parts, when establishing a particular relationship R from X to Y, the needs of each component's relationship R to Y, and the needs of X's relationship R to each possible component must be considered.
Now the problem we skirted earlier might actually be a real problem. For while we can, with some justification, rule out a broad number of actual cases in our code, the set of possible components is highly likely to be substantially larger. Which means, I'm afraid, that we are going to have to cheat... err... be instrumental.
Given a system of abstraction S, composed of all possible individual component parts, and where X and Y are each possible component parts, when establishing a particular relationship R from X to Y, the needs of each component's relationship R to Y, and the needs of X's relationship R to each relevant possible component must be considered.
And now we use "relevance" to give us a cognitive safety net, fleshing it out as "within scope of system S at time T (when the decision is being made)" and then declaring "within scope" to include a cognitive boundary. Or, to put it in plain English, "relevant" is shorthand for "stuff that seemed to me to be likely at the time."
Were this a proper philosophy paper, we would now revisit our definition of strong coupling, and would discover that we needed to enfancificate it as well. We'd add our systems wording, and our revised relationship wording, but it would still mean the same thing.
Instead, let's bump our definitions from the realm of set logic back to a simple grokkable definition:
When establishing a particular relationship between X and Y...
- Strong coupling says we only considered the mutual needs of X and Y
- Weak coupling says we consider the other relevant components in the system as well
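To make the two definitions a little less abstract, here is a hedged sketch built on the summing example from earlier. Everything in it is an assumption made for illustration: the invoice shape and the names sumInvoiceLines, invoiceTotal, sum, and looseInvoiceTotal are hypothetical.

```javascript
// Strong coupling: sumInvoiceLines() exists only to serve invoiceTotal().
// It knows the shape of an invoice because that is all its one caller needs.
function sumInvoiceLines(invoice) {
  let total = 0;
  for (const line of invoice.lines) {
    total += line.amount;
  }
  return total;
}

function invoiceTotal(invoice) {
  return sumInvoiceLines(invoice);
}

// Weak coupling: sum() was written with other relevant callers in mind,
// so it accepts any iterable of numbers and knows nothing about invoices.
function sum(numbers) {
  let total = 0;
  for (const n of numbers) {
    total += n;
  }
  return total;
}

// The caller, in turn, accepts any summing function and defaults to sum().
function looseInvoiceTotal(invoice, summer = sum) {
  return summer(invoice.lines.map((line) => line.amount));
}

const invoice = { lines: [{ amount: 2 }, { amount: 3 }] };
console.log(invoiceTotal(invoice));      // 5
console.log(looseInvoiceTotal(invoice)); // 5
console.log(sum(new Set([1, 2, 3])));    // 6
```

Both versions compute the same total. The difference lies in what the author had in mind when drawing the boundary: only the needs of the two functions, or also whatever else seemed likely to need summing at the time.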
How is this a Problem?
We've spent some serious wordcount just trying to explain the terms. But does any of this justify claiming that this distinction is at the heart of the hardest problem in programming?
Let's lay down the problem plainly: The hardest problem in programming is assessing, in any given circumstances, the myriad problems associated with coupling. Here's a two-pronged approach for illustrating just how deep the problem runs. First, I'll pick a particular programming challenge, and show the breadth of issues associated with coupling. Then I'll list a variety of broader circumstances, each of which admits a similar breadth of issues. In other words, we're doing something like tracing the perimeter of an issue in order to assess the area of the issue.
XXX
Now we can enumerate the areas in which problems like the above might manifest:
- Do I write generalized classes (getters and setters for all the things?) or just do what's necessary for now?
- Do I make the class/function/interface/variable public or private?
- Do I expose this part of the library as part of the public API/SDK or leave it internal?
- Do I expose this information on the REST API?
- Do I allow this data to be mutated, or just accessed?
All of these questions have at their core the question of whether the implementation is designed to tightly couple ("This API is private because only the internals should use it") or loosely couple ("This API is public, and thus I have to design for possible use cases").
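A couple of these questions can be seen side by side in one small class. This is only a sketch under stated assumptions: the Account class, its flat fee, and its method names are hypothetical, and it relies on JavaScript's private class members (the # prefix).

```javascript
class Account {
  // Private state: only the class's own internals can depend on it.
  #balance = 0;

  // Private helper: tightly coupled by intent to deposit() and nothing else.
  #applyFee(amount) {
    return amount - 1; // hypothetical flat fee, for illustration only
  }

  // Public mutator: once exposed, unknown future callers may rely on it,
  // so it has to be designed with them in mind.
  deposit(amount) {
    this.#balance += this.#applyFee(amount);
  }

  // Public read-only accessor: callers can read the balance but there is
  // no setter, so they cannot assign to it directly.
  get balance() {
    return this.#balance;
  }
}

const account = new Account();
account.deposit(10);
console.log(account.balance); // 9
```

Each member above is a small coupling decision: #applyFee answers only to deposit(), while deposit() and balance are promises made to callers who may not exist yet.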