Recently I published the article Prevent Duplicated TypeScript Code. A good friend of mine approached me and said:
“Great article, I learned a lot!”
I was a little bit astonished because he is a way better developer than me. I asked:
“What exactly did you learn?”
He replied:
“I never thought about the difference between code duplication and knowledge duplication. It was great that you focused on differentiating this.”
He was talking about accidental duplication. Removing it will make your code harder to read and harder to change in the future.
Accidental duplication is code that looks similar but represents different logic.
My friend took away exactly the point I had in mind while writing the post. That made me write this article, which uses some examples to explain how to tell essential duplication from accidental duplication.
What is the DRY principle about?
If you want a detailed introduction to the “Don’t repeat yourself” (DRY) principle, I suggest reading my recent article first. This article gives only a short introduction to the principle and focuses on the pitfalls of applying it.
The DRY principle states that duplication should be avoided when code in two places repeats the same knowledge and business logic. The principle is credited to Andy Hunt and Dave Thomas and is stated in their book The Pragmatic Programmer:
“Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” — Andy Hunt and Dave Thomas
Notice that Andy Hunt and Dave Thomas say “every piece of knowledge” and not “every piece of code”. Understanding the difference is essential to identifying accidental duplication.
Knowledge Duplication
“Are we looking at syntax duplication or knowledge duplication?” — Anthony Sciamanna
To spot knowledge duplication, two questions have to be answered affirmatively:
Does code exist that looks identical?
Does the code repeat the same knowledge and logic?
#1 Example: Knowledge Duplication
Let’s have a look at this example:
#1 Example: Knowledge Duplication
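A minimal sketch of what such a pair of functions might look like. The Credentials shape, the concrete rules (an "@" in the email, at least 6 characters in the password) and the return values are assumptions made for illustration. Both functions are placed in one file here so the sketch runs on its own; in a real project they would live in separate client and server code bases.

```typescript
// Hypothetical shape of the sign-up input (an assumption for this sketch).
interface Credentials {
  email: string;
  password: string;
}

// client.ts (sketch): validates before the request would be sent.
const client = {
  signUp(credentials: Credentials): string {
    if (!credentials.email.includes("@")) {
      throw new Error("Invalid email");
    }
    if (credentials.password.length < 6) {
      throw new Error("Password too short");
    }
    return `request sent for ${credentials.email}`;
  },
};

// server.ts (sketch): repeats exactly the same rules before storing the user.
const server = {
  signUp(credentials: Credentials): string {
    if (!credentials.email.includes("@")) {
      throw new Error("Invalid email");
    }
    if (credentials.password.length < 6) {
      throw new Error("Password too short");
    }
    return `user stored for ${credentials.email}`;
  },
};
```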
We have two signUp functions, one in the client and the other on our server. Let’s answer our questions:
1. Does code exist that looks identical? ✅
The alarm bells should ring here for every developer. It is obvious that both functions are sharing duplicated code.
2. Does the code repeat the same knowledge and logic? ✅
This question is the harder one. If you are not sure you can ask a replacement question: Does changing one code block lead to changing the other one?
“There is true duplication, in which every change to one instance necessitates the same change to every duplicate of that instance.” — Robert C. Martin
I consciously chose an easy example here. Obviously, the validation of credentials is the same for the signUp function in the client as for the one on the server. If, for example, we chose to increase the required password length to 7, we would need to change it in two places. If we forgot to change it in one place, we would introduce a bug.
Since both questions are answered affirmatively, we can confirm that there is knowledge duplication in the code. Let’s refactor by extracting the shared code into helper functions. Since the client and the server live in different environments, we have to create a shared library.
#1 Example: Knowledge Duplication
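One possible shape for such a shared library, as a sketch rather than the definitive refactoring. The helper names (isValidEmail, isValidPassword, validateCredentials), the MIN_PASSWORD_LENGTH constant and the concrete rules are assumptions. Everything sits in one file here so it runs on its own.

```typescript
// shared/validation.ts (sketch): the single place that knows the rules.
interface Credentials {
  email: string;
  password: string;
}

const MIN_PASSWORD_LENGTH = 6; // assumed value; change it here and nowhere else

function isValidEmail(email: string): boolean {
  return email.includes("@");
}

function isValidPassword(password: string): boolean {
  return password.length >= MIN_PASSWORD_LENGTH;
}

function validateCredentials(credentials: Credentials): void {
  if (!isValidEmail(credentials.email)) {
    throw new Error("Invalid email");
  }
  if (!isValidPassword(credentials.password)) {
    throw new Error("Password too short");
  }
}

// client.ts (sketch): delegates validation to the shared helper.
const client = {
  signUp(credentials: Credentials): string {
    validateCredentials(credentials);
    return `request sent for ${credentials.email}`;
  },
};

// server.ts (sketch): uses the very same helper.
const server = {
  signUp(credentials: Credentials): string {
    validateCredentials(credentials);
    return `user stored for ${credentials.email}`;
  },
};
```

Raising the required password length now means editing MIN_PASSWORD_LENGTH once, and both sign-up paths pick up the change.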
Notice that each function has only one responsibility to comply with the SOLID principles, in particular the Single Responsibility Principle.
“The Single Responsibility Principle states that a given method/class/component should have a single reason to change” — Robert C. Martin in Clean Code
Looks much better and cleaner, right? Now we could change our credential validation in one single source of truth.
Accidental Duplication
Duplication is far cheaper than the wrong abstraction
Recognizing the difference between knowledge duplication and accidental duplication is hard even for experienced developers because it requires an understanding of the domain behind the code. That is also why static analysis tools are great at detecting duplicated syntax but cannot tell (at least not yet) whether it is also a duplication of knowledge.
#2 Example: Accidental Duplication
Let’s start with another example by looking at this code:
#2 Example: Accidental Duplication
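A minimal sketch of two such services, assuming the duplicated block is a permission check at the start of create. The User shape, the "create" permission string and the members of CRUDService are hypothetical.

```typescript
// Hypothetical user model for this sketch.
interface User {
  name: string;
  permissions: string[];
}

// Abstract superclass named in the text; its members are assumptions.
abstract class CRUDService<T> {
  protected items: T[] = [];
  abstract create(user: User, item: T): T;
}

class ProductService extends CRUDService<string> {
  create(user: User, product: string): string {
    // Duplicated block: permission check before creating.
    if (!user.permissions.includes("create")) {
      throw new Error("Not permitted");
    }
    this.items.push(product);
    return product;
  }
}

class FeedbackService extends CRUDService<string> {
  create(user: User, feedback: string): string {
    // Duplicated block: the same lines, written for a different business rule.
    if (!user.permissions.includes("create")) {
      throw new Error("Not permitted");
    }
    this.items.push(feedback);
    return feedback;
  }
}
```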
And again, let’s check our questions:
1. Does code exist that looks identical? ✅
As in the first example, this code contains some duplicated syntax. Both the ProductService and the FeedbackService duplicate this code block:
#2 Example: Accidental Duplication
We could easily move this block of code to the abstract superclass CRUDService. That would save us a few lines of duplicated code.
2. Does the code repeat the same knowledge and logic? ❌
Let’s start by asking the replacement question: Does changing one code block lead to changing the other one?
Presuming our business logic changes:
Feedback can now be created by every user while creating products still requires having the right permissions.
This would lead us to change the FeedbackService class, but the ProductService class would not need to change. This means that we have identified accidental duplication and should not abstract our code.
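As a sketch of that divergence, assuming the duplicated block was a permission check (the User shape and the "create" permission are hypothetical): after the requirement change, only FeedbackService loses the check, and ProductService stays exactly as it was.

```typescript
interface User {
  name: string;
  permissions: string[];
}

class ProductService {
  private products: string[] = [];

  create(user: User, product: string): string {
    // Unchanged: creating products still requires the right permission.
    if (!user.permissions.includes("create")) {
      throw new Error("Not permitted");
    }
    this.products.push(product);
    return product;
  }
}

class FeedbackService {
  private feedback: string[] = [];

  create(_user: User, text: string): string {
    // Changed: every user may now create feedback, so the check is gone.
    this.feedback.push(text);
    return text;
  }
}
```

Had the check lived in a shared superclass, this one-line business change would have forced us to touch the abstraction and every class built on it.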
“If two apparently duplicated sections of code evolve along different paths — if they change at different rates, and for different reasons — then they are not true duplicates” — Robert C. Martin
Imagine we had abstracted the duplicated code and that not only our two example classes but many more inherited from the superclass. We would have ended up undoing the refactoring, because what we cleaned up was accidental duplication. That is much worse than having some duplicated code.
Minimizing Accidental Duplication
Often you just can’t entirely remove accidental duplication, but you can minimize it by complying with the SOLID principles and using good naming.
#3 Example: Minimize Accidental Duplication
Let’s illustrate how to minimize accidental duplication by looking at this example:
#3 Example: Minimizing Accidental Duplication
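A minimal sketch of such a starting point. The exact transformation (trim, lowercase, whitespace replaced by underscores) is an assumption, and saveFile only returns the resulting path instead of writing to disk so that the sketch stays self-contained.

```typescript
// Turns a display name into a file name.
function toFileName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, "_");
}

// Turns a display name into a folder name; identical code, for now.
function toFolderName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, "_");
}

// Hypothetical caller: builds the target path for a file.
function saveFile(folder: string, file: string): string {
  return `${toFolderName(folder)}/${toFileName(file)}`;
}
```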
1. Does code exist that looks identical? ✅
Obviously, the functions toFileName and toFolderName both transform a string into an underscore-separated string. The code is exactly the same, so duplicated code exists.
2. Does the code repeat the same knowledge and logic? ❓
To answer this question, let’s go ahead and apply the DRY principle incorrectly.
We have noticed that the functions toFileName and toFolderName are identical. The obvious solution would be to merge both functions into one toFileOrFolderName function. Now we can use it in our saveFile method:
#3 Example: Minimizing Accidental Duplication
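The merge might look roughly like this. It is a sketch of the tempting but misguided refactoring, with the same assumed transformation as before and a saveFile that merely returns the path.

```typescript
// One function now serves two unrelated callers.
function toFileOrFolderName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, "_");
}

// Hypothetical caller: both path segments go through the merged function.
function saveFile(folder: string, file: string): string {
  return `${toFileOrFolderName(folder)}/${toFileOrFolderName(file)}`;
}
```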
Why is that wrong?
Imagine it was decided that file names should now also have a timestamp in front of their name. A new developer on the project has to implement this requirement and comes across our toFileOrFolderName function. What I often see is a solution like this:
#3 Example: Minimizing Accidental Duplication
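A sketch of the kind of patch meant here, assuming the developer reaches for a boolean flag. The isFile parameter and the injectable timestamp are hypothetical; the timestamp is passed in only so the behavior is predictable in a test.

```typescript
// The merged function now branches on who is calling it.
function toFileOrFolderName(
  name: string,
  isFile: boolean = false,
  timestamp: number = Date.now(),
): string {
  const underscored = name.trim().toLowerCase().replace(/\s+/g, "_");
  if (isFile) {
    // File-only rule squeezed into a function that also serves folders.
    return `${timestamp}_${underscored}`;
  }
  return underscored;
}

// Hypothetical caller: has to know which flag value means "file".
function saveFile(folder: string, file: string, timestamp: number = Date.now()): string {
  return `${toFileOrFolderName(folder)}/${toFileOrFolderName(file, true, timestamp)}`;
}
```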
By eliminating our original functions, we have caused our new function toFileOrFolderName to do more than one thing, which violates the Single Responsibility Principle.
The fact that our original two functions do the same thing is an accident. One transforms file names and the other folder names. We don’t want the caller of toFileName to know anything about folder names, and we don’t want the caller of toFolderName to know anything about file names.
How to minimize the duplication correctly?
Let’s restore our initial toFileName and toFolderName functions and get rid of the low-level duplication by creating the function toUnderscore:
#3 Example: Minimizing Accidental Duplication
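A minimal sketch of that structure. The body of toUnderscore (trim, lowercase, whitespace replaced by underscores) is an assumption, and saveFile again just returns the path.

```typescript
// Low-level string transformation, unaware of files or folders.
function toUnderscore(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, "_");
}

// Domain-level function: the only place that knows file naming rules.
function toFileName(name: string): string {
  return toUnderscore(name);
}

// Domain-level function: the only place that knows folder naming rules.
function toFolderName(name: string): string {
  return toUnderscore(name);
}

// Hypothetical caller: builds the target path for a file.
function saveFile(folder: string, file: string): string {
  return `${toFolderName(folder)}/${toFileName(file)}`;
}
```

If file names later need a timestamp prefix, only toFileName changes; toFolderName and toUnderscore stay untouched.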
By following the Single Responsibility Principle each function maintains a single level of abstraction and has only one single reason to change.
Even though we didn’t eliminate the accidental duplication entirely, we minimized it by getting rid of the low-level duplication of transforming a string to underscore.
Theoretically, we should now be able to differentiate between essential and accidental duplication… But what if we are not sure?
Rule of Three
“Three strikes and you refactor”
Spotting knowledge duplication isn’t easy and cleaning up accidental duplication is far more harmful than having duplicated code.
The Rule of Three 3️⃣ says that when you spot some duplicated code and the first two cases aren’t enough to clearly identify shared knowledge, you wait for the third duplicate before you refactor.
“It’s really hard and feels terrible, but close your eyes and try it anyway.” — Justin Weiss
Martin Fowler defined the Rule of Three in his book Refactoring: Improving the Design of Existing Code:
The first time you do something, you just do it.
The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway.
The third time you do something similar, you refactor.
Final Thoughts
Duplicated code is one of the major reasons for technical debt and bugs in software. That’s why the DRY principle is one of the most valuable ones in software development. But applying it correctly is even more important.
Remember, whenever you spot some duplicated code, ask yourself: “Am I looking at duplicated syntax or duplicated knowledge?” And if you are not sure, apply the Rule of Three.
Thanks for reading!