When You SHOULD Duplicate Code

Recently I published the article Prevent Duplicated TypeScript Code. A good friend of mine approached me and said:

“Great article, I learned a lot!”

I was a little astonished because he is a much better developer than I am. I asked:

“What exactly did you learn?”

He replied:

“I never thought about the difference between code duplication and knowledge duplication. It was great that you focused on differentiating this.”

He was talking about accidental duplication. Removing it will make your code harder to read and harder to change in the future.

Accidental duplication is code that looks similar but represents different logic.

My friend took away exactly the point I had in mind while writing the post. That made me write this article, which uses a few examples to show how to tell essential duplication from accidental duplication.

What is the DRY principle about?

If you want a detailed introduction to the “Don’t repeat yourself” (DRY) principle, I suggest reading my recent article first. This article gives only a short introduction to the principle and focuses on the pitfalls of applying it.

The DRY principle states that you should avoid code that exists in two places and repeats the same knowledge or business logic. The principle is credited to Andy Hunt and Dave Thomas, who state it in their book The Pragmatic Programmer:

“Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” — Andy Hunt and Dave Thomas

Notice that Andy Hunt and Dave Thomas say “every piece of knowledge” and not “every piece of code”. Understanding the difference is essential to identify accidental duplication.

Knowledge Duplication

“Are we looking at syntax duplication or knowledge duplication?” — Anthony Sciamanna

To spot knowledge duplication, two questions have to be answered with yes:

1. Does code exist that looks identical?
2. Does the code repeat the same knowledge and logic?

#1 Example: Knowledge Duplication

Let’s have a look at this example:
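
A minimal sketch of what such duplication might look like. The names `clientSignUp` and `serverSignUp`, the `Credentials` shape, the email pattern, and the minimum password length of 6 are assumptions made for illustration, and the return values are stand-ins for a real HTTP request and a real database write.

```typescript
// Hypothetical shape of the sign-up data; an assumption for this sketch.
interface Credentials {
  email: string;
  password: string;
}

// Client side: validate the form input before sending the request.
function clientSignUp(credentials: Credentials): string {
  if (!/^\S+@\S+$/.test(credentials.email)) {
    throw new Error("Invalid email");
  }
  if (credentials.password.length < 6) {
    throw new Error("Password too short");
  }
  // Stand-in for the actual HTTP request.
  return `request sent for ${credentials.email}`;
}

// Server side: the very same rules are written out a second time.
function serverSignUp(credentials: Credentials): string {
  if (!/^\S+@\S+$/.test(credentials.email)) {
    throw new Error("Invalid email");
  }
  if (credentials.password.length < 6) {
    throw new Error("Password too short");
  }
  // Stand-in for persisting the user.
  return `user created for ${credentials.email}`;
}
```

Both functions encode the same rule about what valid credentials are, so a change to that rule has to be made twice.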

We have two signUp functions. One in the client and the other in our server. Let’s answer our questions:

1. Does code exist that looks identical? ✅

The alarm bells should ring here for every developer. It is obvious that both functions share duplicated code.

2. Does the code repeat the same knowledge and logic? ✅

This question is the harder one. If you are not sure, you can ask a replacement question: Does changing one code block lead to changing the other one?

“There is true duplication, in which every change to one instance necessitates the same change to every duplicate of that instance.” — Robert C. Martin

I consciously chose an easy example here. Obviously, the validation of credentials in the client’s signUp function is the same as in the server’s. If we decided, for example, to increase the required password length to 7, we would need to change it in two places. If we forgot to change it in one place, it would lead to a bug.

Since both questions are answered with yes, we can confirm that there is knowledge duplication in the code. Let’s refactor by extracting the shared code into helper functions. Since the client and the server live in different environments, we have to create a shared library.
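
A minimal sketch of such a refactoring, shown in one file for brevity. The helper names (`isValidEmail`, `isValidPassword`, `validateCredentials`), the constant `MIN_PASSWORD_LENGTH`, and the validation rules themselves are assumptions; in a real project the helpers would live in a shared package imported by both client and server.

```typescript
// Hypothetical shape of the sign-up data; an assumption for this sketch.
interface Credentials {
  email: string;
  password: string;
}

// --- shared library (hypothetical) ---------------------------------------
const MIN_PASSWORD_LENGTH = 6; // assumed value, defined in exactly one place

function isValidEmail(email: string): boolean {
  return /^\S+@\S+$/.test(email);
}

function isValidPassword(password: string): boolean {
  return password.length >= MIN_PASSWORD_LENGTH;
}

function validateCredentials(credentials: Credentials): void {
  if (!isValidEmail(credentials.email)) {
    throw new Error("Invalid email");
  }
  if (!isValidPassword(credentials.password)) {
    throw new Error("Password too short");
  }
}

// --- client ---------------------------------------------------------------
function clientSignUp(credentials: Credentials): string {
  validateCredentials(credentials);
  return `request sent for ${credentials.email}`; // stand-in for the request
}

// --- server ---------------------------------------------------------------
function serverSignUp(credentials: Credentials): string {
  validateCredentials(credentials);
  return `user created for ${credentials.email}`; // stand-in for persistence
}
```

Raising the required password length now means editing `MIN_PASSWORD_LENGTH` once, and both sides pick up the change.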

Notice that each function has only one responsibility, which complies with the SOLID principles, in particular the Single Responsibility Principle.

“The Single Responsibility Principle states that a given method/class/component should have a single reason to change” — Robert C. Martin in Clean Code

Looks much better and cleaner, right? Now we can change our credential validation in one place, a single source of truth.

Accidental Duplication

Duplication is far cheaper than the wrong abstraction

Recognizing the difference between knowledge duplication and accidental duplication is hard even for experienced developers because it requires an understanding of the domain. That is also why static analysis tools are great at detecting duplicated syntax but cannot tell (at least not yet) whether it is also a duplication of knowledge.

#2 Example: Accidental Duplication

Let’s start with another example by looking at this code:
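
A minimal sketch of such code. Only the class names `ProductService`, `FeedbackService`, and `CRUDService` come from the text; the `User`, `Product`, and `Feedback` shapes, the `hasCreatePermission` flag, and the in-memory storage are assumptions made for illustration.

```typescript
// Hypothetical data shapes; assumptions for this sketch.
interface User { name: string; hasCreatePermission: boolean; }
interface Product { title: string; }
interface Feedback { message: string; }

// Assumed base class that only knows how to store items in memory.
abstract class CRUDService<T> {
  protected items: T[] = [];

  protected save(item: T): T {
    this.items.push(item);
    return item;
  }

  count(): number {
    return this.items.length;
  }
}

class ProductService extends CRUDService<Product> {
  create(user: User, product: Product): Product {
    // Permission check, repeated verbatim in FeedbackService.
    if (!user.hasCreatePermission) {
      throw new Error("Missing permission");
    }
    return this.save(product);
  }
}

class FeedbackService extends CRUDService<Feedback> {
  create(user: User, feedback: Feedback): Feedback {
    // Same lines as in ProductService, but a different business rule.
    if (!user.hasCreatePermission) {
      throw new Error("Missing permission");
    }
    return this.save(feedback);
  }
}
```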

And again, let’s check our questions:

1. Does code exist that looks identical? ✅

Like the first example, this code contains duplicated syntax. Both the ProductService and the FeedbackService repeat the same block of code: the permission check that runs before an entry is created.

We could easily move this block of code into the abstract superclass CRUDService. That would save us a few lines of duplicated code.

2. Does the code repeat the same knowledge and logic? ❌

Let’s start by asking the replacement question: Does changing one code block lead to changing the other one?

Presuming our business logic changes:

Feedback can now be created by every user while creating products still requires having the right permissions.

This would lead us to change the FeedbackService class but the ProductService class does not need to be changed. This means that we identified accidental duplication and we should not abstract our code.
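
A minimal sketch of the two services after that change, under the same kind of assumptions as before: the `User` shape with its `hasCreatePermission` flag and the in-memory storage are hypothetical. Only the feedback rule moved, so only `FeedbackService` was edited.

```typescript
// Hypothetical data shapes; assumptions for this sketch.
interface User { name: string; hasCreatePermission: boolean; }
interface Product { title: string; }
interface Feedback { message: string; }

abstract class CRUDService<T> {
  protected items: T[] = [];

  protected save(item: T): T {
    this.items.push(item);
    return item;
  }

  count(): number {
    return this.items.length;
  }
}

class ProductService extends CRUDService<Product> {
  // Unchanged: creating products still requires the permission.
  create(user: User, product: Product): Product {
    if (!user.hasCreatePermission) {
      throw new Error("Missing permission");
    }
    return this.save(product);
  }
}

class FeedbackService extends CRUDService<Feedback> {
  // Changed: every user may now create feedback, so the check is gone.
  // The user is kept in the signature so callers stay the same.
  create(_user: User, feedback: Feedback): Feedback {
    return this.save(feedback);
  }
}
```

Had the check been pulled up into the superclass, this change would have required reworking the shared abstraction instead of editing one method.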

“If two apparently duplicated sections of code evolve along different paths — if they change at different rates, and for different reasons — then they are not true duplicates” — Robert C. Martin

Imagine we had abstracted the duplicated code into the superclass, and not only our two example classes but many more were inheriting from it. We would end up undoing the refactoring because we cleaned up accidental duplication. That is much worse than having some duplicated code.

Minimizing Accidental Duplication

Often you just can’t entirely remove accidental duplication, but you can minimize it by complying with the SOLID principles and using good naming.

#3 Example: Minimize Accidental Duplication

Let’s illustrate how to minimize accidental duplication by looking at this example:
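
A minimal sketch of such an example. The function names `toFileName`, `toFolderName`, and `saveFile` come from the text; the exact transformation (trim, lower-case, whitespace to underscores), the `saveFile` signature, and the in-memory `savedFiles` object standing in for the file system are assumptions.

```typescript
// In-memory stand-in for the file system; an assumption for this sketch.
const savedFiles: Record<string, string> = {};

// Turns "My File" into "my_file".
function toFileName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, "_");
}

// Turns "My Folder" into "my_folder" with exactly the same code.
function toFolderName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, "_");
}

// Builds the target path from both names and stores the content under it.
function saveFile(folder: string, file: string, content: string): string {
  const path = `${toFolderName(folder)}/${toFileName(file)}`;
  savedFiles[path] = content;
  return path;
}
```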

1. Does code exist that looks identical? ✅

Obviously, the functions toFileName and toFolderName both convert a string into an underscore-separated string. The code is exactly the same, so duplicated code exists.

2. Does the code repeat the same knowledge and logic? ❓

To answer this question, let’s go ahead and apply the DRY principle incorrectly.

We have noticed that the functions toFileName and toFolderName are identical. The obvious solution would be to merge both functions into one toFileOrFolderName function. Now we can use it in our saveFile method:
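
A minimal sketch of that merge. The transformation, the `saveFile` signature, and the in-memory `savedFiles` object are assumptions made for illustration; only the function names come from the text.

```typescript
// In-memory stand-in for the file system; an assumption for this sketch.
const savedFiles: Record<string, string> = {};

// One function now serves two different concepts: files and folders.
function toFileOrFolderName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, "_");
}

function saveFile(folder: string, file: string, content: string): string {
  const path = `${toFileOrFolderName(folder)}/${toFileOrFolderName(file)}`;
  savedFiles[path] = content;
  return path;
}
```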

Why is that wrong?

Imagine it was decided that file names should now also have a timestamp in front of their name. A new developer on the project is asked to implement this requirement and comes across our toFileOrFolderName function. What I often see is a solution like this:
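
A minimal sketch of that kind of solution. The `isFile` flag and the `now` parameter (injected so the result is predictable) are assumptions, as is the `timestamp_name` format.

```typescript
// The merged function grows a flag so it can behave differently per caller.
function toFileOrFolderName(
  name: string,
  isFile: boolean = false,
  now: number = Date.now()
): string {
  const underscored = name.trim().toLowerCase().replace(/\s+/g, "_");
  if (isFile) {
    // File-only rule hidden inside a function that also serves folders.
    return `${now}_${underscored}`;
  }
  return underscored;
}
```

Every caller now has to know about the flag, and every future file-only or folder-only rule adds another branch to the same function.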

By eliminating our original functions, we have caused our new function toFileOrFolderName to do more than one thing, which violates the Single Responsibility Principle.

The fact that our original two functions do the same thing is an accident. One transforms file names and the other folder names. We don’t want the caller of toFileName to know anything about folder names and we don’t want the caller of toFolderName to know anything about file names.

How to minimize the duplication correctly?

Let’s restore our initial toFileName and toFolderName functions and get rid of the low-level duplication by creating the function toUnderscore:
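
A minimal sketch of that structure. The function names come from the text; the transformation inside `toUnderscore` is an assumption.

```typescript
// Low-level helper: knows about strings, nothing about files or folders.
function toUnderscore(value: string): string {
  return value.trim().toLowerCase().replace(/\s+/g, "_");
}

// Domain-level function: the one place that defines what a file name is.
function toFileName(name: string): string {
  return toUnderscore(name);
}

// Domain-level function: the one place that defines what a folder name is.
function toFolderName(name: string): string {
  return toUnderscore(name);
}
```

With this split, a timestamp requirement for file names would touch only `toFileName`, and callers of `toFolderName` would not notice.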

By following the Single Responsibility Principle each function maintains a single level of abstraction and has only one single reason to change.

Even though we didn’t eliminate the accidental duplication entirely, we minimized it by getting rid of the low-level duplication of transforming a string to underscore.

Theoretically, we should now be able to differentiate between essential and accidental duplication… But what if we are not sure?

Rule of Three

“Three strikes and you refactor”

Spotting knowledge duplication isn’t easy and cleaning up accidental duplication is far more harmful than having duplicated code.

The Rule of Three 3️⃣ says that when you spot duplicated code and the first two cases aren’t enough to clearly identify shared knowledge, you wait for the third duplicate before you refactor.

“It’s really hard and feels terrible, but close your eyes and try it anyway.” — Justin Weiss

Martin Fowler describes the Rule of Three in his book Refactoring: Improving the Design of Existing Code, where he credits it to Don Roberts:

The first time you do something, you just do it.
The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway.
The third time you do something similar, you refactor.

Final Thoughts

Duplicated code is one of the major reasons for technical debt and bugs in software. That’s why the DRY principle is one of the most valuable ones in software development. But applying it correctly is even more important.

Remember, whenever you spot some duplicated code, ask yourself: “Am I looking at duplicated syntax or duplicated knowledge?” And if you are not sure, apply the Rule of Three.

Thanks for reading!
