Monorepository and polyrepository. Part 2. Some thoughts on perfecting industry standards

In the previous article, we looked at monorepos and polyrepos, their pros, cons, and common applications. Now let’s see why none of these solutions is ideal.

I don’t like either of them (and a few more thoughts)

As you can see, the further we go, the steeper the slope gets. Even the title of this part bodes ill. Both monorepos and polyrepos have big problems, and they are the following:

  • The biggest problem is conceptual: there is no single meaning for the terms “monorepository” and “polyrepository”. As we’ve seen earlier, a monorepo can be a single repository for many projects or for one enormous project, while “polyrepo” only means that more than one repository is involved.
  • After reading several articles, I concluded that some problems are not actually solved by either approach: not code reusability, not cross-project refactoring, not anything else of the kind. The most apparent evidence is that authors praise both monorepos and polyrepos with the same arguments, and if an argument holds in both cases, it cannot be what distinguishes them.
  • A monorepository for multiple projects abuses the repository idea as it was originally introduced by code hosting services. Path-based CI/CD was not an option at first and only appeared after the monorepo concept matured, and we still haven’t overcome the inability to experiment with processes. The DoD example from the previous article illustrates this conundrum. What does it mean for us? Agility failures for teams, and fragility followed by stagnation for projects.
  • Polyrepositories do not have a well-defined way of growing and dividing, and thus can’t have automation for it. How do I know when it is time to split into several repositories? How do I handle versioning and releases afterwards?

Thoughts on a possible upgrade to code-keeping approaches

We don’t have a satisfying solution for code keeping. Let me suggest some shifts we could make to upgrade our approaches and maybe then invent something meaningful. I think we need to invest in the development of the polyrepository strategy. I’m going to cover the following points:

  • Meta-repository level concept and tooling
  • Decoupling source and infrastructural code
  • Default tooling and processes with the option for adjustment or replacement
  • Switching from static to dynamic imports in some cases
    • Lightweight importing approaches
  • Overcoming versioning scalability flaws and looking for robust contract-based testing

Meta-Repository level concept and tooling

When discussing importing earlier, I noted that the polyrepository approach doesn’t prevent you from laying out modules in your file system the same way the monorepository approach does. You just need to do the necessary provisioning manually. The daily tool of a software engineer is Git, and it only lets you clone a single repository at a time. At my last job, we had a massive readme describing how to arrange modules from different repositories to get started correctly.

You might point me to shell scripts or Makefiles to solve this problem, but they serve a different purpose. Also, their learning curve is not gentle, which doesn’t help their popularity. For this case, we need something as simple as the git clone command, pointed at a document describing the positioning of repositories and the interconnection of modules. A purely imaginary structure of such a document could be the following:

https://gitlab.com/product/backend-core.git
	# Put Logging module into the logging folder inside the core.
	git@bitbucket.org:project/backend-logging-module.git -> logging
	# Put OAuth module into the oauth folder inside the core.
	git@bitbucket.org:project/backend-oauth.git -> oauth
git@github.com:project/micro-front-end-public.git
git@github.com:project/micro-front-end-application.git
git@github.com:project/micro-front-end-feedback.git

Then you could run your meta-git clone command and have everything downloaded to your computer.
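To make this concrete, here is a minimal sketch of such a meta-clone tool in TypeScript. It assumes the manifest format above: an unindented line is a repository to clone, and an indented “url -> folder” line is a module to place inside the most recent repository. The tool name, manifest file name, and parsing rules are all hypothetical.

import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";
import path from "node:path";

function metaClone(manifestPath: string): void {
  let currentRoot = "";
  for (const raw of readFileSync(manifestPath, "utf8").split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    if (!/^\s/.test(raw)) {
      // Top-level repository: clone it into a folder named after the repo.
      currentRoot = path.basename(line, ".git");
      execSync(`git clone ${line}`, { stdio: "inherit" });
    } else {
      // Nested module: clone it into the given folder inside the current root.
      const [url, folder] = line.split("->").map((part) => part.trim());
      execSync(`git clone ${url} ${path.join(currentRoot, folder)}`, { stdio: "inherit" });
    }
  }
}

metaClone("meta-repo.manifest"); // hypothetical manifest file name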

I thought of Docker and other similar-purpose software when coming up with this idea. It would also be great to have something equally simple for building and running everything locally to become ready for development.

Decoupling source and infrastructural code

In front-end development, you usually have two separate file groups inside your project: source files of the application under development and infrastructural files like package.json, webpack configuration, and more. We could divide the latter even further: files for local development and files for CI/CD. When making a production build, you usually don’t need the webpack development mode and vice versa. Nevertheless, we always download everything. In the Predix Design System mentioned in the previous article, every component has its own infrastructural part, and there are 131 components. What a tremendous amount of work is required to keep everything updated!

The idea here is to have a dedicated repository for every part mentioned and combine them when cloning. For the Predix case, we could have 131 repositories describing components, the 132nd repository describing local development infrastructure, the 133rd repository describing the production build. This minor update would save lots of maintenance effort.
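Combined with the meta-repository manifest described earlier, such a layout could look like the following sketch. The URLs and folder names here are purely imaginary:

git@github.com:predix/px-button.git
	# Shared local development infrastructure, maintained in one place.
	git@github.com:predix/local-dev-infra.git -> infra/dev
	# Shared production build configuration, also maintained once.
	git@github.com:predix/build-infra.git -> infra/build
git@github.com:predix/px-card.git
	git@github.com:predix/local-dev-infra.git -> infra/dev
	git@github.com:predix/build-infra.git -> infra/build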

Default tooling and processes with the option for adjustment or replacement

Agile software development has to actually deliver the agility it claims somewhere. If your build tooling and process-supporting environment are engraved in stone for the whole company, you miss opportunities to rethink how work gets done. If you implement Scrum, XP, or other well-described processes and never consider reviewing them, you become fragile instead of agile. Let’s ask some highly acclaimed management books what they would recommend doing with our approaches:

Improve constantly and forever the system of production and service, to improve quality and productivity, and thus constantly decrease costs.

— “Out of the Crisis” by William Edwards Deming

The first mindset shift is to create a culture of experimentation.

— “From Chaos to Successful Distributed Agile Teams” by Johanna Rothman and Mark Kilby

The Third Way “shows us how to create a culture that simultaneously fosters experimentation, learning from failure, and understanding that repetition and practice are the prerequisites to mastery.”

— Summary of “The Phoenix Project” by Gene Kim, Kevin Behr, and George Spafford

Improve collaboratively and evolve experimentally.

— “Kanban in Action” by Marcus Hammarberg and Joakim Sundén

It looks like these books by very different authors are very well aligned. What kind of environment could we come up with to gather all these promising findings? What could make experimenting easier? Let’s use another great concept from the book Antifragile by Nassim Nicholas Taleb: optionality.

The summarized idea is to have default tooling for a specific company’s code assets and default processes; e.g., to have a decoupled webpack configuration for local development as described earlier. It is also a good idea to have optionality in two forms:

  • Selective adjustment of specific tooling or process parts
  • Complete replacement of tooling or processes

The second one would be easier to implement, but the first one allows much cheaper experiments and fosters evolvability. If some investigation shows successful results, we can use its approach to update the default company-wide tooling or processes.

Overcoming versioning scalability flaws and looking for robust contract-based testing

Manually pinning dependency versions does not scale: every upgrade forces someone to verify by hand that the new version still behaves as expected. A possible shift is to write our expectations of each library down as executable tests and let the dependency manager select the newest version that satisfies them. A package manifest could then point to an expectations file instead of a version range:

{
  "devDependencies": {
    "express": "libraries-expectations/express.ts"
  }
}

This solution is not risk-free:

  • We can leave some expectations undocumented and implicitly rely on the corresponding interfaces; an update might then break them in a place we don’t expect.
  • We allow automatic upgrades to versions that may introduce problems.
  • We add extra work to dependency management: someone has to write and maintain the expectation tests.

The potentially small scope of expectations covered by manually written tests bothers me more than the other risks. We could introduce an enhancement here by using generative testing. This approach relies on a general description of input and output values: it generates multiple concrete values and checks that every invocation meets the desired conditions. The plan could be the following (a sketch of such an expectations file comes after the list):

  1. Write usual tests covering edge cases
  2. Apply generative tests for common ones
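Here is a minimal sketch of what an expectations file following this plan could look like, using the fast-check library for the generative part. The file path mirrors the manifest above, and the choice of lodash as the library under test is purely illustrative:

// libraries-expectations/lodash.ts (hypothetical path)
import assert from "node:assert";
import fc from "fast-check";
import _ from "lodash";

// 1. A usual test pinning down an edge case we depend on:
// chunking an empty array yields no chunks.
assert.deepStrictEqual(_.chunk([], 3), []);

// 2. A generative test for the common case: for any array and any
// chunk size, chunking must preserve all elements and their order.
fc.assert(
  fc.property(
    fc.array(fc.integer()),
    fc.integer({ min: 1, max: 10 }),
    (values, size) => {
      const rejoined = _.chunk(values, size).flat();
      return JSON.stringify(rejoined) === JSON.stringify(values);
    }
  )
);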

The two-sided testing approach helps us ensure that a specific library version meets our expectations. What about possible vulnerabilities? What if version 5.11.44 contains a problem that 5.11.43 doesn’t? We already discover vulnerabilities by running something like the npm audit command, so the same check could feed into the test-driven library version selection in our algorithm, sketched below.
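Here is a hypothetical sketch of such a selection algorithm. No npm tooling works this way today; the helper shells out to real commands, but the overall flow, the expectations file path, and the use of the tsx runner are my assumptions:

import { execSync } from "node:child_process";

// Run a shell command and report whether it exited successfully.
function passes(command: string): boolean {
  try {
    execSync(command, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

// Walk versions from newest to oldest and return the first one that
// installs cleanly, satisfies our expectations tests, and shows no
// high-severity vulnerabilities in npm audit.
function selectVersion(pkg: string, newestFirst: string[]): string | undefined {
  for (const version of newestFirst) {
    if (!passes(`npm install ${pkg}@${version} --no-save`)) continue;
    const meetsExpectations = passes(`npx tsx libraries-expectations/${pkg}.ts`);
    const hasNoKnownVulnerabilities = passes("npm audit --audit-level=high");
    if (meetsExpectations && hasNoKnownVulnerabilities) return version;
  }
  return undefined; // no version satisfies our expectations
}

console.log(selectVersion("express", ["5.11.44", "5.11.43"])); // illustrative versions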

Final words

We have come a long way in this article, surveying the current state of code-keeping practices. I hope you are not too disappointed by my pessimistic view of the existing practices. I also hope that the ideas from the last part of this article might help someone find a better approach, one that improves our coding productivity, ability to experiment, onboarding practices, and other things.

However, I don’t want to leave you without practical advice. If you need to develop a product right away, use the most elaborate approach despite the problems I described above. By “elaborate approach” I mean monorepositories, as they have well-defined and well-documented tools. Once your project starts bringing in good profit, you can return to the issues and suggestions expressed in this article and attempt to solve some problems of the polyrepository approach in order to move the software engineering industry further.
