Git hardcore: hooks, submodules, monorepos and surviving huge codebases

Advanced Git techniques for large codebases: hooks, submodules, monorepos, performance and disaster plans.

Human-architected research synthesized with the assistance of AI personas.

💡 TL;DR (Too Long; Didn't Read)

Large repos need advanced tools. Git hooks automate local checks. Submodules are usually more trouble than they're worth, so avoid them unless the team knows them well. Monorepos work if they're well structured. Performance improves with a proper .gitignore, Git LFS and shallow clones. Treat the repo as critical infra: backups, protected branches, a disaster plan. The truth: most "Git problems" are actually organization problems.

So far we've talked about day-to-day Git. Now it's time to touch the stuff people pretend doesn't exist because it's hard:

  • Git hooks to automate checks
  • Submodules (yes, that weird thing)
  • Monorepos and strategies for gigantic repos
  • Performance: when Git eats your RAM and CPU
  • Disaster planning so you don't become hostage to a single repo

This is for when you already know how to use Git, but the size of the codebase is starting to hurt.


1. Git hooks: automating good practices

Hooks are scripts that Git runs automatically at certain points:

  • Before a commit (pre-commit)
  • After a commit (post-commit)
  • Before a push (pre-push)
  • Etc.

They live in .git/hooks/.

1.1. Enabling a simple pre-commit

Example: block commits that still have stray console.logs.

Create .git/hooks/pre-commit:

bash
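#!/usr/bin/env bash
# Inspect only lines added in the staged diff, so removing a console.log is still allowed.
if git diff --cached -U0 | grep '^+' | grep -v '^+++ ' | grep -qF 'console.log'; then
  echo "[pre-commit] Found 'console.log' in the diff. Remove it before committing." >&2
  exit 1
fi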

Make it executable:

bash
chmod +x .git/hooks/pre-commit

Now, if you try to commit with console.log, the commit fails.

1.2. Running linter/tests automatically

bash
#!/usr/bin/env bash
npm test
STATUS=$?
if [ $STATUS -ne 0 ]; then
  echo "[pre-commit] Tests failed. Fix them before committing." >&2
  exit $STATUS
fi
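
The heading mentions linters, but the hook above only runs tests. Here is a minimal sketch of linting just the staged files, which tends to stay fast on large repos. It assumes the project uses ESLint and that `npx eslint` works in the repo; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a pre-commit helper that lints only staged JS/TS files.
# Assumption: ESLint is installed and runnable as `npx eslint`.

# Print staged files (added, copied or modified) ending in .js/.jsx/.ts/.tsx.
staged_js_files() {
  git diff --cached --name-only --diff-filter=ACM | grep -E '\.(js|jsx|ts|tsx)$' || true
}

# Lint the staged files; succeed quietly if there are none.
lint_staged() {
  files=$(staged_js_files)
  if [ -z "$files" ]; then
    return 0
  fi
  # Paths containing spaces would need NUL-delimited handling (-z and xargs -0).
  echo "$files" | xargs npx eslint
}

# In .git/hooks/pre-commit you would finish with something like:
# lint_staged || { echo "[pre-commit] Lint failed." >&2; exit 1; }
```

The `--diff-filter=ACM` part skips deleted files, which would otherwise be passed to the linter and fail with "file not found".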

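Spicy take: local-only hooks are duct tape if the whole team isn't using them. .git/hooks isn't versioned and doesn't travel with a clone, and anyone can skip a hook with --no-verify. For big teams, standardize via tools like Husky (JS) or pre-commit (Python), and enforce the same checks in CI.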
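
If you'd rather not add a tool, one lightweight option is to keep hooks in a tracked directory and point Git at it with core.hooksPath (available since Git 2.9). A minimal sketch follows; the directory name `.githooks` and the function name are assumptions, not a convention Git enforces.

```shell
# Sketch: version hooks inside the repo and tell Git to use them.
# Assumption: hooks live in a tracked directory called .githooks.

# Run once per clone, from the repository root.
enable_shared_hooks() {
  hooks_dir="${1:-.githooks}"
  mkdir -p "$hooks_dir"
  # core.hooksPath makes Git look here instead of .git/hooks
  git config core.hooksPath "$hooks_dir"
}
```

Local config is not copied by git clone, so each developer still has to run this once (for example from a setup script).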


2. Submodules: when (almost never) to use them

A submodule is basically a Git repo inside another Git repo.

Typical use case:

  • You have a shared library across multiple projects
  • You want to keep its history separate, but version it together

2.1. Adding a submodule

bash
git submodule add https://github.com/org/shared-lib.git libs/shared-lib
git commit -m "Add shared-lib submodule"

This creates a .gitmodules entry and treats libs/shared-lib as a submodule.
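
The .gitmodules file uses Git's normal config syntax, so you can read it with git config instead of parsing it by hand. A small sketch that lists each recorded submodule as "path url"; the function name is hypothetical.

```shell
# Sketch: list submodules recorded in a .gitmodules file as "path url" pairs.
list_submodules() {
  file="${1:-.gitmodules}"
  git config -f "$file" --get-regexp '^submodule\..*\.path$' |
  while read -r key sub_path; do
    # key looks like submodule.<name>.path; strip both ends to get <name>
    name=${key#submodule.}
    name=${name%.path}
    url=$(git config -f "$file" --get "submodule.$name.url")
    printf '%s %s\n' "$sub_path" "$url"
  done
}
```

For the example above, `list_submodules` would print the libs/shared-lib path followed by its GitHub URL.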

2.2. Cloning a repo with submodules

bash
git clone https://github.com/org/project-with-submodules.git
cd project-with-submodules
git submodule update --init --recursive

Without this, submodule directories stay empty.

2.3. Why many people hate submodules

  • The flow is more cumbersome
  • It's easy to break if the team doesn't know what they're doing
  • CI needs special handling
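
Part of that "special handling" is noticing when a submodule is missing or out of sync. `git submodule status` prefixes each line with `-` (not initialized), `+` (checked-out commit differs from the one recorded in the parent repo) or `U` (merge conflicts). A rough sketch of a CI guard built on that; the function name is made up.

```shell
# Sketch: read `git submodule status` output on stdin and fail if any
# submodule is uninitialized (-), out of sync (+) or conflicted (U).
check_submodule_status() {
  bad=0
  while IFS= read -r line; do
    case "$line" in
      -*|+*|U*)
        echo "submodule needs attention: $line" >&2
        bad=1
        ;;
    esac
  done
  return "$bad"
}

# Typical CI usage:
# git submodule status --recursive | check_submodule_status
```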

Modern alternatives:

  • Separate repos with proper versioning (npm, pip, maven, etc.)
  • Monorepos with workspace managers (Nx, Turborepo, pnpm, Bazel, etc.)

Spicy take: submodules aren't "wrong". What's wrong is dropping submodules into a team that barely knows how to merge.


3. Monorepos: one repo to rule them all

A monorepo is when you put multiple projects (services, libs, apps) into the same Git repo.

3.1. Pros

  • Easier to keep versions in sync
  • Cross-service refactors are safer
  • CI can be optimized by paths
  • Less "where does this code live?" in your life
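
"Optimized by paths" usually means: work out which projects a change touches and only build those. A minimal sketch, assuming a `<group>/<project>/...` layout such as services/billing; the function name and the `origin/main` base branch are assumptions.

```shell
# Sketch: read changed file paths on stdin and print the unique
# <group>/<project> directories they belong to.
affected_projects() {
  # Keep only paths at least three levels deep (group/project/file)
  awk -F/ 'NF >= 3 { print $1 "/" $2 }' | sort -u
}

# Typical CI usage:
# git diff --name-only origin/main...HEAD | affected_projects
```

Files outside that layout (a root README, for instance) are ignored by this sketch; a real pipeline would need a rule for them.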

3.2. Cons

  • Repo can become huge
  • Initial clone is slow
  • History gets noisy if you don't organize it

3.3. Typical monorepo layout

text
.
├── apps
│   ├── web
│   ├── mobile
│   └── admin
├── services
│   ├── users
│   ├── billing
│   └── notifications
├── libs
│   ├── ui
│   ├── core
│   └── utils
└── tools
    └── ci-scripts

3.4. Useful Git commands in monorepos

History of a specific subdirectory:

bash
# see history for the billing service only
git log --oneline -- services/billing

Diff between branches limited to a directory:

bash
git diff main..my-branch -- services/billing

Blame focused on one area:

bash
git blame services/billing/src/invoice_service.ts

3.5. Filtering history by path

Useful to generate a "per-project" history inside a monorepo:

bash
git log --oneline -- services/users

You can even extract a subdirectory into its own repo while keeping its history, using git filter-repo (a separate tool that Git's own documentation recommends over the older, slower git filter-branch).
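
A rough sketch of that extraction, assuming git-filter-repo is installed (it is not bundled with Git). The function name and argument order are made up for illustration.

```shell
# Sketch: extract one subdirectory into a standalone repo, keeping its history.
# Assumption: git-filter-repo is installed and on PATH.
extract_subdir() {
  src=$1
  subdir=$2
  dest=$3
  # filter-repo rewrites history in place, so work on a fresh clone.
  git clone -q --no-local "$src" "$dest" &&
  # Keep only commits touching $subdir and make it the new repo root.
  git -C "$dest" filter-repo --subdirectory-filter "$subdir"
}

# Example:
# extract_subdir ./monorepo services/users ./users-service
```

The original repo is left untouched; the new one contains only the history of services/users, with that directory as its root.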


4. Performance: when Git eats your machine

Huge repositories can slow Git down:

  • Too many binary files
  • Giant merge-heavy history
  • Accidentally committed node_modules or dist

4.1. Use .gitignore properly

No way around it: if you commit builds, dependencies and junk, you'll suffer.

Basic example for JS projects:

gitignore
node_modules/
dist/
coverage/
.env
*.log
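
One catch: .gitignore only affects untracked files. If node_modules or dist was already committed, you also have to remove it from the index. A minimal sketch; the function name is hypothetical.

```shell
# Sketch: stop tracking paths that are now ignored, without deleting them from disk.
untrack_ignored() {
  # -r: recurse into directories
  # --cached: touch only the index, leave the working tree alone
  # --ignore-unmatch: don't fail if a path was never tracked
  git rm -r --cached --quiet --ignore-unmatch -- "$@"
}

# Example:
# untrack_ignored node_modules dist && git commit -m "Stop tracking build artifacts"
```

The files stay in old commits, so this stops the growth but does not shrink existing history.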

4.2. Git LFS for big files

Large binaries (heavy images, videos, ML models) don't work well with plain Git.

Use Git LFS (Large File Storage):

bash
git lfs install
# track writes "*.psd filter=lfs diff=lfs merge=lfs -text" to .gitattributes for you
git lfs track "*.psd"
git add .gitattributes
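
Before deciding which patterns to move to LFS, it helps to see which blobs are actually the biggest. A sketch that lists the largest blobs anywhere in history; the function name is made up, and on very large repos this can take a while.

```shell
# Sketch: print the N largest blobs in history as "size-in-bytes path".
largest_blobs() {
  n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    # keep blobs only and drop the leading "blob " (5 characters)
    awk '$1 == "blob" { print substr($0, 6) }' |
    sort -rn |
    head -n "$n"
}

# Example:
# largest_blobs 20
```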

4.3. Shallow clones (--depth)

In CI or environments where you don't need full history:

bash
git clone --depth=50 https://github.com/org/huge-project.git

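This truncates history to 50 commits deep from the branch tip. --depth also implies --single-branch, so only the default branch is fetched.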
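
A shallow clone isn't a one-way door. If you later need more history (for a blame or a bisect, say), you can deepen it or convert it to a full clone. A small sketch with hypothetical helper names:

```shell
# Sketch: helpers for working with a shallow clone.

# Prints "true" if the current repo is shallow, "false" otherwise.
is_shallow() {
  git rev-parse --is-shallow-repository
}

# Fetch N more commits of history, or the full history if no number is given.
deepen_history() {
  if [ -n "${1:-}" ]; then
    git fetch --deepen="$1"
  else
    git fetch --unshallow
  fi
}
```

`deepen_history 100` pulls another 100 commits of depth; `deepen_history` with no argument fetches everything.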

4.4. Periodic cleanup

Over time, repos accumulate refs, old tags, dead branches.

bash
# drop remote-tracking refs whose branches were deleted on the remote
git fetch --prune
# repack objects and delete unreachable ones right away
git gc --prune=now --aggressive

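Tip: don't run git gc --aggressive every day. It repacks everything from scratch and is slow, but doing it occasionally on large repos can help. --prune=now skips the default two-week grace period for unreachable objects, so run it only when nothing else is using the repo.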
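
To check whether a cleanup actually bought you anything, measure before and after. A minimal sketch using `git count-objects -v`, which reports pack size in KiB; the function name is an assumption.

```shell
# Sketch: print the on-disk size of packed objects in KiB.
pack_size_kib() {
  git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}

# Example:
# before=$(pack_size_kib); git gc --quiet; after=$(pack_size_kib)
# echo "pack size: $before KiB -> $after KiB"
```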


5. Disaster planning for critical repos

If your repo is the heart of the product, treat it accordingly.

5.1. Backups and multiple remotes

You can configure more than one remote:

bash
git remote add backup git@backup-server:org/project.git
git push --all backup
git push --tags backup

If GitHub/GitLab/etc. have a bad day, you still have a copy.
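
If you don't have a second server handy, a bundle file is another option: it packs refs and objects into a single file you can store anywhere. A sketch, with a made-up function name:

```shell
# Sketch: write every branch and tag into one bundle file, then verify it.
backup_bundle() {
  out=$1
  git bundle create "$out" --all && git bundle verify "$out"
}

# Example:
# backup_bundle /backups/project-$(date +%F).bundle
# Restore later with:
# git clone /backups/project-<date>.bundle restored-project
```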

5.2. Branch protection

On providers like GitHub/GitLab/Bitbucket:

  • Protect main/master from push --force
  • Require PR + review for merges
  • Require green CI status

This isn't bureaucracy; it's what saves you from a bad git push --force at 6pm on a Friday.
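
Server-side protection is the real gate, but a client-side pre-push hook can catch the mistake earlier. Git feeds the hook one line per ref on stdin in the form `<local ref> <local sha> <remote ref> <remote sha>`. A sketch, assuming main and master are the protected names; the function name is hypothetical.

```shell
# Sketch of a pre-push hook body: refuse direct pushes to protected branches.
# Assumption: main and master are the branches to protect.
refuse_protected_push() {
  while read -r _local_ref _local_sha remote_ref _remote_sha; do
    case "$remote_ref" in
      refs/heads/main|refs/heads/master)
        echo "[pre-push] Direct push to ${remote_ref#refs/heads/} is blocked; open a PR." >&2
        return 1
        ;;
    esac
  done
  return 0
}

# In .git/hooks/pre-push:
# refuse_protected_push || exit 1
```

Anyone can bypass this with `git push --no-verify`, so treat it as a seatbelt, not a lock.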

5.3. Post-disaster recovery

If someone does something truly catastrophic:

  1. Stop everything (no more pushes until you understand the damage)
  2. Use git reflog on local clones that are still healthy
  3. Create a new branch from a good commit
  4. Align with the team on how to migrate to that safe point
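
Steps 2 and 3 can be sketched as follows. The reflog records every position HEAD has been at, so `HEAD@{1}` is where HEAD was before the most recent move. The function name is made up, and you should always inspect the reflog before trusting a number.

```shell
# Sketch: create a rescue branch pointing at where HEAD was N moves ago (default 1).
rescue_from_reflog() {
  name=$1
  steps="${2:-1}"
  git branch "$name" "HEAD@{$steps}"
}

# Inspect first:
# git reflog --date=iso
# Then, for example:
# rescue_from_reflog rescue-main 3
```

Creating a branch is non-destructive, so nothing is lost if you pick the wrong entry.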

Having fresh clones on multiple machines is already a form of implicit backup.


6. Hardcore summary

When repos get big and chaos grows:

  • Use hooks to automate local checks
  • Avoid submodules unless the team really knows what it's doing
  • Consider a monorepo if you have many interconnected services
  • Take care of performance: .gitignore, Git LFS, --depth, git gc
  • Treat the repo as critical infra: backup, protected branches, disaster plan

If this article made sense, you've probably been hit by Git a few times already.

Send it to that friend who still thinks git just means "save code on GitHub".

Find me on X/Twitter: @gss_62

#git #gittips