
Git hardcore: hooks, submodules, monorepos and surviving huge codebases
Advanced Git techniques for large codebases: hooks, submodules, monorepos, performance and disaster plans.
TL;DR (Too Long; Didn't Read)
Large repos need advanced tools.
git hooks automate local checks. git submodules are more trouble than they're worth (avoid them). Monorepos work if well-structured. Performance improves with .gitignore, Git LFS and shallow clones. Treat the repo as critical infra: backups, protected branches, disaster plan. The truth: most "Git problems" are actually organization problems.
So far we've talked about day-to-day Git. Now it's time to touch the stuff people pretend doesn't exist because it's hard:
- Git hooks to automate checks
- Submodules (yes, that weird thing)
- Monorepos and strategies for gigantic repos
- Performance: when Git eats your RAM and CPU
- Disaster planning so you don't become hostage to a single repo
This is for when you already know how to use Git, but the size of the codebase is starting to hurt.
1. Git hooks: automating good practices
Hooks are scripts that Git runs automatically at certain points:
- Before a commit (pre-commit)
- After a commit (post-commit)
- Before a push (pre-push)
- Etc.
They live in .git/hooks/.
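To make the pre-push entry in the list above concrete, here is a minimal sketch of a check that refuses direct pushes to one branch. Git feeds a pre-push hook one line per ref on stdin, in the form "local-ref local-sha remote-ref remote-sha". The function name reject_protected_pushes and the default refs/heads/main are assumptions made for illustration, not anything Git defines.

```shell
#!/usr/bin/env bash
# Sketch of a pre-push check (function name and default branch are illustrative).
# Reads the lines Git passes to pre-push on stdin:
#   <local ref> <local sha> <remote ref> <remote sha>
reject_protected_pushes() {
  local protected="${1:-refs/heads/main}"
  local local_ref local_sha remote_ref remote_sha
  while read -r local_ref local_sha remote_ref remote_sha; do
    if [ "$remote_ref" = "$protected" ]; then
      echo "[pre-push] Direct pushes to $protected are blocked. Open a PR instead." >&2
      return 1
    fi
  done
  return 0
}
```

To use it as a real hook, put the function in .git/hooks/pre-push and end the file with a line such as `reject_protected_pushes refs/heads/main || exit 1`. It is only a local safety net; server-side branch protection is still what actually enforces the rule.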
1.1. Enabling a simple pre-commit
Example: block commits that still have stray console.logs.
Create .git/hooks/pre-commit:
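#!/usr/bin/env bash
# Check only lines added in the staged diff; -F matches "console.log" literally.
if git diff --cached | grep '^+' | grep -v '^+++' | grep -qF 'console.log'; then
  echo "[pre-commit] Found 'console.log' in the diff. Remove it before committing." >&2
  exit 1
fi
Make it executable:
chmod +x .git/hooks/pre-commit
Now, if you try to commit a change that adds console.log, the commit fails. Removing an existing console.log is still allowed, because only added lines are checked.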
1.2. Running linter/tests automatically
#!/usr/bin/env bash
# Run the test suite and abort the commit if it fails.
npm test
STATUS=$?
if [ "$STATUS" -ne 0 ]; then
  echo "[pre-commit] Tests failed. Fix them before committing." >&2
  exit "$STATUS"
fi
Spicy take: local-only hooks are duct tape if the whole team isn't using them. .git/hooks/ isn't versioned, so nothing guarantees a teammate has the same hooks you do. For big teams, standardize via tools like Husky (JS), pre-commit (Python) or via CI.
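If you don't want a full hook manager, one lightweight way to standardize is to keep the hooks in a versioned directory and point Git at it with core.hooksPath (available since Git 2.9). The sketch below assumes a directory called .githooks at the repo root; that name and the function name install_shared_hooks are illustrative choices, not Git defaults.

```shell
#!/usr/bin/env bash
# Sketch: share hooks through a versioned directory.
# ".githooks" is an assumed directory name, not something Git creates for you.
install_shared_hooks() {
  local repo_dir="${1:-.}" hooks_dir="${2:-.githooks}"
  mkdir -p "$repo_dir/$hooks_dir"
  # Tell this clone to look for hooks here instead of .git/hooks
  git -C "$repo_dir" config core.hooksPath "$hooks_dir"
  # Git skips hook files that are not executable
  find "$repo_dir/$hooks_dir" -type f -exec chmod +x {} +
}
```

Each developer still has to run this once per clone (for example from a setup script), because Git config is local too. That is why CI remains the real enforcement point.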
2. Submodules: when (almost never) to use them
A submodule is basically a Git repo inside another Git repo.
Typical use case:
- You have a shared library across multiple projects
- You want to keep its history separate, but version it together
2.1. Adding a submodule
git submodule add https://github.com/org/shared-lib.git libs/shared-lib
git commit -m "Add shared-lib submodule"
This creates a .gitmodules entry and treats libs/shared-lib as a submodule.
2.2. Cloning a repo with submodules
git clone https://github.com/org/project-with-submodules.git
cd project-with-submodules
git submodule update --init --recursive
Without this, submodule directories stay empty.
2.3. Why many people hate submodules
- The flow is more cumbersome
- It's easy to break if the team doesn't know what they're doing
- CI needs special handling
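As an example of the "special handling" CI tends to need, here is a sketch that fails fast when submodules are missing or out of sync. It parses the output of git submodule status, where each line starts with a space (in sync), "-" (not initialized), "+" (checked-out commit differs from the one recorded in the parent repo) or "U" (merge conflict). The function name check_submodule_status is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: read `git submodule status --recursive` output on stdin and
# return non-zero if any submodule is not cleanly checked out.
check_submodule_status() {
  local line problems=0
  while IFS= read -r line; do
    case "$line" in
      -*) echo "not initialized: ${line#-}" >&2; problems=1 ;;
      +*) echo "out of sync: ${line#+}" >&2; problems=1 ;;
      U*) echo "merge conflict: ${line#U}" >&2; problems=1 ;;
    esac
  done
  return "$problems"
}
```

In a CI step this would typically be wired up as `git submodule status --recursive | check_submodule_status || exit 1`.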
Modern alternatives:
- Separate repos with proper versioning (npm, pip, maven, etc.)
- Monorepos with workspace managers (Nx, Turborepo, pnpm, Bazel, etc.)
Spicy take: submodules aren't "wrong". What's wrong is dropping submodules into a team that barely knows how to merge.
3. Monorepos: one repo to rule them all
A monorepo is when you put multiple projects (services, libs, apps) into the same Git repo.
3.1. Pros
- Easier to keep versions in sync
- Cross-service refactors are safer
- CI can be optimized by paths
- Less "where does this code live?" in your life
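The "CI optimized by paths" point can be sketched in a few lines: take the list of changed files and reduce it to the set of projects that need a build. The helper below assumes top-level groups named apps, services and libs (an assumption about your layout), and the name changed_projects is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read changed file paths on stdin and print the unique
# "<group>/<project>" directories they belong to.
# The group names apps/services/libs are an assumed layout.
changed_projects() {
  awk -F/ '($1 == "apps" || $1 == "services" || $1 == "libs") && NF >= 3 { print $1 "/" $2 }' | sort -u
}
```

A typical call would be `git diff --name-only origin/main...HEAD | changed_projects`, where the three-dot form compares against the merge base with origin/main. Files outside the known groups (a root README, for instance) are ignored here, so you would still want a rule for "global" changes that should trigger everything.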
3.2. Cons
- Repo can become huge
- Initial clone is slow
- History gets noisy if you don't organize it
3.3. Typical monorepo layout
.
├── apps
│   ├── web
│   ├── mobile
│   └── admin
├── services
│   ├── users
│   ├── billing
│   └── notifications
├── libs
│   ├── ui
│   ├── core
│   └── utils
└── tools
    └── ci-scripts
3.4. Useful Git commands in monorepos
History of a specific subdirectory:
# see history for the billing service only
git log --oneline -- services/billing
Diff between branches limited to a directory:
git diff main..my-branch -- services/billing
Blame focused on one area:
git blame services/billing/src/invoice_service.ts
3.5. Filtering history by path
Useful to generate a "per-project" history inside a monorepo:
git log --oneline -- services/users
Or even extract a subdirectory into its own repo while keeping history (using git filter-repo, or the older git filter-branch, which Git's own documentation now discourages).
4. Performance: when Git eats your machine
Huge repositories can slow Git down:
- Too many binary files
- Giant merge-heavy history
- Accidentally committed node_modules or dist
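If that last one has already happened, a rough sketch of stopping the bleeding looks like this: remove the directory from the index only (the files stay on disk) and ignore it from then on. The function name untrack_path is made up, and the sketch assumes you run it from the repo root and that the pattern isn't already in .gitignore.

```shell
#!/usr/bin/env bash
# Sketch: stop tracking a path that was committed by accident.
# Assumes the current directory is the repo root.
untrack_path() {
  local path="${1%/}"
  # --cached removes the path from the index only; the working copy is untouched
  git rm -r --cached --quiet -- "$path"
  # Ignore it from now on (appends; assumes the pattern is not listed yet)
  printf '%s/\n' "$path" >> .gitignore
  git add .gitignore
}
```

After `untrack_path node_modules` you still need to commit. Note that this does not shrink the repo: the old blobs remain in history until you rewrite it.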
4.1. Use .gitignore properly
No way around it: if you commit builds, dependencies and junk, you'll suffer.
Basic example for JS projects:
node_modules/
dist/
coverage/
.env
*.log
4.2. Git LFS for big files
Large binaries (heavy images, videos, ML models) don't work well with plain Git.
Use Git LFS (Large File Storage):
git lfs track "*.psd"
# git lfs track already records the rule in .gitattributes:
#   *.psd filter=lfs diff=lfs merge=lfs -text
git add .gitattributes
4.3. Shallow clones (--depth)
In CI or environments where you don't need full history:
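git clone --depth=50 https://github.com/org/huge-project.git
This fetches only the last 50 commits. --depth also implies --single-branch, so only the default branch comes down unless you ask for more.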
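Shallow clones occasionally bite later (for example when a tool needs older history), so it helps to know how to inspect and undo them. The helpers below are a sketch with made-up names (history_depth, is_shallow, deepen_fully); the Git commands they wrap are standard.

```shell
#!/usr/bin/env bash
# Sketch: inspect and deepen a shallow clone (helper names are illustrative).
history_depth() {
  # Number of commits reachable from HEAD in the given clone
  git -C "$1" rev-list --count HEAD
}
is_shallow() {
  # Prints "true" or "false"
  git -C "$1" rev-parse --is-shallow-repository
}
deepen_fully() {
  # --unshallow errors out on a complete repo, so check first
  if [ "$(is_shallow "$1")" = "true" ]; then
    git -C "$1" fetch --quiet --unshallow
  fi
}
```

If you only need a bit more history rather than all of it, `git fetch --deepen=<n>` extends the current depth by n commits instead.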
4.4. Periodic cleanup
Over time, repos accumulate refs, old tags, dead branches.
# drop remote-tracking refs for branches deleted on the remote
git fetch --prune
# repack objects and prune unreachable loose ones
git gc --prune=now --aggressive
Tip: don't run git gc --aggressive every day, but doing it occasionally on large repos can help.
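To check whether a cleanup actually bought you anything, you can compare pack size before and after. This sketch reads the size-pack value (in KiB) from git count-objects -v; the function names repo_pack_size and gc_report are invented for the example, and it runs a plain gc rather than --aggressive to keep it cheap.

```shell
#!/usr/bin/env bash
# Sketch: report pack size before and after a gc (names are illustrative).
repo_pack_size() {
  # size-pack from `git count-objects -v`, reported in KiB
  git -C "${1:-.}" count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}
gc_report() {
  local repo="${1:-.}" before after
  before=$(repo_pack_size "$repo")
  git -C "$repo" gc --quiet --prune=now
  after=$(repo_pack_size "$repo")
  echo "size-pack: ${before} KiB -> ${after} KiB"
}
```

On a repo that is mostly loose objects the number can go up after gc, since objects move from loose storage into packs; `git count-objects -vH` shows both figures in human-readable units.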
5. Disaster planning for critical repos
If your repo is the heart of the product, treat it accordingly.
5.1. Backups and multiple remotes
You can configure more than one remote:
git remote add backup git@backup-server:org/project.git
git push --all backup
git push --tags backup
If GitHub/GitLab/etc. have a bad day, you still have a copy.
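Pushing to a backup remote depends on someone remembering to do it. A pull-based alternative is a mirror clone refreshed on a schedule; the sketch below is one way to do that, with backup_mirror as a made-up function name. Be aware that --prune also deletes refs from the mirror once they disappear upstream, so drop that flag if you want the backup to survive an accidental branch deletion.

```shell
#!/usr/bin/env bash
# Sketch: create or refresh a bare mirror of a repository.
# "backup_mirror" is an illustrative name; run it from cron or a CI schedule.
backup_mirror() {
  local source_url="$1" backup_dir="$2"
  if [ -d "$backup_dir" ]; then
    # A mirror clone fetches every ref; --prune drops refs deleted upstream
    git -C "$backup_dir" fetch --quiet --prune origin
  else
    git clone --quiet --mirror "$source_url" "$backup_dir"
  fi
}
```

For example, `backup_mirror git@github.com:org/project.git /backups/project.git`, where both the URL and the path are placeholders.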
5.2. Branch protection
On providers like GitHub/GitLab/Bitbucket:
- Protect main/master from push --force
- Require PR + review for merges
- Require green CI status
This isn't bureaucracy; it's what saves you from a bad git push --force at 6pm on a Friday.
5.3. Post-disaster recovery
If someone does something truly catastrophic:
- Stop everything (no more pushes until you understand the damage)
- Use git reflog on local clones that are still healthy
- Create a new branch from a good commit
- Align with the team on how to migrate to that safe point
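The reflog and new-branch steps above can be sketched as a small helper. It creates a branch at a revision taken from the reflog, defaulting to HEAD@{1}, which is where HEAD pointed before its most recent move (a bad reset, for instance). The name rescue_branch is hypothetical, and in a real incident you would read `git reflog` first and pass the exact commit you trust.

```shell
#!/usr/bin/env bash
# Sketch: pin a known-good commit from the reflog onto a new branch.
# "rescue_branch" is an illustrative name.
rescue_branch() {
  local repo="$1" branch="$2" rev="$3"
  # Default to where HEAD pointed one move ago
  [ -n "$rev" ] || rev='HEAD@{1}'
  git -C "$repo" branch "$branch" "$rev"
}
```

Creating a branch is non-destructive, which is the point: nothing is overwritten while the team decides how to move everyone to the safe commit.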
Having fresh clones on multiple machines is already a form of implicit backup.
6. Hardcore summary
When repos get big and chaos grows:
- Use hooks to automate local checks
- Avoid submodules unless the team really knows what it's doing
- Consider a monorepo if you have many interconnected services
- Take care of performance: .gitignore, Git LFS, --depth, git gc
- Treat the repo as critical infra: backup, protected branches, disaster plan
If this article made sense, you've probably been hit by Git a few times already.
Send it to that friend who still thinks git just means "save code on GitHub".
Find me on X/Twitter: @gss_62