How to get up to speed in a new codebase, fast
You're alone in a wooded wilderness. Dappled sunlight is sneaking through the forest canopy. You hear the steady hum of a stream in the distance. A compass hangs from your neck, and you raise it to check for north. You swing your backpack off of your shoulders, and open it to look for your map and your mission. Inside, instead of a map you find a hastily scribbled software architecture diagram, a laptop, and a note that says: "Find the React component that renders the Pay Invoice button, and the HTTP route that processes payment requests." You open the laptop and find the text editor, angle the screen away from the glare of the sun, and sit down on an old tree stump... the code orienteering race has begun!
Orienteering is a sport that involves navigating through unfamiliar terrain using a map and compass. Participants race against each other to find a series of checkpoints, called control points, in the quickest time possible. The terrain can vary from urban parks to remote wilderness areas, and participants must navigate around obstacles and across forests, hills, and streams.
While programming isn't a competitive race, code orienteering is an important skill for any developer. I imagine the experience of navigating an unfamiliar terrain might have some similarities to the challenges we experience navigating an unfamiliar codebase. When we join a team or project, we're often not starting from an empty repository. We're walking into an entangled, multilayered ecosystem of existing code often without clear paths to navigate.
In code orienteering, the goal isn't to get from point A to point B as quickly as possible. It's to understand the big picture of the system as a whole: how it's organized, why it grew that way, and how to navigate it so that we can contribute to it more thoughtfully and effectively. There are many ways to practice code orienteering. Here are some of my favorite tips and approaches for a successful orienteering adventure.
(Photo of orienteering gear by Jametlene Reskp on Unsplash)
Look for common landmark types
Every codebase has its own organization patterns and development tools, and it might be simple or extremely complex. But every codebase also has similar basic needs: data and document storage, a user interface, and a way for views and databases to communicate with each other. In most codebases the code is organized around these needs, and these areas can be used as landmarks just like rivers and mountains in a landscape.
When surveying an unfamiliar codebase, keep a list of a few basic questions in mind and search for clues to answer them:
- How does the frontend request or receive data from the backend? The patterns for fetching and sending data and handling miscommunication or errors between server and client are some of the most important code in any codebase. It's a great place to start to get a survey of the overall landscape.
- How many and what kind of databases are available? How is the data structured? Is it loosely defined or strictly defined? Some codebases have highly structured and rigid data storage, like solid rock layers, while others are more fluid and shifting, like sandy soil. Often there are multiple complementary layers in data storage from low level persistent storage to high level temporary caches. While you may not need to understand all the details of the data storage layer, understanding its interface and capabilities provides a helpful map of a large part of the codebase.
- How do users access and interact with the frontend interface? How is the interface rendered or patterned? Is there a visual or audio interface? Is there a programmatic interface? The frontend of a codebase is like the flora and fauna of the landscape. It varies widely between codebases, just like the plants and animals in a desert are distinct from the ones found in a rainforest. Some codebases have well-defined trails and patterns for navigating the frontend code, while others are deeply entangled ecosystems that are difficult to traverse. All interfaces are multifaceted, with different kinds of user accessibility: visual, audible, tactile, and programmatic. Understanding all the nuances of interface-related code can take a while, but making a list of the general patterns and tools currently in use can be handy when diving into more specific tasks.
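A quick text search is one way to start answering these questions. The sketch below counts matches for a few landmark patterns in each file under a directory. The pattern list is purely an assumption (it guesses at a JavaScript frontend, an Express-style router, and SQL storage), and `survey`, `LANDMARKS`, and `SKIP_DIRS` are hypothetical names, so swap in whatever fits the stack you're exploring.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical landmark patterns; these assume a JavaScript/SQL stack.
LANDMARKS = {
    "client-server calls": re.compile(r"\bfetch\(|\baxios\b"),
    "http routes": re.compile(r"\b(?:app|router)\.(?:get|post|put|delete)\("),
    "data storage": re.compile(r"CREATE TABLE|\bSELECT\b.+\bFROM\b", re.IGNORECASE),
}

# Directories that usually hold dependencies or build output, not source.
SKIP_DIRS = {".git", "node_modules", "dist", "build"}


def survey(root):
    """Count landmark matches per file under root, grouped by landmark name."""
    root = Path(root)
    hits = {name: Counter() for name in LANDMARKS}
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS & set(relative.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for name, pattern in LANDMARKS.items():
            count = len(pattern.findall(text))
            if count:
                hits[name][relative.as_posix()] = count
    return hits
```

Calling `survey(".")` from the repository root and looking at `most_common(5)` for each landmark gives a rough shortlist of files to open first. It's a compass bearing, not a map: a file with many matches is a good place to start reading, nothing more.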
Make a visual map
In orienteering, participants are given a topographical map which they must interpret to navigate the landscape. If you're lucky, before your own orienteering through a codebase, you'll also have access to documentation from other developers that provides some guidance. Of course, very often this documentation is incomplete, and is likely out of date compared to the reality in the codebase. Codebases in active development are constantly evolving, and even the most energetic efforts to maintain documentation aren't perfect.
Even if you do have great documentation and helpful colleagues available, I recommend you make your own visual map of the codebase as you survey it using any visual note taking tool of your choice. It helps to make maps at several scales, one for the system as a whole, and several for subsystems within the codebase at different levels of detail. In the high level map you might give names to areas of the system and show how they interact, and in low level maps you might show specific function and module names within a particular feature.
Mapmaking is practical because when you inevitably get turned around after opening a few dozen files you can quickly see the path back to where you started and get reoriented. It also serves as a record of your progress, where you left off, and what's left to explore so you can more easily start again if you need to stop. You can also show it to colleagues who have more experience in the codebase as a visual representation of your understanding so they can help make it more accurate or point out paths you may have missed. So, take the time to read the trail guides if you can, including documentation for third party libraries and packages, but keep your own notes as well.
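If part of the codebase happens to be Python, you can even let the code draft a first low-level map for you. The sketch below is one possible approach, not a standard tool: it parses each file with the standard library's `ast` module, collects import statements, and writes the edges out in Graphviz DOT format. The function names `import_edges` and `to_dot` are my own inventions, and relative imports like `from . import x` are skipped to keep things short.

```python
import ast
from pathlib import Path


def import_edges(root):
    """Return sorted (module, imported_module) pairs for .py files under root."""
    root = Path(root)
    edges = set()
    for path in root.rglob("*.py"):
        # Turn "billing/invoice.py" into the dotted name "billing.invoice".
        module = ".".join(path.relative_to(root).with_suffix("").parts)
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                edges.update((module, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # node.module is None for "from . import x"; those are skipped.
                edges.add((module, node.module))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as a Graphviz DOT digraph."""
    lines = ["digraph codebase {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)
```

Printing `to_dot(import_edges("."))` gives text you can paste into any Graphviz viewer. A generated graph like this is only raw survey data. The hand-drawn map, with your own names for regions and your own notes about why things connect, is still the one that helps you find your way back.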
Leave only footprints
While making your way through an unfamiliar codebase, you'll discover a lot of code that doesn't have an obvious purpose. Ideally, code is easy to understand through clear module, function, and variable names or clarifying comments, but in practice there are always areas that are difficult to decipher. I often find myself thinking, "Is this code even necessary?! It seems pointless, I'd love to simply delete it!"
As good as it feels to delete seemingly disposable code, orienteering is not the time to make extensive changes. Occasionally the code you find is truly pointless. It was made redundant by a change elsewhere, or left behind unintentionally during a previous feature deprecation. But much more often, the code is a perfect example of the principle of Chesterton's Fence: don't remove a fence until you understand why it was put there in the first place. If you remove the code, all may seem well at first... until it's released to production and mysterious support tickets start appearing, describing a new issue in a feature that has been working fine for years.
Code refactoring and new feature development are separate processes from orienteering. During refactoring, steps are taken to better understand the purpose of specific lines of code and make it safe to change them without introducing unintentional side effects or damaging the function of the existing code. During orienteering, the goal is to understand high level structure and general purpose of large areas of the codebase so that you can navigate more easily and keep the whole in mind when doing more specific refactoring work or new development.
With this in mind, when you're in orienteering mode, limit your changes to adding comments or TODO notes for yourself to return to when you have time for a more thorough investigation. These notes are a clear trail you or your colleagues can follow later, and they make areas for improvement easier to spot so that more legible paths can be created. They're a sign that "someone has been working here" and that the codebase has caretakers looking for ways to maintain it, which encourages everyone to tread more carefully.
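Trail markers are only useful if you can find them again. Here is a minimal sketch of a script that collects them, assuming notes are written with the common `TODO` or `FIXME` markers. The function name `find_notes`, the file suffixes, and the skipped directories are all assumptions to adjust for your own project.

```python
import re
from pathlib import Path

# Matches markers such as "# TODO: ..." or "// FIXME(ana): ...".
MARKER = re.compile(r"\b(TODO|FIXME)\b[:\s]*(.*)")

# Directories that usually hold dependencies, not code you maintain.
SKIP_DIRS = {".git", "node_modules"}


def find_notes(root, suffixes=(".py", ".js", ".ts", ".tsx")):
    """Return (file, line_number, marker, text) for each note under root."""
    root = Path(root)
    notes = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if path.suffix not in suffixes or not path.is_file():
            continue
        if SKIP_DIRS & set(relative.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            match = MARKER.search(line)
            if match:
                notes.append(
                    (relative.as_posix(), number, match.group(1), match.group(2).strip())
                )
    return notes
```

Running `find_notes(".")` at the end of an exploring session gives you a list of every place you meant to come back to, along with the notes left by those who hiked through before you.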
Use developer tools to scout
Reading code is like reading a map. Running code is like actually hiking through the forest. We can get a pretty good sense of the landscape from the map, but the only way to fully understand the code is to actually run it and see what happens. If we're only reading code, we can make a lot of assumptions that may not be true, or miss important details that are only apparent when seeing actual inputs and outputs. Orienteering in a codebase involves both reading code and actually running it and observing the results.
Consider your normal developer tools to also be your pack of code orienteering gear. Run existing tests to see the real code in action, and write your own to see how the code responds in different scenarios. Use a debugger to trace code execution and inspect variables. And manually run the code with experimental, temporary changes to observe the results and gain a deeper understanding of how its parts relate. All of these approaches give you a much more detailed experience of the code in reality and reveal hidden paths that you might miss by only skimming. Of course, don't forget to undo your changes at the end and "leave only footprints"!
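Writing your own tests to see how the code responds is sometimes called characterization testing: you record what the code does today, not what you think it should do. Here's a minimal sketch using Python's built-in `unittest`. The function `invoice_total_cents` is a made-up stand-in for whatever existing code you're scouting. In a real codebase you'd import the function rather than define it.

```python
import unittest


def invoice_total_cents(line_items, tax_percent):
    """Hypothetical stand-in for existing code you did not write."""
    subtotal = sum(price * quantity for price, quantity in line_items)
    return subtotal + subtotal * tax_percent // 100


class InvoiceTotalCharacterization(unittest.TestCase):
    """Pin down what the code does today, surprises included."""

    def test_typical_invoice(self):
        # Two items at 10.00 and one at 5.50, with 10% tax.
        self.assertEqual(invoice_total_cents([(1000, 2), (550, 1)], 10), 2805)

    def test_empty_invoice(self):
        self.assertEqual(invoice_total_cents([], 10), 0)

    def test_negative_quantity_is_accepted(self):
        # Observed, not endorsed: a negative quantity yields a negative total.
        # Worth a TODO note, not a fix, while orienteering.
        self.assertEqual(invoice_total_cents([(1000, -1)], 10), -1100)
```

Saved to a file, this runs with `python -m unittest`. If a test like the last one surprises you, that's the point: you've found a spot on the map worth marking before anyone changes it.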
Make a campsite
In codebases that have been around for a while, it's impossible to get a complete picture in a day or a week, and complex codebases can take years to fully explore. Sometimes it's best to focus your efforts on one part of the codebase from which you can get a good view of other parts. By fully exploring one feature and beginning to make significant contributions to it through refactoring and new development, you can learn a lot about other areas of the code since similar patterns are often repeated elsewhere.
Consider this area your home base campsite you can return to when you get stuck or fatigued exploring or want to experiment more in depth with new ideas you might want to implement. Since everything is interconnected, focusing on one feature as a vertical slice of the code can be more efficient than exploring horizontally across all features.
Code orienteering is an essential skill for developers to get up to speed with a codebase and make effective contributions to the quality of the whole system. As you get started with a new codebase, you may be eager to start building new features and making improvements right away, but spending some time orienteering by finding common landmarks, making maps, and exploring with developer tools can make your contributions more effective, efficient, and thoughtful.
Do you have any favorite strategies for getting familiar with a new codebase? Please share them in the comments below!