
title: Making your first open source contribution, part 3 - navigating new codebases
draft: false
tags: open source
date: 2020-02-05

This post is part of a 3-part series titled "Making your first open source contribution":

  1. Making your first open source contribution, part 1 - finding projects, issues, and mentored programs
  2. Making your first open source contribution, part 2 - before you start contributing
  3. Making your first open source contribution, part 3 - navigating new codebases

You've got your development environment set up, and everything's working wonderfully. Sweet! Time to dive into the codebase… except…

You're lost.

This part is mostly about navigating a new codebase, especially a large one, so the advice below probably applies to any large codebase & not just to open source projects. When you're looking to contribute to open source projects, though, it might take a while to find a project that feels right for you, and as a consequence you'll be meeting new codebases more often.

Navigating new, large codebases can be especially challenging for someone who:

a) is currently in school, with no access to large codebases (this was me!)

b) mostly works in self-initiated or small codebases (me, currently!)

At a glance, this post might only seem relevant to contributions that involve writing code, but I've personally used these tips when contributing to documentation too. Sometimes documentation is so tightly coupled with code that it's impossible not to touch the project's codebase. Besides, when writing documentation for a piece of code, you still need to understand what the code does & sometimes how it interacts with other functions in the project.

Use it

You might come across a project that you've never used before but still want to contribute to, & that's fine! In fact, although I use the pandas library daily, I don't use all of its functions, so I did find myself working on something that I'd never used before.

My first tip is: use it. If it's a new library, go through its tutorials or “getting started” guides, & play around until you're comfortable enough with it.

If it's a new function, check out the documentation, run the examples or use it in a toy problem so you can get a better intuition for the problem you're trying to solve.

Even if it's a function that you have used before, you might need to modify parts that you're not familiar with, e.g. parameters that you've never had to use, so it's probably still useful to do the things mentioned above.
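When a parameter is new to you, Python's built-in `inspect.signature` can list every parameter a function accepts along with its default, which is a quick way to spot the ones you've never touched before reading up on them. A minimal sketch: `list_parameters` is a hypothetical helper, and `load_table` is a made-up stand-in for whatever function you're exploring.

```python
import inspect


def list_parameters(func):
    """Return (name, default) pairs for every parameter of func.

    Parameters without a default are reported as the string "<required>".
    """
    params = []
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            default = "<required>"
        else:
            default = param.default
        params.append((name, default))
    return params


# Hypothetical stand-in for a library function you want to explore.
def load_table(path, sep=",", header=0, index_col=None):
    ...


for name, default in list_parameters(load_table):
    print(f"{name}: {default!r}")
```

With pandas installed, the same helper should work on real functions too, e.g. `list_parameters(pandas.DataFrame.to_parquet)`, though the docstring or the documentation is still where you'll find out what each parameter actually does.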

Explore the tests

Sometimes it's not very clear from the documentation what a function is supposed to do. Sometimes there is no documentation at all. If that's the case, what I usually do next is explore the tests, especially unit tests, if there are any. Unit tests are great to learn from because they show you how to correctly invoke a function and what the expected behavior of a piece of code is.

Tests can usually be found in their own folder, such as /tests.
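If clicking through folders gets tedious, a short script can list the test modules for you. This is a sketch that assumes the project follows pytest's default file naming (`test_*.py` or `*_test.py`); projects with other conventions need a different pattern, and `find_test_files` is a hypothetical helper.

```python
from pathlib import Path


def find_test_files(repo_root):
    """List Python test modules under repo_root, using pytest's default naming."""
    root = Path(repo_root)
    # A set avoids duplicates if a file happens to match both patterns.
    matches = set(root.rglob("test_*.py")) | set(root.rglob("*_test.py"))
    # Sort the relative paths so the output is stable and easy to scan.
    return sorted(str(p.relative_to(root)) for p in matches)


# Usage: print("\n".join(find_test_files("path/to/your/clone")))
```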

Here's an example from pandas. Let's say that you want to know how to use the function rename_categories for CategoricalIndex & what should happen when yo

The test can give you some idea that, okay, if I have the following CategoricalIndex:

ci = CategoricalIndex(list("aabbca"), categories=list("cab"))

And then I apply the rename_categories function:

result = ci.rename_categories(list("efg"))

I'm supposed to get back:

CategoricalIndex(list("ffggef"), categories=list("efg"))
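You can also turn a test like this into a scratch script and poke at it, e.g. by changing the new names to see what happens. A minimal sketch, assuming pandas is installed; it uses the public `pandas.testing.assert_index_equal` rather than whatever internal helpers pandas' own test suite imports.

```python
import pandas as pd

# Same setup as the test: the categories are ordered c, a, b.
ci = pd.CategoricalIndex(list("aabbca"), categories=list("cab"))

# rename_categories maps old categories to new names by position:
# c -> e, a -> f, b -> g, so every value is relabeled accordingly.
result = ci.rename_categories(list("efg"))

expected = pd.CategoricalIndex(list("ffggef"), categories=list("efg"))

# Raises AssertionError if the values, categories, or dtype differ.
pd.testing.assert_index_equal(result, expected)
print(result)
```

Passing a list with the wrong number of names is a quick way to see how pandas reacts to bad input, which is often what the neighboring tests check too.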

Find keywords in the issue & use them to find relevant parts in the codebase

I usually extract important keywords from the issue, type them into the search bar of my code editor (I use VS Code) & see what other pieces of code pop up & where.

For example, I worked on an issue where I had to update the index parameter in pandas' to_parquet. The first thing I did was search for to_parquet in my code editor to see where the function is defined.

There are a lot of search results, including other pieces of code that call to_parquet rather than the to_parquet function itself. For this issue, I'm not interested in these other parts of the codebase, so I had to narrow down my search.

I searched for def to_parquet instead. In Python, the keyword def starts a function definition, so I can be sure that the results point to where the to_parquet function itself is defined. Of course, other programming languages will be different. The key here is that sometimes you need to think of tricks that help you get better search results.
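Outside an editor, the same trick can be scripted. A rough sketch using only the standard library: it counts a line as a definition when it matches `def <name>(` (optionally `async def`) after leading indentation, which assumes plain Python function definitions. `find_definitions` is a hypothetical helper.

```python
import re
from pathlib import Path


def find_definitions(repo_root, func_name):
    """Return (relative path, line number, line) for each definition of func_name."""
    # Anchoring at the start of the line (after indentation) means call sites
    # such as `df.to_parquet(...)` don't match, and requiring `(` right after
    # the name keeps longer names like `to_parquet_helper` out of the results.
    pattern = re.compile(rf"^\s*(?:async\s+)?def\s+{re.escape(func_name)}\s*\(")
    root = Path(repo_root)
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits


# Usage: for hit in find_definitions("path/to/pandas", "to_parquet"): print(hit)
```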

Search for similar issues & PRs

Other people might have made PRs that solved problems similar to the one that you're solving right now. You can use the keywords from the issue to search for other similar issues & PRs. A few things that you can learn from reading other issues & PRs:

  • Possibly relevant code & files: if the previous steps didn't work for you, this can help. On GitHub, you can find these by checking out the “Files Changed” tab in the PR. Here is an example.
  • Pointers on what to do: although the PR that I'm looking at is not solving the exact same problem, it sometimes gives clues on what I can do to solve my problem, e.g. an existing helper function that I didn't know about that can simplify my solution.
  • Feedback from maintainers: oftentimes, maintainers request changes before they approve a PR. These requests are recorded in the pull request's thread, & there's always a thing or two that I can learn from them.
  • Bugs: a PR can introduce new bugs, which are often discovered after the PR is approved & merged. Learning about these bugs helps me become aware of the kinds of bugs that I may possibly introduce with my PR.
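To find these PRs, GitHub's search accepts qualifiers such as `is:pr` and `is:merged` alongside your keywords. A small sketch that builds a search URL for one repository's pull requests; the keywords are just an example based on the to_parquet issue, and `pr_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def pr_search_url(owner, repo, *keywords, merged_only=True):
    """Build a GitHub URL that searches a repository's pull requests by keyword."""
    qualifiers = ["is:pr", "is:merged"] if merged_only else ["is:pr"]
    query = " ".join(qualifiers + list(keywords))
    # urlencode escapes ':' as %3A and turns spaces into '+'.
    return f"https://github.com/{owner}/{repo}/pulls?{urlencode({'q': query})}"


print(pr_search_url("pandas-dev", "pandas", "to_parquet", "index"))
# prints https://github.com/pandas-dev/pandas/pulls?q=is%3Apr+is%3Amerged+to_parquet+index
```

Restricting to merged PRs surfaces changes that were accepted; dropping `merged_only` also shows closed or open PRs, whose review threads can be just as instructive.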

Most projects have platforms for discussing the development of the project that are open to the public, be it Slack, Gitter, a mailing list, or other channels. These are usually listed either in the README or in the contributing guide. You can search for related discussions because it's possible that others have asked similar questions, but of course you can ask your own question as well… which brings me to my next point.

Ask for help

You might have done all of the above & still get stuck. That's fine! Don't be afraid to ask for pointers: you can do this by raising a question in the relevant issue or asking questions in the dev channel (see above). You might find this scary at first, but if the project you're working on has a Code of Conduct (they better do!), it can be a reminder for you that inappropriate behaviors are not tolerated.

From browsing various repositories & joining communication channels, I also learned that people ask questions all the time & it's OK! I guess I had this assumption that everyone (but me) knows everything, & this also contributed to how I initially perceived open source: intimidating & overwhelming. Seeing how people ask questions & how positively maintainers respond really helps shatter that unrealistic assumption.

Diving into new codebases is not a trivial thing, so if you feel like you're having difficulty making progress, it's totally normal. Even the most experienced programmers still need time to understand a new codebase.

Final notes

One last thing I want to emphasize: you don't have to get it perfect the first time.

Your first contribution, or the ones after it really, does not have to be a pull request that adds a major feature with thousands of lines of changes. Your first pull request does not have to be flawless: sometimes you mess up your git history to the point that the only solution you can think of is deleting your repository & redoing your work (we've all been there, haven't we?). It's okay if you forget to write your commit message with the correct prefix per the convention. In fact, you might find that these hiccups still happen in your second, third, fourth… hundredth contribution. You'll find that it's not the end of the world. You'll learn. You'll keep contributing anyway.

The most important thing is to get started, & I hope this 3-part series helps you to do just that. :)