ReactOS Git For Dummies

From ReactOS Wiki
Revision as of 14:32, 5 October 2017 by Gigaherz (Talk | contribs)


Introduction

Basic Git concepts

Lots of information here, take it slowly!

  • Commit: A commit in git contains information about a set of changes to the code. It represents the same information you could see in a patch/diff file, but in an internal binary representation. Notably, a commit has some values we care about:
    • Hash: A SHA1 (for now) hash computed from all the data and metadata included in the commit.
    • Message: The first line, or 80 characters of the first line, are considered the summary, and the rest is usually shown separately.
    • Author: The name+email of the person that wrote the patch.
    • Committer: The name+email of the person that applied the patch.
    • Parent(s): In a standard commit, there will be one parent, represented as its commit hash. In a merge commit (see below), it would have two parents.
  • Branching: A git branch is merely an entry in the repository that has a HEAD property, pointing to the commit hash of the newest commit in the branch. The commit operation implicitly assigns the new hash to the branch’s head. This gives git an extremely fast branching system, which makes branches a very effective tool for managing contributions.
  • Merging: When two branches are merged, a new commit is created, which contains all the data necessary to apply the changes from one branch onto the other. This is represented in the metadata by a commit with two parents. Parent 1 represents the primary source of code, and parent 2 the “other” branch that is being merged. Git does not distinguish between merging from master into a work branch, or from a work branch to master: in both cases a merge is “A=A+B”, with A being the currently checked out branch, and B the chosen target to merge from. The downside of merge commits is that they can “dirty” the history when they pile up one after another.
  • Committing: The commit operation encodes the modified files, along with a commit message and authorship information, into the internal representation described above. The new commit’s hash is then assigned to the branch head in the local repository.
  • Clone: In git, everyone works based on a full copy of the history. The action of creating this copy from a remote repository into your computer, is called cloning.
  • Remotes: A git remote tells git about a repository on another computer (for most of us, this will be GitHub’s servers), where you will be fetching from and/or pushing to. You can have as many remotes as you need, with any names you want, but the most common names are “origin” for your personal fork, and “upstream” for the repository that you are contributing to.
  • Fetch: In order to retrieve new commits from a remote, the fetch operation computes what data you are missing, and sends back all the compressed objects and metadata you need in order to replicate the rest of the history. This data is accessible from special branches, with names such as “remotes/origin/master”, where “origin” is the name of the remote you assigned when setting up the repository, and “master” is the name of the branch in the remote repository.
  • Checkout: Checking out is the operation of extracting the data at the state represented by the given branch/commit, and making it available in the working copy. Unlike SVN, git does not like performing a checkout with modified files, and stashing is often required.
  • Stashing: Git has the ability to save local modifications as patches in an internal “stash”. It is a common thing to stash changes before you update the working copy, and “pop” from the stash when you are done, to return to the previous state or apply the changes to a different branch.
  • Pull: You will inevitably come across tutorials and examples using the pull command. This is a combined utility command that performs a fetch operation of the selected branch and, immediately after, a merge operation from the remote branch into the selected branch. We recommend using fetch and rebase instead of pull.
  • Push: The counterpart of fetch. It computes the data the remote is missing and sends it to the remote repository, subsequently updating the remote branch’s head to match the commit hash of your local branch.
  • Fast-forwarding: A push operation is considered to be fast-forward if it only adds commits after the remote branch’s current head, and non-fast-forward if the history being pushed does not contain all the commits already in the target branch.
  • Rebase: In the normal workflow of git, updating the state of a branch implies merging from another branch. There are cases, however, when we do not want to use a merge commit, but rather we want to re-apply all the changes on top of the new updated code. The rebase operation does exactly that: it computes the list of commits that have been added since the branches split off, and applies all of them on top of the target commit. Because the metadata will change, a rebase operation is considered to “rewrite history”. Because of that, rebasing is normally considered an advanced operation, and can have side-effects that may be harder to grasp for a beginner. However, we’ll be using it extensively, so it is important to understand the idea.
  • Force-pushing: Git normally rejects pushes that are non-fast-forward. If we amend, rebase, or otherwise modify the data or metadata of commits that have previously been pushed to a different branch or repository, we are implicitly generating an “alternate history” of those changes. Git will then refuse to accept those modifications, because accepting them could lose data. Git provides a “--force” flag, which disables all of those checks. Because “--force” is extremely dangerous, using it is highly discouraged. However, in order to use rebase effectively, we do need to force-push commits when working with our own work branches (force-pushing to the master branch of our main repository is strictly forbidden, and reserved only for administrative actions). To lessen the danger of the force-push operation, there is an alternative flag, “--force-with-lease”, which allows the force-push only if the remote branch is still at the commit that your local copy of the remote state says it is.
  • Pull Requests: When contributing as an external contributor, or when working with large change sets, spanning many commits, or with sensitive edits, we want to be able to review the changes before they are applied. In GitHub, as with most other repository hosts, a pull request offers this service. A pull request gives other developers a chance to comment, request changes, and eventually approve or reject those changes. Once the changes have been reviewed, and sufficiently approved (the number of approvals is at the discretion of the pull request’s author -- or assignee if the author is not a team member), the Merge button allows integrating those changes into the master branch.
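
To make the commit fields listed above more concrete, here is a small sketch that builds a throwaway repository and prints them. The directory, names and email addresses are all placeholders invented for the example, and the format string is just one possible way to show the data.

```shell
set -e
# Throwaway repository; every name and address below is a placeholder.
sandbox=$(mktemp -d)
cd "$sandbox"
git -c init.defaultBranch=master init -q demo
cd demo
git config user.name "Example Committer"
git config user.email "committer@example.com"

echo "hello" > file.txt
git add file.txt
# --author sets the author; the committer comes from user.name/user.email.
git commit -q --author "Example Author <author@example.com>" -m "Add file.txt"

echo "world" >> file.txt
git commit -q -a -m "Extend file.txt"

# Hash, author, committer, parent hash(es) and summary of the newest commit.
git log -1 --format='hash:      %H%nauthor:    %an <%ae>%ncommitter: %cn <%ce>%nparents:   %P%nsummary:   %s'

# A branch is just a name that resolves to the hash of its newest commit.
branch_tip=$(git rev-parse master)
head_commit=$(git rev-parse HEAD)
parent=$(git rev-parse HEAD~1)
recorded_parent=$(git log -1 --format=%P)
first_author=$(git log -1 --format=%an HEAD~1)
first_committer=$(git log -1 --format=%cn HEAD~1)
echo "master -> $branch_tip"
```

The first commit shows the author and the committer as two different people, which is what you get when you apply someone else's patch.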

FAQ

  • You spoke about parents, but what about children, which is the “next commit” for a given hash?
    • Git does not store any information about the children in the commit graph. To obtain the list of children (branching implies there can be many), a slow walk through all the branch histories is needed. The main method to analyze the history is to open the commit log on your favorite Git GUI tool, and locate your commit in there. Because this can sometimes be annoying, we have an alternative way through the getbuilds site, which allows navigating back and forth through the builds, giving something that approximates the linear history of SVN.
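
If you prefer the commandline over a GUI, git can do that walk for you. The sketch below is one possible way, using the “--children” option of “git rev-list” on a throwaway repository. The repository contents and the “side” branch name are made up for the example.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git -c init.defaultBranch=master init -q demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "first"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)

# Branch off the first commit so that it ends up with two children.
git checkout -q -b side "$first"
git commit -q --allow-empty -m "third"
third=$(git rev-parse HEAD)

# Each output line of --children is "<commit> <child> <child>...";
# keep the line for $first and drop its own hash.
children=$(git rev-list --all --children | grep "^$first" | cut -d' ' -f2-)
echo "children of $first:"
echo "$children" | tr ' ' '\n'
```

Note that this only finds children reachable from the refs you pass in (here “--all”), so commits that exist only in someone else's repository will not show up.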

Workflow

Basic rules that we should try to follow

  • Prefer working on a branch, and pushing to your personal fork.
    • The exception is small commits that you feel are safe, which can be pushed directly to master if you are part of the team.
  • Avoid merge commits
    • This implies using rebase to ensure that all history remains linear
  • Keep a summary of the commit on the first line, followed by a blank line and then the details
    • Example:
[NTOSKRNL]: Update exports to NT6.3

* Added some stuff
* Removed some stuff
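
Git treats everything up to the first blank line of the message as the summary, so it helps to separate the summary from the details with a blank line. Here is a minimal sketch of writing such a message from the commandline in a throwaway repository. The file name and identity are placeholders, and reading the message from standard input is just one of several ways to do it.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git -c init.defaultBranch=master init -q demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

echo "placeholder" > exports.txt
git add exports.txt

# -F - reads the commit message from standard input.
git commit -q -F - <<'EOF'
[NTOSKRNL]: Update exports to NT6.3

* Added some stuff
* Removed some stuff
EOF

summary=$(git log -1 --format=%s)   # first paragraph only
details=$(git log -1 --format=%b)   # everything after the blank line
echo "summary: $summary"
echo "details:"
echo "$details"
```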

Cloning the repository

These are the recommended steps to set up a local clone:

  • Get a personal fork on GitHub, which will host your work branches and will be the source of your Pull Requests. This fork will be available publicly in a URL like github.com/<yourusername>/reactos
    • I will be calling them “personal fork” to distinguish them from a project-wide fork such as OpenOffice vs LibreOffice. A personal fork is just a personal space to make changes in, with the intention of contributing them back upstream.
    • Creating forks on other sites is possible, but you won’t have the pull request feature, so this is not recommended.
  • Clone either the personal fork or the main repository into your computer. It doesn’t matter which one you start with.
  • Edit the repository settings to add the other repository as a remote, so that you end up with two remotes. The recommended names are:
    • “origin” -> your personal fork, which will be the default place you use for working branches.
    • “upstream” -> the main repository, which will be needed to get yourself up to date with the master HEAD.
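
The setup above could look roughly like this on the commandline. To keep the sketch runnable without network access, two local bare repositories stand in for GitHub: “main.git” plays the main repository and “fork.git” plays your personal fork. Those paths and the identity are assumptions for the example. With the real thing you would use the GitHub URLs of the two repositories instead.

```shell
set -e
# Local stand-ins for the GitHub repositories, so the sketch runs offline.
sandbox=$(mktemp -d)
cd "$sandbox"
git -c init.defaultBranch=master init -q seed
cd seed
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
cd "$sandbox"
git clone -q --bare seed main.git        # plays the main repository
git clone -q --bare main.git fork.git    # plays your personal fork

# Clone the personal fork; git names that remote "origin" automatically.
git clone -q "$sandbox/fork.git" reactos
cd reactos

# Add the main repository as the second remote and fetch its branches.
git remote add upstream "$sandbox/main.git"
git fetch -q upstream

# Show both remotes with their fetch and push URLs.
git remote -v
remotes=$(git remote | sort | tr '\n' ' ')
tracking=$(git rev-parse --verify -q upstream/master)
```

If you start by cloning the main repository instead, the roles are reversed: rename that remote to “upstream” and add your fork as “origin”.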

Updating your environment with latest code

This is how you get the latest code set up before you start working on something new:

  • Checkout your local “master” branch (if you were working on another branch)
git checkout master 
  • Fetch the latest commits from the main repository
git fetch upstream 
  • Afterward, we will use the rebase command to get the fetched commits into the current branch
git rebase upstream/master 

Performing quick fixes (a.k.a. Working directly into the master branch of the main repository)

These are the steps one would follow during normal development of a “quick fix” (without a work branch). It is recommended to use a work branch for anything that is more than a trivial change:

  • Ensure your local master is up to date following the instructions above.
  • Do the work.
  • Commit early, commit often (locally).
    • You can later use TortoiseGit’s “combine into one commit”, or commandline git’s interactive rebase, to reduce the number of total commits you will push later, so don’t worry about spamming the log!
  • Commit the final touches.
  • Fetch the latest code, and use the rebase method to make sure your changes are applied on top of the latest master code.
    • THIS IS VERY IMPORTANT!
    • ALWAYS REBASE BEFORE PUSHING TO MASTER!
      • It’s okay if the target is a pull request, but that’s in the next section.
  • This is your last chance to use interactive rebase, or TortoiseGit’s log window, to clean up your commits until you are satisfied with the resulting “patch set”.
  • Push to upstream
git push upstream 
  • If the push operation complains about non-fast-forward commits, it means someone else pushed something in between your fetch and your push, so you will have to go back to the fetch+rebase steps and try again.
  • It should go without saying, but NEVER EVER FORCE PUSH TO MASTER. If you messed up, ask an admin.
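
The rejected push and the fetch+rebase retry can be tried safely in a sandbox. In this sketch a local bare repository stands in for the main repository, and two clones named “you” and “other” play two team members. The “make_clone” helper, the file names and the identities are all made up for the example.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"

# Local stand-in for the main repository, seeded with one commit.
git -c init.defaultBranch=master init -q seed
cd seed
git config user.name "Seed"
git config user.email "seed@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
cd "$sandbox"
git clone -q --bare seed main.git

# Hypothetical helper: clone the stand-in with the remote named "upstream".
make_clone() {
    git clone -q -o upstream "$sandbox/main.git" "$sandbox/$1"
    git -C "$sandbox/$1" config user.name "$1"
    git -C "$sandbox/$1" config user.email "$1@example.com"
}
make_clone you
make_clone other

# Someone else pushes a fix first.
cd "$sandbox/other"
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "Their quick fix"
git push -q upstream master

# Your push is now non-fast-forward, so it gets rejected.
cd "$sandbox/you"
echo "yours" > yours.txt
git add yours.txt
git commit -q -m "Your quick fix"
if git push -q upstream master 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi
echo "first push: $first_push"

# Go back to fetch + rebase, then try again.
git fetch -q upstream
git rebase -q upstream/master
git push -q upstream master

# The history stays linear: no merge commits were created.
git log --oneline upstream/master
commit_count=$(git rev-list --count upstream/master)
merge_count=$(git rev-list --count --merges upstream/master)
```

No force flag is needed at any point, because after the rebase your commit simply follows the newest commit on master.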

Working on a branch

These are the steps one would follow to work on a personal branch, backed by your personal fork, on something that can’t be described as a “quick fix”:

  • Ensure your local master is up to date following the instructions above.
  • Create the branch with a name representative of the work you will be doing, and switch to it
git checkout -b <branch-name>
  • Do the work.
  • Commit early, commit often (locally).
    • You can later use TortoiseGit’s “combine into one commit”, or commandline git’s interactive rebase, to reduce the number of total commits you will push later, so don’t worry about spamming the log!
  • Push to your personal fork whenever you want to back things up online (optional, but recommended)
    • The first time you push to your personal fork, it’s recommended to assign your chosen remote, so that later pushes are quicker
git push --set-upstream origin <branch-name> 
    • Subsequent pushes will be just
git push 
    • If you don’t, you will always have to name both the remote and the branch when pushing
git push origin <branch-name>
  • Commit the final touches.
  • Push to your personal fork (recommended), or into a branch on the main repository (not recommended, but possible)
    • See the remark above about pushing with and without an assigned remote
  • Navigate to your personal fork on github, and in your branch, choose “Create pull request”
    • If the push was done very recently, it will show up on the main page of your fork
  • Take a final look, and maybe ask others to review the code and give their approval.
    • Getting things reviewed is highly recommended, especially for bigger changes.
  • Once you are positive about your changes, press the “Rebase and merge” button that github provides.
  • If the merge button isn’t enabled, it probably means github can’t safely rebase your changes. In that situation, you will have to rebase manually:
    • Fetch the latest code from the main repository (no need to switch branches)
git fetch upstream
    • Rebase your branch on top of upstream/master
git rebase upstream/master
    • Force-push (with lease) into your personal fork
      • The push needs to be forced because rebase is a history-rewriting operation, and git doesn’t allow this by default.
git push --force-with-lease origin <branch-name>
      • In TortoiseGit, the push dialog has a “Force: May discard [_] known changes” checkbox, that activates this flag. The other checkbox activates the raw “--force” flag, and is not recommended.
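
Here is the whole branch workflow, including the manual rebase and the force-push with lease, as a sandbox sketch. As before, local bare repositories stand in for GitHub (“main.git” for the main repository, “fork.git” for your personal fork), and the branch name “my-feature”, the file names and the identity are placeholders.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"

# Local stand-ins: main.git plays the main repository, fork.git your fork.
git -c init.defaultBranch=master init -q seed
cd seed
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
cd "$sandbox"
git clone -q --bare seed main.git
git clone -q --bare main.git fork.git

git clone -q "$sandbox/fork.git" work
cd work
git config user.name "Example User"
git config user.email "user@example.com"
git remote add upstream "$sandbox/main.git"

# Create the work branch, commit, and back it up on the fork.
git checkout -q -b my-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q --set-upstream origin my-feature

# Meanwhile the main repository moves on.
cd "$sandbox/seed"
echo "more" >> base.txt
git commit -q -a -m "Upstream change"
git push -q "$sandbox/main.git" master

# Rebase the work branch on top of the new upstream master...
cd "$sandbox/work"
git fetch -q upstream
git rebase -q upstream/master

# ...which rewrites its history, so a plain push would be rejected.
git push -q --force-with-lease origin my-feature

local_tip=$(git rev-parse my-feature)
fork_tip=$(git -C "$sandbox/fork.git" rev-parse my-feature)
base=$(git merge-base my-feature upstream/master)
upstream_tip=$(git rev-parse upstream/master)
echo "fork now has my-feature at $fork_tip"
```

After this, the branch on the fork sits directly on top of upstream/master, which is the state github needs in order to rebase the pull request cleanly.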

Applying an old patch from SVN

We have plenty of patches in JIRA, and on our hard drives, that were created from SVN, or at least for SVN.

Although git is perfectly capable of generating and applying patches, its primary patch format contains metadata that SVN patches lack, most notably the authorship information. To work around the differences in the patch format, alternative methods need to be used to get the patch applied.

Here are the steps needed to apply an old patch:

  • Make sure your environment is updated, and an appropriate branch is in place, as described in the previous sections.
  • Use the general-purpose “patch” program (instead of “git apply”) or a GUI patching utility to apply the changes.
    • With TortoiseGit, this can be done by copying the patch file to the appropriate folder, and using “TortoiseGit -> Review/Apply single patch…” from the context menu of the patch file.
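    • With commandline “patch”, SVN-style patches usually apply from the root of the working copy with “-p0”, since their paths carry no “a/” and “b/” prefixes:
patch -p0 < <patch-file>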
  • Review the changes and write an appropriate message.
  • Perform the commit with a custom author
    • With TortoiseGit: In the commit dialog, there is a “Set Author” checkbox, select this, and enter the name+email of the patch author.
    • With commandline:
git commit --author "name <email@address>" 
  • If you committed before and forgot to set the author, this commandline will fix the last commit
git commit --amend --author "name <email@address>"
  • Rebase and push as needed.
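
The authorship part of these steps can be checked in a sandbox. The sketch below skips the actual patching (a plain file edit stands in for it) and only shows how the author and committer end up recorded. All names, addresses and the commit message are placeholders.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git -c init.defaultBranch=master init -q demo
cd demo
git config user.name "Example Committer"
git config user.email "committer@example.com"

# Pretend these changes came from applying an old SVN patch.
echo "patched" > patched.txt
git add patched.txt

# Oops: committed without crediting the patch author.
git commit -q -m "[EXAMPLE] Apply old patch"
author_before=$(git log -1 --format='%an <%ae>')

# Fix the newest commit; --no-edit keeps the message as it is.
git commit -q --amend --no-edit --author "Patch Author <patch.author@example.com>"
author_after=$(git log -1 --format='%an <%ae>')
committer_after=$(git log -1 --format='%cn <%ce>')

echo "author before: $author_before"
echo "author after:  $author_after"
echo "committer:     $committer_after"
```

Amending changes the commit hash, so do this before pushing, or be prepared to force-push (with lease) to your own work branch.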

Slightly more advanced topics

Removing the need to enter your user+password on push (improves security)

Github (similarly to most other git hosts) offers a way to authenticate your machines using SSH keys, replacing the need for passwords. These SSH keys are usually generated one per machine, so that if one machine is stolen and/or compromised, you can revoke its public key and prevent further damage.

In order to use an SSH key, these are the steps:

  • Get yourself an SSH key. This depends on your environment and platform. My choice is to configure TortoiseGit to use Putty, generate my keys with PuttyGen, and load them with Putty’s Pageant (authentication agent). Another option is to use OpenSSH’s tools instead.
    • You can find tutorials of your chosen tool, or ask someone on IRC for help.
  • Add the public key to your github account, in the account settings, under “SSH Keys”. Don’t ever share the private key!
  • Change your local clone settings so that your remotes point to the URL with SSH protocol. You can see this URL by navigating to the repository, clicking the download button and choosing “Use SSH”.
  • From this point on, pushing will not ask for your github password, but it WILL ask for the SSH passphrase if your private key has one.
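
Switching an existing remote over to SSH is a one-line change. The sketch below does it in a throwaway repository so that nothing real is touched. “yourusername” is a placeholder for your GitHub account name, and generating and registering the key itself is left to the tools mentioned above.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git -c init.defaultBranch=master init -q reactos
cd reactos

# "yourusername" is a placeholder for your GitHub account name.
git remote add origin https://github.com/yourusername/reactos.git

# Point the existing remote at the SSH form of the same repository.
git remote set-url origin git@github.com:yourusername/reactos.git

# Check the result; no network access happens until the next fetch or push.
git remote -v
origin_url=$(git config --get remote.origin.url)
```

Repeat the “set-url” step for every remote you push to. Remotes that you only fetch from can stay on HTTPS if you prefer.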