Git (software)
Git is a revision control file system project begun by Linus Torvalds to manage the Linux kernel and now maintained by Junio Hamano. It is free software, released under the GNU General Public License version 2. Originally designed only as a low-level engine that others could use to write front ends such as Cogito or StGIT, the core Git project has since become a complete revision control system that is usable directly. It is targeted to run on Linux, but is perfectly usable on other Unix-like operating systems (such as BSD, Solaris and Darwin). Git has been made to work under MS Windows using Cygwin, but it is noticeably slower there, because it relies heavily on file system features that are particularly fast on Linux.
Unique characteristics
Git's design is a synthesis of Torvalds' intimate knowledge of maintaining a large distributed development project, and of file system performance. Combined with his urgent need to produce a working system in short order, these factors led to the following characteristics:
- Strong support for non-linear development. Git supports rapid and convenient branching and merging, and includes powerful tools for visualizing and navigating a non-linear development history. A core assumption in Git is that a change will be merged more often than it is written, as it is passed around various reviewers. Torvalds himself does the most merging and the least direct editing, so he has made sure that merging works well.
- Distributed development. Like BitKeeper, SVK and Monotone, Git gives each developer a local copy of the entire development history, and changes are copied from one such repository to another. These changes are imported as additional development branches, and can be merged in the same way as a locally developed branch. Repositories can be easily published via HTTP, ssh, rsync, or a special git protocol.
- Efficient handling of large projects. Git is very fast, and scales well even when working with large projects or large histories. It is commonly an order of magnitude faster than other revision control systems, and several orders of magnitude faster on some operations.
- Cryptographic authentication of history. The Git history is stored in such a way that the name of a particular revision (a "commit" in Git terms) depends upon the complete development history leading up to that commit. Once it is published, it is not possible to change the old versions without it being noticed. (Monotone also has this property.)
- Toolkit design. Following the Unix tradition, Git is a series of primitive programs written in C, and a large number of shell scripts that provide convenient wrappers. It is easy to chain the components together to do other clever things.
- Pluggable merge strategies. As part of its toolkit design, Git has a well-defined model of an incomplete merge, and it has multiple algorithms for completing it. If none of them succeeds, Git tells the user that it is unable to complete the merge automatically and that manual editing is required. It is thus easy to experiment with new merge algorithms.
- Garbage accumulates unless collected. Aborting operations or backing out changes will leave useless dangling objects in the database. These are generally a small fraction of the continuously growing history of wanted objects, but finding them with git-fsck-objects and reclaiming the space with git-prune can be slow.
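The history-dependent naming described under "Cryptographic authentication of history" can be illustrated with a toy hash chain. This is a minimal sketch, not Git's actual object format: the functions `commit_id` and `build_history` and the byte layout fed to the hash are assumptions made for illustration. It only shows why editing an old revision changes the name of every later one.

```python
import hashlib


def commit_id(snapshot, message, parents):
    """Toy commit name: a SHA-1 over the parent names, message, and snapshot.

    The layout hashed here is hypothetical and simpler than Git's own.
    """
    h = hashlib.sha1()
    for parent in parents:
        h.update(b"parent " + parent.encode() + b"\n")
    h.update(b"message " + message.encode() + b"\n")
    h.update(snapshot)
    return h.hexdigest()


def build_history(revisions):
    """Name each (snapshot, message) pair in a linear history, oldest first."""
    ids = []
    for snapshot, message in revisions:
        parents = ids[-1:]  # empty list for the first commit
        ids.append(commit_id(snapshot, message, parents))
    return ids


original = build_history([(b"v1", "first"), (b"v2", "second"), (b"v3", "third")])
tampered = build_history([(b"v1 (edited)", "first"), (b"v2", "second"), (b"v3", "third")])

# Only the oldest snapshot was edited, yet every later name differs too.
for old, new in zip(original, tampered):
    print(old[:12], new[:12], old == new)
```

Because each name covers the names of its parents, anyone who remembers the name of the latest commit can detect a change anywhere in the history behind it.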
Git rejects the concept of a file having an identity that persists across revisions, and does not explicitly record file revision relationships at any level below the source code tree. This has the consequence that:
- Renames are handled implicitly rather than explicitly. A major complaint about CVS is that it uses the name of a file to identify its revision history, so moving or renaming a file is not possible without either interrupting its history, or renaming the history and thereby making the history inaccurate. Most post-CVS revision control systems solve this by giving a file a unique long-lived name (a sort of inode number) that survives renaming. Git does not record such an identifier, and this is claimed as an advantage. Source code files are sometimes split or merged as well as simply renamed, and recording this as a simple rename would freeze an inaccurate description of what happened in the (immutable) history. Git addresses the issue by detecting renames while browsing the history of snapshots rather than recording them when making the snapshot. (Briefly, given a file in revision N, a file of the same name in revision N-1 is its default ancestor. However, when there is no like-named file in revision N-1, Git searches for a file that existed only in revision N-1 and is very similar to the new file.) The cost of this approach is more CPU-intensive work every time history is reviewed, and a number of options are needed to adjust the heuristics.
- Periodic explicit object packing. Git stores each newly created object as a separate file. Although individually compressed, this takes a great deal of space and is inefficient. This is solved by the use of "packs" that store a large number of objects in a single file (or network byte stream), delta-compressed among themselves. Packs are compressed using the heuristic that files with the same name are probably similar, but do not depend on it for correctness. Newly created objects (newly added history) are still stored singly, and periodic repacking is required to maintain space efficiency.
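The rename heuristic described in the list above (same name first, otherwise the most similar file that existed only in the older revision) can be sketched as follows. This is an illustration under stated assumptions, not Git's implementation: the function `find_ancestor`, the representation of a revision as a dict mapping path to text, the use of `difflib` as the similarity measure, and the 0.5 threshold are all choices made for this example.

```python
import difflib


def similarity(a, b):
    """Return a ratio in [0, 1] describing how alike two texts are."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def find_ancestor(name, new_tree, old_tree, threshold=0.5):
    """Guess which path in old_tree is the ancestor of `name` in new_tree.

    Both trees are dicts mapping path -> file content (an assumed toy model).
    Returns a path from old_tree, or None if the file looks newly created.
    """
    # Default ancestor: a file of the same name in the older revision.
    if name in old_tree:
        return name
    # Otherwise consider only files that existed solely in the older revision.
    candidates = [path for path in old_tree if path not in new_tree]
    best, best_score = None, threshold
    for path in candidates:
        score = similarity(old_tree[path], new_tree[name])
        if score >= best_score:
            best, best_score = path, score
    return best


old = {"README": "A small project.\n",
       "util.c": "int add(int a, int b) { return a + b; }\n"}
new = {"README": "A small project.\n",
       "math.c": "int add(int a, int b) { return a + b; }\n/* moved */\n"}

print(find_ancestor("README", new, old))
print(find_ancestor("math.c", new, old))
```

Nothing about the rename is stored in either snapshot; the relationship is recomputed from file contents each time it is asked for, which is where the extra CPU cost comes from.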
Early history
Git development began after many kernel developers were forced to give up access to the proprietary BitKeeper system (see "Zero-cost BitKeeper for Linux and other open source projects"). The ability to use BitKeeper as freeware had been withdrawn by the copyright holder Larry McVoy after he claimed Andrew Tridgell had reverse engineered the BitKeeper protocols in violation of the BitKeeper license. The development of Git began on April 6, 2005, and proceeded very rapidly. The first merge of multiple branches was done on April 18, 2005, and two months later (June 16, 2005), the kernel 2.6.12 release was managed by Git.
Torvalds wanted a distributed system that he could use like BitKeeper, but none of the available free systems met his needs, particularly his performance needs. From an e-mail he wrote on April 7, 2005 while writing the first prototype:
Torvalds achieved his performance goals; on April 29, 2005, the nascent Git was benchmarked recording patches to the Linux kernel tree at the rate of 6.7 per second.
He developed the system until it was usable by technical users, then turned over maintenance on July 26, 2005 to Junio Hamano, a major contributor to the project. Hamano was responsible for the 1.0 release on December 21, 2005. As of July 2006, the current release is 1.4.1.
Implementation
Like BitKeeper, Git does not use a centralized server. However, Git's primitives are not inherently an SCM system. Torvalds explains, (Note that his opinion has changed since then.)
Git has two data structures: a mutable index that caches information about the working directory and the next revision to be committed, and an immutable, append-only object database containing four types of objects:
- A blob object is the content of a file. Blob objects have no names, timestamps, or other metadata.
- A tree object is the equivalent of a directory: it contains a list of filenames, each with some type bits and the name of a blob or tree object that is that file, symbolic link, or directory's contents. This object describes a snapshot of the source tree.
- A commit object links tree objects together into a history. It contains the name of a tree object (of the top-level source directory), a timestamp, a log message, and the names of zero or more parent commit objects.
- A tag object is a container that holds a reference to another object and can hold additional metadata related to that object. Most commonly it is used to store a digital signature of a commit object corresponding to a particular release of the data being tracked by Git.
Each object is identified by a SHA-1 hash of its contents. Git computes the hash, and uses this value for the object's name. The object is put into a directory matching the first two characters of its hash. The rest of the hash is used as the file name for that object.
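The naming scheme above can be sketched in a few lines. This example assumes the convention used by current Git releases, in which a short header of the form "type size" followed by a NUL byte is hashed together with the content; the helper names `object_name` and `object_path` and the `.git/objects` prefix are assumptions for illustration and do not come from the text above.

```python
import hashlib


def object_name(obj_type, content):
    """Return the 40-character hex name for an object of the given type.

    Assumes the "<type> <size>\\0" header convention of current Git releases.
    """
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def object_path(name):
    """Split a name into a two-character directory and a 38-character file."""
    return f".git/objects/{name[:2]}/{name[2:]}"


name = object_name("blob", b"hello world\n")
print(name)
print(object_path(name))
```

Because the name is derived only from the type and content, two files with identical contents share one blob, no matter where they appear in the tree or what they are called.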
Git stores each revision of a file as a unique blob object. The relationships between the blobs can be found through examining the tree and commit objects. Newly added objects are stored in their entirety using zlib compression. This can consume a large amount of hard disk space quickly, so objects can be combined into packs, which use delta compression to save space, storing blobs as their changes relative to other blobs.
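Storage of newly added objects, each kept whole and compressed with zlib, can be sketched as follows. This is a simplified model rather than Git's own code: the functions `store_object` and `load_object` are hypothetical, the header layout is assumed to follow current Git releases, and packs and delta compression are not modelled at all.

```python
import hashlib
import os
import tempfile
import zlib


def store_object(root, obj_type, content):
    """Write one object under root, zlib-compressed, and return its name."""
    data = f"{obj_type} {len(content)}".encode() + b"\0" + content
    name = hashlib.sha1(data).hexdigest()
    path = os.path.join(root, name[:2], name[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))
    return name


def load_object(root, name):
    """Read an object back and return its (type, content) pair."""
    path = os.path.join(root, name[:2], name[2:])
    with open(path, "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


with tempfile.TemporaryDirectory() as root:
    v1 = store_object(root, "blob", b"line one\n" * 100)
    v2 = store_object(root, "blob", b"line one\n" * 100 + b"line two\n")
    # Two nearly identical revisions are still stored as two complete files.
    print(v1)
    print(v2)
    print(load_object(root, v2)[0])
```

Each revision costs its full compressed size in this scheme, which is the overhead that packing removes by storing similar blobs as deltas against one another.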
Using Git
Git is quite easy to use. A selection of basic commands is given below (for a complete list, see the [GIT manpages]):
- git init-db -- creates a new repository
- git add . -- adds all files in the current directory to the list of files under Git revision control
- git status -- shows which files in the working copy need to be committed
- git commit -a -- commits all changes in the working copy into the repository
- git log -- shows a listing of changes committed to the repository, starting from the most recent
- gitk -- provides graphical browsing of the repository history
Related projects
Projects built on top of Git
- Cogito ([homepage]) - Petr Baudiš maintains a set of scripts called Cogito (formerly git-pasky), a revision control system that uses Git as its backend.
- StGIT ([homepage]) - Stacked GIT is a Python application providing similar functionality to Quilt ([homepage]) (i.e. pushing/popping patches to/from a stack) on top of Git, to manage patches until they get merged upstream.
- [pg (Patchy GIT)] is a shell script wrapper around Git to help the user manage a set of patches to files. pg is somewhat like Quilt or StGIT, but it does have a slightly different feature set.
- [(h)gct] is a GUI enabled commit tool for Git and Mercurial (hg).
- [DarcsGit] is an enhancement to Darcs enabling it to interact with Git repositories.
- [gitweb] – a web interface implemented in Perl, maintained by Kay Sievers. Used at kernel.org.
- [wit] – a web interface implemented in Python, maintained by Christian Meder.
- [gitk] is a simple Tcl/Tk GUI for browsing history of Git repositories easily, distributed with Git.
- [QGit] (SourceForge [project page]) is a Qt GUI for browsing history of Git repositories, similar to gitk.
Projects that use Git
Git is used by the Linux kernel and many of its related projects, among other notable users.
External links
- [GIT homepage], maintained by Petr Baudiš
- [GIT manpages]
- [A tutorial introduction to GIT]
- [Manage source code using Git]
- Git mailing list archives, primary source documentation for git's development history:
  - [Git mailing list archive at MARC], Mailing list ARChives, with search.
  - [Alternate Australian archive].
  - [Gmane archive], with web interface using frames, blog-like interface, [NNTP interface] (Usenet interface) and RSS feeds.
  - [spinics.net archive] (least used)
  - [mail-archive] of git@vger.kernel.org
- [#git] IRC channel on [FreeNode].
  - [Official #git channel log]
  - [Historical #git channel log]
- [git] - the project page at kernel.org
  - [GIT source code] via gitweb
- [GitWiki] (MoinMoin powered)
- [Kernel Hackers' Guide to git]
- [Git Traffic] – Newsletter that summarises events on the git mailing list (only one issue, for 2 May 2005, seems abandoned)
- [The guts of git], article by LWN.net
- From KernelTrap (all pre 1.0 release):
  - [Managing the Kernel Source With 'git']
  - [Continued git Development]
  - [Beginner's Guide To Git]
  - [Importing The Kernel Into git, Merging]
  - [Git Web Interfaces]
  - [Official Git Web Interface]
  - [git for beginners]
  - [Junio Hamano New Git Maintainer]
  - [Using Git For More Than The Kernel]
- [PC World] – "Torvalds seemed aware that his decision to drop BitKeeper would also be controversial. When asked why he called the new software, "git," British slang meaning "a rotten person," he said. 'I'm an egotistical bastard, so I name all my projects after myself. First Linux, now git.'"
- [Website for Git] seemingly unmaintained since mid-2005. Doesn't offer much.
- [Examples of different merge algorithms] at [Revctrl] wiki
- [Git] and [WhatIsGit] at [LinuxMIPS] wiki
- [Projects that use Git] from GitWiki
From Wikipedia, the Free Encyclopedia.
All text is available under the terms of the GNU Free Documentation License. See Wikipedia Copyrights for details.
