An Early History of git


2020-08-13T13:05:17-07:00

In my experience, nothing inflames the passions of software developers like a discussion about source code control tools. You can’t make this stuff up. At nearly every company where I’ve worked, a decision to choose one system over another, create a new one, or replace an existing one has led to broken friendships, recriminations and general disaffection.

Off the top of my head I have used the original C version of SCCS, Network Software Environment (NSE), TeamWare, RCS, CVS, and eventually Perforce. I don’t know how I would exist without a source code control system. Of course, this is not a complete list of version control systems (another name for the technology) in use.

Most of NetApp was using Perforce, a switch from CVS that I helped decide way back in 1998 for ONTAP development, when NetApp engineering was doubling every year. But when it came time to ramp up a new storage product line at NetApp and I built a team from scratch, we made a number of product architecture and development environment decisions that were strikingly different from normal development at NetApp. We implemented our product’s base, Mars OS, in Linux user space using the about-to-be-released VFIO interface. We used a well-defined, but useful, subset of C++ as the implementation language. And we chose to use git as the source control system*.

When I told git’s creator, Linus Torvalds, that our project had decided to use his software, he replied “You and the rest of the world.” While I expected controversy among the FlashRay team developers over the use of C++ as the primary implementation language, I was taken aback by the controversy over git. Some internal transfers argued for continued use of Perforce, but I was looking forward to exploiting some ecosystem possibilities around git in overall product development.

It is easily argued that Linus Torvalds is one of the most influential software developers in the world for his creation of Linux. Linux is probably running in your car, your washing machine, definitely your television, and various smart devices in your home. It’s ubiquitous. But necessity forced Linus to create a source code management system back in 2005.

 Git branching diagram: source https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows

Taking a break from work, I went searching for the earliest history of git, and I found some info on the web that did not quite mesh with what Linus told me at the time of its creation. So I decided to ask him a question over e-mail:

beepy: A completely orthogonal question. When you locked yourself away to produce what became git, how long did you spend in seclusion until you had the first working version that you shared with the world?

Linus: It was something like two weeks. The first git commit is April 7, 2005, and that was actually very early on: just over a thousand lines of code total. So that was just a day or two in.

So, in a day or two, a new source code management system was born.

Linus continues: I made the 2.6.12-rc2 kernel (last kernel with BK) on April 4th.

On April 6th, I posted this:

 https://lore.kernel.org/lkml/Pine.LNX.4.58.0504060800280.2215@ppc970.osdl.org/

and that “kernel SCM saga” thread actually ends up having a fair amount of discussion about what I’m doing. For example:

 https://lore.kernel.org/lkml/Pine.LNX.4.58.0504081613180.28951@ppc970.osdl.org/

already talks about the SHA1’s and how the tree objects work, and why doing “diff” is efficient.

Before the use of BitKeeper (BK), Linus merged individual kernel patches sent by e-mail from subsystem maintainers. BitKeeper therefore provided a huge leap forward in managing the Linux kernel as the number of contributors grew.

The performance of the source code management system was a critical factor for Linus in defining git.

Linus continues: And in

https://lore.kernel.org/lkml/Pine.LNX.4.58.0504080758420.28951@ppc970.osdl.org/

people are already sending patches to some of the early git infrastructure (and one of the replies is from a person who ended up being a git developer).

And on April 16, I imported that last kernel version into git, and started applying patches.

And my first kernel merge commit is from April 17. So by then we already had a couple of kernel developers who were cautiously testing the waters with git.

So two weeks for it to be “functional”, although honestly, at that point a lot of it was very very raw and a lot of manual processes.

It’s been 15 years, and git went from “odd very inconvenient shell scripts and trying to explain what I mean by type-tagged objects” to “everybody uses it”.

-Linus
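The point in that thread about tree objects and why “diff” is efficient can be illustrated with a toy model. The Python sketch below is not git's on-disk format or its code: the names `object_id` and `changed_paths`, the nested-dictionary tree, and the simplified hashing are all assumptions made for illustration. It only shows the idea that a directory's hash covers the hashes of everything beneath it, so two trees with the same hash need no further comparison.

```python
import hashlib
from typing import Dict, List, Union

# Toy model (an assumption, not git's format): a tree maps names to
# file contents (bytes) or to nested trees (dicts).
Tree = Dict[str, Union[bytes, "Tree"]]


def object_id(node: Union[bytes, Tree]) -> str:
    """Hash a file or a tree; a tree's ID covers the IDs of its entries."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob\0" + node).hexdigest()
    entries = "".join(
        f"{name}:{object_id(child)}\n" for name, child in sorted(node.items())
    )
    return hashlib.sha1(b"tree\0" + entries.encode()).hexdigest()


def changed_paths(old: Tree, new: Tree, prefix: str = "") -> List[str]:
    """List paths that differ, skipping any entry whose ID is unchanged.

    This toy recomputes IDs on every call for brevity; a real system
    would store each ID once, making the comparison a cheap lookup.
    """
    changed: List[str] = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        path = prefix + name
        if a is not None and b is not None and object_id(a) == object_id(b):
            continue  # identical ID: nothing underneath needs inspecting
        if isinstance(a, dict) and isinstance(b, dict):
            changed += changed_paths(a, b, path + "/")
        else:
            changed.append(path)
    return changed


old = {
    "Makefile": b"all:\n",
    "kernel": {"fork.c": b"v1", "sched.c": b"v1"},
    "mm": {"slab.c": b"v1"},
}
new = {
    "Makefile": b"all:\n",
    "kernel": {"fork.c": b"v1", "sched.c": b"v2"},
    "mm": {"slab.c": b"v1"},
}

print(changed_paths(old, new))  # ['kernel/sched.c']
```

In this example the `mm` directory and the `Makefile` are skipped after a single hash comparison each, and only the `kernel` directory is descended into.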

Of course we use git at DriveScale (“You and the rest of the world”). And I use it in my work with the proposed SPECstorage® Solution 2020 benchmark. It is the most widely used source code control system today.

Git has several things going for it. I’ll put first that it is free and open source. That means you can look under the covers and check out the source of the tool itself. 

beepy@Tireless git % git clone https://github.com/git/git.git

And it only takes a few pokes to find an early commit to git, since it was self-hosting almost immediately:

beepy@Tireless git % git log cache.h | tail

    Make “cat-file” output the file contents to stdout.

    New syntax: “cat-file -t ” shows the tag, while “cat-file “

    outputs the file contents after checking that the supplied tag matches.

commit e83c5163316f89bfbde7d9ab23ca2e25604af290

Author: Linus Torvalds <torvalds@linux-foundation.org>

Date:   Thu Apr 7 15:13:13 2005 -0700

    Initial revision of “git”, the information manager from hell

beepy@Tireless git % 

Second, git supports distributed development. When you clone a repository, you get a complete copy of the original source tree, including commit messages and the full version history. A simple outcome of this is that git supports completely disconnected operation, a capability that is critical for a widely distributed development project like Linux.

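Third, git is secure. Every object, including each commit, is identified by a cryptographic hash of its contents and metadata, so any modification to the code or its history changes the hash and is detectable. Commits and tags can additionally be cryptographically signed.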

Deploying git is easy for developers via hosting services like GitHub (free for open source projects) and GitLab. Other toolsets have integrated git into their products to provide a more complete DevOps lifecycle.

Linus Torvalds will be the first to say that git has gone far beyond what he originally crafted in April 2005. It has evolved with the contributions of many talented developers from around the world. I want to argue that, in the end, Linus Torvalds’s creation of git may have a larger impact on software development and systems than Linux itself.

But that’s just one person’s opinion.

——————

Footnotes:

NB: I corrected two minor typos in our e-mail exchange and inserted some commentary into Linus Torvalds’s reply.

† and ‡ – Tom Lyon, co-founder of DriveScale, was the key developer of both these technologies. Our paths have continually crossed throughout my career.

*  My marching orders at the start of the FlashRay project were the simplest I’d ever had in my entire career: “Get out of here and go build me a competitive All Flash Array. Don’t come back until you have one. And don’t do anything we’ve done before unless it helps you achieve that goal.” A blank sheet of paper, with the right team, can be very inspiring.

About the Author:

Brian Pawlowski has years of experience in building technologies and leading teams in high-growth environments at global technology companies. He is currently the CTO at DriveScale. Previously, Brian served as Vice President and Chief Architect at Pure Storage, focused on improving the user experience for the all-flash storage platform provider’s rapidly growing customer base. As CTO at storage pioneer NetApp, Brian led the first SAN product effort and founded NetApp labs to collaborate with universities and research centers on new directions in data center architectures.
