Programming language statistics at Sourceforge · Thursday April 7, 2005

To start off this blog, I wanted to follow up on Jarno Virtanen’s suggestion to look at Python’s growth at sourceforge. In fact, I did exactly this in December 2001. To quote from that thread:

...it was taken for granted that Python is growing rapidly in popularity, raising the question of why Python is not used more in commercial enterprise. Some suggested that this might be the result of some conservatism in business that is the result of interference from clueless managers. My suggestion would be that Python is not growing nearly as rapidly as some might think and/or that commercial enterprises may not necessarily be more conservative than open source developers.

I derived that conclusion by comparing the number of projects for different languages at sourceforge. I assumed that, in their free time, developers will choose what, to them, is the most appropriate tool for their project. Most appropriate may of course mean ‘most fun’ or ‘best’ or ‘known’ or whatever. However, I would think that if developer preference is reflected in anything, then it would be in the choice they make on how to spend their free time. In short: sourceforge project statistics measure real developer preference.

Sourceforge statistics can be obtained by screen-scraping the sourceforge software trove pages. For example, if we look at the Programming Languages category, we find:

C 14613 projects
C# 2304 projects
C++ 15131 projects
Java 14520 projects
Perl 5728 projects
PHP 10731 projects
Python 3894 projects

These are the languages I will investigate in detail. Some other notable languages are VB (2061 projects), Delphi (1751), Javascript (2253) and Assembly (1520!).
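As a rough illustration of the parsing half of the screen-scraping, here is a minimal Python sketch that turns lines of the form shown in the listing into a dictionary. It assumes the text has already been extracted from the page; fetching the trove pages is not shown, and the name `parse_counts` is made up for this example rather than taken from any sourceforge tool.

```python
import re

# Lines look like "C++ 15131 projects": a language name, a count, the word "projects".
LINE_RE = re.compile(r"^(?P<lang>\S+)\s+(?P<count>\d+)\s+projects$")


def parse_counts(text):
    """Return {language: project count} for every line matching the listing format."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # silently skip anything that is not a listing line
            counts[match.group("lang")] = int(match.group("count"))
    return counts


# The listing from this post, copied verbatim.
listing = """\
C 14613 projects
C# 2304 projects
C++ 15131 projects
Java 14520 projects
Perl 5728 projects
PHP 10731 projects
Python 3894 projects
"""

counts = parse_counts(listing)

# Print the languages from largest to smallest.
for lang, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{lang:8s}{n:6d}")
```

Matching on `\S+` for the language name keeps `C#` and `C++` intact, which a stricter `\w+` pattern would not.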

The software trove gives an overview of each project (example). I will use the following statistics:

To start off, let’s look at the total number of projects in each language as a function of date:

[Figure: total number of projects at sourceforge]

Except for C#, Python is the smallest language considered here. There are approximately equal numbers of C, C++ and Java projects. PHP has the largest number of projects of the three Ps.

All languages grow rapidly. This is mostly due to the growth of sourceforge itself, of course. This is even more obvious if we look at the growth rate of each language:

Growth rates are continuously declining, presumably asymptotically to the rate of growth of the language as an open-source development tool.
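How the growth rate is computed is not spelled out here. One plausible definition, assumed for this sketch, is the relative increase in project count per day between two consecutive snapshots. The snapshot numbers below are invented purely to show the arithmetic; they are not sourceforge data, and `growth_rates` is a hypothetical helper name.

```python
def growth_rates(snapshots):
    """Relative growth per day between consecutive (day, count) snapshots.

    Assumed definition: (n1 - n0) / n0 / (t1 - t0).
    """
    rates = []
    for (t0, n0), (t1, n1) in zip(snapshots, snapshots[1:]):
        rates.append((t1, (n1 - n0) / n0 / (t1 - t0)))
    return rates


# Invented (day number, project count) pairs, NOT real measurements.
example = [(0, 1000), (100, 1500), (200, 1800)]

rates = growth_rates(example)
for day, rate in rates:
    print(f"day {day}: {rate:.4%} per day")
```

With a definition like this, a language can keep adding projects in absolute terms while its growth rate still falls, simply because the base it is measured against keeps getting larger.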

To account for the growth of sourceforge, let’s compare the number of projects for each language to the number of C projects:

Nothing earth-shattering (yet). Java and PHP in particular seem to grow quite fast (relative to C).
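For a single snapshot, normalizing against C is just a division by the C count. A minimal sketch using the totals from the Programming Languages listing earlier in this post (the name `relative_to` is made up for this example, and the same division would have to be repeated for every date to get a curve):

```python
# Project totals copied from the Programming Languages listing in this post.
counts = {
    "C": 14613, "C#": 2304, "C++": 15131, "Java": 14520,
    "Perl": 5728, "PHP": 10731, "Python": 3894,
}


def relative_to(counts, baseline="C"):
    """Divide every count by the baseline language's count."""
    base = counts[baseline]
    return {lang: n / base for lang, n in counts.items()}


ratios = relative_to(counts)

# Print the ratios from largest to smallest.
for lang, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{lang:8s}{ratio:.3f}")
```

Dividing by C removes the part of the growth that comes from sourceforge itself getting bigger, as long as C's share of new projects is roughly stable.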

Let’s compare the growth rates of the different languages to the growth rate of C.

We can draw some interesting conclusions from this figure:

More later.
