Earlier this year I was at PyCon in the US. I had an interesting
experience there: people were talking about the problem of packaging
and distributing Python libraries. People had the impression that this
was an urgent problem that hadn't been solved yet. I detected a vibe
asking for the Python core developers to please come and solve our
packaging problems for us.

I felt like I had stepped into a parallel universe. I've been using
powerful tools to assemble applications from Python packages
automatically for years now. Last summer at EuroPython, when this
discussion came up again, I maintained that packaging and distributing
Python libraries is a solved problem. I put the point strongly, to
make people think. I fully agree that the current solutions are
imperfect and that they can be improved in many ways. But I also
maintain that the current solutions are indeed solutions.

There is now a lot of packaging infrastructure in the Python
community, a lot of technology, and a lot of experience. I think that
for a lot of Python developers the historical background behind all
this is missing. I will try to provide one here. It's important to
realize that progress has been made, step by step, for more than a
decade now, and we have a fine infrastructure today.

I've named some important contributors to the Python packaging story,
but undoubtedly I have also failed to mention a lot of other important
names. My apologies in advance to those I missed.

The Python world has been talking about solutions for packaging and
distributing Python libraries for a very long time. I remember when I
was new in the Python world about a decade ago in the late 90s, it was
considered important and urgent that the Python community implement
something like Perl's CPAN. I'm sure too that this debate had started
long before I started paying attention.

I've never used CPAN, but over the years I've seen it held up by many
as something that seriously contributes to the power of the Perl
language. With CPAN, I understand, you can search and browse Perl
packages and you can install them from the net.

So, lots of people were talking about a Python equivalent to CPAN with
some urgency. At the same time, the Python world didn't seem to move
very quickly on this front...

The Distutils SIG (special interest group) was started in late 1998.
In the context of this discussion group, Greg Ward started to create
Distutils around this time. Distutils allows you to structure your
Python project so that it has a setup.py. Through this
setup.py you can issue a variety of commands, such as creating a
tarball out of your project, or installing your project. Importantly,
Distutils also has infrastructure to help compile C extensions for
your Python package. Distutils was added to the Python standard
library in Python 1.6, released in 2000.

We now had a way to distribute and install Python packages, if we did
the distribution ourselves. We didn't have a centralized index (or
catalog) of packages yet, however. To work on this, the Catalog SIG
was started in the year 2000.

The first step was to standardize the metadata that could be cataloged
by any index of Python packages. Andrew Kuchling drove the effort on
this, culminating in PEP 241 in 2001, later updated by PEP 314.

Distutils was modified so it could work with this standardized
metadata.

In late 2002, Richard Jones started work on the Python Package Index,
PyPI. PyPI is also known as the Cheeseshop, a name I prefer but which
has apparently been deprecated. The first work on an implementation
started, and PEP 301, which describes PyPI, was also created
then. Distutils was extended so the metadata and packages themselves
could be uploaded to this package index. By 2003, the Python package
index was up and running.

The Python world now had a way to upload packages and metadata to a
central index. If we then manually downloaded a package we could
install it using setup.py thanks to Distutils.

Phillip Eby started work on Setuptools in 2004. Setuptools is a whole
range of extensions to Distutils, such as a binary installation
format (eggs), an automatic package installation tool, and the
definition and declaration of scripts for installation. Work continued
throughout 2005 and 2006, and feature after feature was added to
support a whole range of advanced usage scenarios.

By 2005, you could install packages automatically into your Python
interpreter using easy_install. Dependencies would be
automatically pulled in. If a package contained C code, it would pull
in a binary egg or, if none was available, compile one
automatically.

The sheer number of features that Setuptools brings to the table must
be stressed: namespace packages, optional dependencies, automatic
manifest building by inspecting version control systems, web scraping
to find packages in unusual places, recognition of complex version
numbering schemes, and so on, and so on. Some of these features
perhaps seem esoteric to many, but complex projects use many of them.

The problem remained that all these packages were installed into your
Python interpreter. This is icky. People's site-packages
directories became a mess of packages. You also need root access to
easy_install a package into your system Python. Sharing all packages
in a directory in general, even locally, is not always a good idea: one
version of a library needed by one application might break another
one. Solutions for this emerged in 2006.

Ian Bicking drove one line of solutions: virtual-python, which evolved
into workingenv, which evolved into virtualenv in 2007. The concept
behind this approach is to allow the developer to create as many fully
working Python environments as they like from a central system
installation of Python. When the developer activates the virtualenv,
easy_install will install all packages into the virtualenv's
site-packages. This allows you to create a virtualenv per project
and thus isolate projects from each other.

In 2006 as well, Jim Fulton created Buildout, building on Setuptools
and easy_install. Buildout can create an isolated project environment
like virtualenv does, but is more ambitious: the goal is to create a
system for repeatable installations of potentially very complex
projects. Instead of writing an INSTALL.txt that tells others how
to install the prerequisites for a package (Python or not), with
Buildout these prerequisites can be installed automatically.

The brilliance of Buildout is that it is easily extensible with new
installation recipes. These recipes themselves are also installed
automatically from PyPI. This has spawned a whole ecosystem of
Buildout recipes that can do a whole range of things, from generating
documentation to installing MySQL.

Since Buildout came out of the Zope world, Buildout for a long time
was seen as something only Zope developers would use, but the
technology is not Zope-specific at all, and more and more developers
are picking up on it.

In 2008, Ian Bicking created an alternative for easy_install called
pip, also building on Setuptools. Less ambitious than Buildout, it
aimed to fix some of the shortcomings of easy_install. I haven't used
it myself yet, so I will leave it to others to go into details.

The many improvements that Setuptools brought to the Python packaging
story hadn't made it into the Python Standard Library, where Distutils
was stagnating. Attempts had been made to bring Setuptools into the
standard library at some point during its development, but for one
reason or another these efforts had foundered.

Setuptools probably got where it is so quickly because it worked
around the often very slow process of adopting something into the standard
library, but that approach also helped confuse the situation for
Python developers.

Last year Tarek Ziade started looking into the topic of bringing
improvements into Distutils. There was a discussion just before PyCon
2009 about this topic between various Python developers as well, which
probably explains why the topic was in the air. I understood that some
decisions were made:

By 2008, Setuptools had become a vital part of the Python development
infrastructure. Unfortunately the Setuptools development process has
some flaws. It is very centered around Phillip Eby. While he had been
extremely active before, by that time he was spending a lot less
energy on it. Because of the importance of the technology to the wider
community, various developers had started contributing improvements
and fixes, but these were piling up.

This year, after some period of trying to open up the Setuptools
project itself, some of these developers, led by Tarek Ziade, decided to
fork Setuptools. The fork is named Distribute. The aim is to develop
the technology with a larger community of developers. One of the first
big improvements of the Distribute project is Python 3 support.

Quite understandably this fork led to some friction between Tarek,
Phillip and others. I trust that this friction will resolve itself and
that the developers involved will continue to work with each other, as
all have something valuable to contribute.

One point that always comes up in discussions about Python packaging
tools is operating system packaging. In particular Linux distributions
have developed extremely powerful ways to distribute and install
complex libraries and applications, manage versions and dependencies
and so on.

Naturally when the topic of Python packaging comes up, people think
about operating system packaging solutions like this. Let me start off
by saying that I fully agree that Python packaging solutions can learn a lot
from operating system packaging solutions.

Why don't we just use a solution like that directly, though? Why is a
Python specific packaging solution necessary at all?

There are a number of answers to this. One is that operating system packaging
solutions aren't universal: if we decided to use Debian's system, what
would we do on Windows?

The most important answer however is that there are two related but
also very different use cases for packaging:

The Python packaging systems described above primarily try to solve
the development use case: I'm a Python developer, and I'm developing
multiple projects at the same time, perhaps in multiple versions, that
have different dependencies. I need to reuse packages created by other
developers, so I need an easy way to depend on such packages. These
packages are sometimes in a rather early state of development, or
perhaps I'm even creating a new one. If I want to improve such a
package I depend on, I need an easy way to start hacking on it.

Operating system packaging solutions as I've seen them used are ill
suited for the development use case. They are aimed at creating a
single consistent installation that is easy to upgrade with an eye on
security. Backwards compatibility is important. Packages tend to be
relatively mature.

For all I know it might indeed be possible to use an operating system
packaging tool as a good development packaging tool. But I've heard
very little about such practices. Please enlighten me if you
have.

It's also important to note that the Python world isn't as good as it
should be at supporting operating system packaging solutions. The
freeing up of package metadata from the confines of the setup.py
file into a more independently reusable format as was decided at PyCon
should help here.

We are now in a time of consolidation and opening up. Many of the
solutions pioneered by Setuptools are going to be polished to go into
the Python Standard Library. At the same time, the community
surrounding these technologies is opening up. By making metadata used
by Distutils and Setuptools more easily available to other systems,
new tools can also more easily be created.

The Python packaging story had many contributors over the years. We
now have a powerful infrastructure. Do we have an equivalent to CPAN?
I don't know enough about CPAN to be sure. But what we have is
certainly useful and valuable. In my parallel universe, I use advanced
Python packaging tools every day, and I recommend that all Python
programmers look into this technology if they haven't already. Join
me in my parallel universe!

Update: I just found out there was a huge thread on python-dev about this in the last few days, which focused on the question of whether we have the equivalent of CPAN now. One of them funny coincidences... http://thread.gmane.org/gmane.comp.python.distutils.devel/11359
Mon Nov 09 2009 15:53 A history of Python packaging:
Introduction
The dawn of Python packaging
Distutils
Metadata
PyPI
Setuptools
The problems of shared packages
Virtualenv
Buildout
Setuptools and the standard library
Distribute
Operating system packaging
Conclusions
- Comments:
Posted by Simon Michael at Mon Nov 09 2009 18:11
Very useful! Thanks Martijn.
Posted by anonymous@216.25.210.38 at Tue Nov 10 2009 21:24
I'll echo Simon. Great summary of an important and interesting topic.
Posted by Zach at Wed Nov 11 2009 15:44
I've always considered the lack of a CPAN equivalent to be both one of python's greatest weaknesses as well as its greatest strengths. I think I have this perspective because by day I'm a mild-mannered sysadmin, but under the cover of darkness I'm a hobbyist python programmer.

You bring up that packaging actually has two different use cases, the developer and the packager. In the python world there is very little pain for the packager (who is usually a sysadmin.) The same can not be said for other languages (particularly perl and ruby.)

I can't tell you the number of times I've spent several days sorting out dependencies in perl and ruby software. They're written with the assumption that the end-user will simply use CPAN or gems to install the dependencies, but if the end user is a sysadmin trying to set up a production environment, using CPAN or gems is a non-starter. Your app is not special enough to get special treatment, it's going to get crammed into an OS package so that I know that all my deployments are identical.

Besides being very easy to package, python software rarely has a lot of external dependencies. Most python software I've run has at most 1 or 2 dependencies. Most perl software has 5 or more. If I'm lucky I can use cpan2rpm to build my perl package, but that only works 60% of the time.

I know that a CPAN equivalent is going to get written and included in python eventually, but I for one think our current situation is actually pretty sweet, and hope that it takes a long time to happen. Like the forced whitespace thing, this is another area where python introduces a little bit of pain to avoid the larger pain.
Posted by Martijn Faassen at Wed Nov 11 2009 19:03
Zach, you're implying that Python doesn't have a CPAN equivalent. So, what *is* a CPAN equivalent and what's missing from Python's infrastructure that I've described?

I'll note that many of my Python apps have dozens of dependencies and that I use automated tools to install those. I like code reuse. With buildout one can use zc.sourcerelease to help build an RPM.
Posted by Zach at Wed Nov 11 2009 20:56
Martijn,

Most of what it's missing is a blessed way to install packages automatically. Something equivalent to 'gem install mysql' or 'perl -MCPAN -e "install DBD::MySQL"'. There are several options for doing that with python today, but none of them are included with the python distribution. This creates confusion in the mind of casual users (for example, myself.)

Most of the python I write is for work, and gets distributed in RPMs or through our configuration management system. Most of what I write at home is simple stuff that I don't release. I don't have the time or inclination to try 3-5 different tools and figure out which one I like, especially when none of them (almost) will create and/or install rpms, debs or openbsd packages.

I was not aware of buildout, but if you can use it to build an RPM (what about other package formats?) that gives it a leg up on other languages. My main complaint about CPAN and gems is that it operates outside of whatever packaging system may already be in use, making repeatable deployments harder to produce.
Posted by Wes Winham at Thu Nov 12 2009 00:24
The two big pains I have with current options are with upgrading python packages and uninstalling python packages. Perhaps there's already a good upgrade solution, but I thought I looked pretty hard for a good uninstall solution and couldn't find one. Setuptools does some really cool stuff and I love using it, but a package management system that doesn't do uninstallation is missing a pretty big piece of what I'd classify as core functionality. I really appreciate the work being done on Distribute and pip to solve these problems.

Without being able to remove packages, I wouldn't call python packaging solved, but it's certainly moving in that direction at a good clip. Of course, everyone has a different definition of solved. If someone has never needed to remove a package to use a different version or they've never needed to install packages from different tags in different SCMs, packaging might have long since been solved. For the Pinax guys, for example, using just setuptools for distribution would have been quite a pita.
Posted by Kevin Dangoor at Thu Nov 12 2009 04:25
Wes: pip 0.6 was just released a day or two ago and adds the uninstall feature.
Posted by Martijn Faassen at Thu Nov 12 2009 04:53
easy_install has had a -U option to upgrade for a long time (but not uninstallation).

I use buildout a lot and the installation/uninstallation/upgrade issue doesn't really exist for it. Instead you simply describe what packages you want in setup.py, potentially locking down version numbers in buildout.cfg, and (re)run bin/buildout. It will install and uninstall and upgrade whatever packages are needed according to this description (and the dependency information in this package).

These packages are not shared with all Python programs, but are only available within that buildout. You can share packages between buildouts for caching reasons, but this needs to be done explicitly and a package installed into one buildout isn't installed into another one automatically.
