Crossing an apt proxy with a mirror

Earlier today I watched the presentation Jonathan Oxer gave at LCA about Package caching solutions.

Although it was certainly an interesting presentation, and although I very much agree that my current local mirror wastes a lot of disk space and bandwidth, I'm still not going to switch from debmirror to any of the available caching solutions: unless I'm really missing something, none of them scratches my itch.

My local mirror currently covers five architectures (i386, amd64, hppa, sparc and s390) and only carries unstable and testing. I use it for:

  1. (fast and convenient) updating of my systems
  2. doing Debian Installer builds and installation tests
  3. test builds of installation CDs (using debian-cd)

Now, the last one is somewhat hard (debian-cd uses hardlinks to packages on a local mirror instead of retrieving packages), so let's concentrate on the first two.

Caching is great if you have a large number of machines sitting behind the proxy that share an architecture and are all likely to need roughly the same packages: the first one triggers the download of a package and the rest get it almost instantaneously. It is a lot less great if you have only one, maybe two systems per architecture: most of the time you'll still end up going down that (relatively) slow ADSL connection.
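To put a rough number on that argument, here is a minimal Python sketch. It assumes the simplest possible case: every machine of one architecture needs the same package, the first request is a cache miss and every later request is a hit. The function name `cache_hit_ratio` is just for illustration.

```python
def cache_hit_ratio(machines: int) -> float:
    """Fraction of fetches served from the proxy cache, assuming all
    `machines` hosts of one architecture need the same package:
    the first request goes out over the slow line, the other
    machines - 1 requests are served from the cache."""
    if machines < 1:
        raise ValueError("need at least one machine")
    return (machines - 1) / machines


for n in (1, 2, 10):
    print(f"{n:2d} machine(s): {cache_hit_ratio(n):.0%} served from cache")
```

With one machine per architecture the cache never helps (0%), with two it helps half the time (50%), and only with something like ten machines does it reach 90%.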

An important reason why I have my mirror is so that when I do my daily updates for sid or run an installation test, the packages are already available locally. I really don't want to double or even treble the time needed for installation tests just because some required packages aren't yet available locally and need to be downloaded over that slow line.

So, I have my partial mirror. It is somewhat tuned (for example, I exclude some ridiculously large debug packages, saving about 10GB), but it still holds a lot of junk^Wpackages I'll never ever use, especially for hppa, sparc and s390, as those systems only have fairly basic installations. Getting rid of those would significantly reduce the size of my daily sync and would, for example, allow me to also mirror stable and oldstable and keep old versions of packages, while probably still saving a lot of disk space.
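debmirror lets you skip files whose names match a regular expression (its `--exclude` option). The Python sketch below only illustrates what such a filter does to pool paths. The pattern and the `foo` package names are assumptions made up for the example, not the actual exclusion list used here.

```python
import re

# Illustrative pattern (an assumption): skip detached debug packages,
# i.e. pool files named like foo-dbg_<version>_<arch>.deb.
DEBUG_PACKAGE = re.compile(r"-dbg_[^/]+\.deb$")


def keep_in_mirror(path: str) -> bool:
    """Return True unless the pool path looks like a -dbg package."""
    return DEBUG_PACKAGE.search(path) is None


# Hypothetical pool paths, for illustration only.
pool = [
    "pool/main/f/foo/foo_1.0-1_i386.deb",
    "pool/main/f/foo/foo-dbg_1.0-1_i386.deb",
    "pool/main/f/foo/libfoo1_1.0-1_sparc.deb",
]
for path in pool:
    print(("keep " if keep_in_mirror(path) else "skip ") + path)
```

A filter like this can only trim by name; it has no way of knowing which of the remaining packages are actually installed anywhere, which is exactly the junk that stays on the mirror.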

Wishlist

What we should have is a hybrid solution: a program that presents itself as a proxy to clients, but is smart enough to pre-fetch new packages that are likely to be needed during a sync run, based on usage data from the proxy and on configuration settings.
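The core decision of such a tool could look roughly like the following Python sketch. It assumes the proxy logs every (architecture, package) request it serves, and that at sync time the new archive index and the cache contents are available as simple dictionaries. All names (`Request`, `select_prefetch`, `min_requests`) are hypothetical; the sketch does no downloading and compares versions only for equality rather than doing proper Debian version comparison.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One package request the proxy served (hypothetical log format)."""
    arch: str
    package: str


def select_prefetch(requests, available, cached, min_requests=1):
    """Return sorted (arch, package, version) tuples worth prefetching.

    requests:     iterable of Request objects from the proxy's usage log
    available:    dict (arch, package) -> version in the new archive index
    cached:       dict (arch, package) -> version already in the cache
    min_requests: configuration knob; ignore packages requested less often
    """
    counts = Counter((r.arch, r.package) for r in requests)
    selected = []
    for key, count in counts.items():
        if count < min_requests:
            continue
        version = available.get(key)
        if version is None:  # package no longer in the archive
            continue
        if cached.get(key) != version:  # new version not cached yet
            selected.append((key[0], key[1], version))
    return sorted(selected)


# Illustrative data only.
log = [Request("i386", "bash"), Request("i386", "bash"),
       Request("hppa", "bash"), Request("sparc", "gcc")]
available = {("i386", "bash"): "1.0-2", ("hppa", "bash"): "1.0-2",
             ("sparc", "gcc"): "2.0-1"}
cached = {("i386", "bash"): "1.0-1", ("hppa", "bash"): "1.0-2"}
print(select_prefetch(log, available, cached))
```

Here the i386 `bash` upgrade and the never-cached sparc `gcc` get selected, while hppa `bash` is already current. Raising `min_requests` to 2 would leave only the i386 `bash` upgrade, which is the kind of tuning a configuration file could expose.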

Some ideas for features/configuration options it could have:

Possibly such an implementation could even be used on some of the lower-tier Debian mirrors.

Unfortunately, unlike some of my esteemed colleagues, I'm not able to just whip up something like this, so I'm condemned to wait and see if there's someone else who'd like to pick up this idea. I am of course more than willing to help develop this idea further and to test it.

Now, if I've totally failed in my research and something like this already exists, a pointer in the right direction would be much appreciated.