This feed contains pages in the "debmirror" category.
The Debian FTP-masters recently changed the way gzipped meta files are
compressed in order to make them more efficient to update using the rsync
option. This was done by adding the --rsyncable option when calling gzip.
The consequence, however, was that when debmirror compressed Packages, Sources
and Contents files after updating them by applying diffs, the md5sum of the
gzipped file created by debmirror no longer matched the md5sum listed in
the Release file (because debmirror did not use --rsyncable).
The result was that debmirror would also download the full gzipped Packages,
Sources and Contents files from the parent mirror, something the diffs are
meant to avoid. Not nice.
Anyway, this has been fixed in debmirror 2.4 which now by default also
uses --rsyncable when gzipping the updated meta files.
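The mismatch described above is easy to reproduce by hand. The following is a minimal sketch, assuming GNU gzip and md5sum are installed. Because not every gzip build supports --rsyncable, it uses -1 versus -9 as a stand-in for "two different sets of gzip options"; the file name and contents are made up for illustration.

```shell
#!/bin/sh
# Compress the same input with two different sets of gzip options.
# -n leaves out the file name and timestamp, so only the options differ.
workdir=$(mktemp -d)
seq 1 20000 > "$workdir/Packages"   # stand-in for a real Packages file

gzip -9 -n -c "$workdir/Packages" > "$workdir/a.gz"
gzip -1 -n -c "$workdir/Packages" > "$workdir/b.gz"

# Checksums of the compressed files (what a Release file would list).
sum_a=$(md5sum < "$workdir/a.gz" | cut -d' ' -f1)
sum_b=$(md5sum < "$workdir/b.gz" | cut -d' ' -f1)

# Checksums of the decompressed content.
raw_a=$(gzip -dc "$workdir/a.gz" | md5sum | cut -d' ' -f1)
raw_b=$(gzip -dc "$workdir/b.gz" | md5sum | cut -d' ' -f1)

echo "compressed:   $sum_a vs $sum_b"
echo "uncompressed: $raw_a vs $raw_b"
rm -rf "$workdir"
```

The compressed checksums differ while the uncompressed ones are identical, which is why a locally recompressed file only matches the Release file when the same gzip options are used.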
I've also uploaded a fixed version for Lenny (20070123lenny1), which should
soon be available from proposed-updates and will be included in the next
stable point release.
For archives that also provide diffs (most archives don't have them) but do
not have rsyncable gzipped files, the default options used when calling gzip
can be overruled using the new option --gzip-options (only in version 2.4).
Tip: if you are using the rsync method to download files, using
--diff=none may well be more efficient now that the archive has rsyncable
gzipped meta files.
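As a rough sketch, the two options mentioned above might be used like this. The host name, the target directory and the gzip options '-9 -n' are placeholders chosen for illustration, and other options a real invocation needs are left out. The script only prints the commands; it does not run debmirror.

```shell
#!/bin/sh
# Hypothetical values; replace with your own archive and target directory.
host=archive.example.org
target=/srv/mirror/example

# Archive that provides diffs but no rsyncable gzipped files:
# override the gzip options (the value '-9 -n' is an assumption).
cmd_plain_gzip="debmirror --host=$host --gzip-options='-9 -n' $target"

# rsync transfers from an archive with rsyncable files: skip the diffs.
cmd_rsync="debmirror --host=$host --method=rsync --diff=none $target"

echo "$cmd_plain_gzip"
echo "$cmd_rsync"
```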
Version 2.4 also has a few other improvements and fixes. If you're currently using version 2.3.x an update to the new version is probably a good idea.
Posted Sat Dec 19 22:39:04 2009
debmirror 2.3 should be hitting the mirrors about now. Main change is that
it will now use the available diffs to update Contents files, which should
give a nice bandwidth reduction for users who mirror those.
With that the option --pdiff (for "package diff") no longer really covered
its function, so I decided to change it to --diff.
There's also a fix for mirroring archives that don't have a Release file.
Question for users
The option --add-dir has been marked as deprecated (for quite some time now I
suspect). I'm considering removing it in the next release as I cannot see any
use cases for it, but it's quite possible I'm missing something and there are
still people using it. If you would like that option preserved, then please
mail me at debmirror@packages.d.o with an explanation of why and how you
use it.
Managing the size of a local mirror
The archive has grown a lot over the past Debian releases and keeping even a
partial local mirror can require quite some disk space. Luckily debmirror
offers quite a few options to tune what is mirrored.
My own mirror covers testing and unstable 'main' for 6 architectures (i386, amd64, armel, hppa, sparc and s390), no source, no D-I images. It uses only 61GB. I say "only" as that's about 33GB less than it could have been without tuning. In other words, I'm saving a bit more than one third!
Here are the options I added to achieve this:
--exclude-deb-section='^debug$'
--exclude='/(xen-)?linux-[a-z]+-2\.6[.0-9]*-[-[:alnum:]]*(openvz|vserver|xen)[-[:alnum:]]*_'
--exclude='(k/kde|g/gnome|o/openoffice\.org).*/.*_(armel|hppa|s390)\.deb'
--exclude='(a/axiom/|d/debian-edu-doc/|e/ember(|-media)/|e/eclipse(/|-))'
--exclude='(e/erlang|g/(gcl(cvs)?|ghc6)/|l/llvm(/|-)|p/paraview/|o/openturns/)'
--exclude='(s/scalapack(-doc)?/|f/festvox-|g/gcc-snapshot/)'
--exclude='(/acl2-books_|/digikam-doc_|/fluid-soundfont-gm_|/deal.ii-doc_)'
--exclude='(/libxmpp4r-ruby-doc_|/lilypond-doc_|/qt4-doc_|/vtk-doc_)'
--exclude='/i18n/Translation-.*\.bz2' --include='/i18n/Translation-(nl|de)\.bz2'
And the explanation is:
- I rarely use debug packages and they are relatively big; if I do need one I'll download it manually from a remote mirror.
- I don't run vserver or xen kernels (and if I did I'd probably compile custom kernels anyway). I do want "regular" kernels because of D-I work.
- I doubt I'll ever want to install KDE, GNOME or OpenOffice on my armel, hppa or s390 boxes, but I do want them for the other three arches.
- Selected individual (mostly scientific) source packages that I doubt I'll ever use but use up significant disk space (and bandwidth when updated). These were found by a simple:
  du -s pool/main/*/* | sort -rn | head -n 50
- Selected individual huge binary packages (mostly documentation), found using:
  du -s pool/main/*/*/*.deb | sort -rn | head -n 50
- I'm only interested in Dutch and German translations of package descriptions. Well, actually I'm not even interested in those, but it's useful to have them for testing debmirror.
Obviously I have nothing against any of the packages that I exclude. It's just that I don't need them.
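Before adding an --exclude pattern to a real run, it can help to try it against a few file names first. Below is a minimal sketch that does this for two of the patterns listed above. It uses grep -E as a stand-in for the Perl regex matching in debmirror, which behaves the same for these particular patterns; the file names are hypothetical and the helper is_excluded is not part of debmirror.

```shell
#!/bin/sh
# is_excluded PATH
# Succeeds if PATH matches one of the two exclude patterns below.
is_excluded() {
    printf '%s\n' "$1" | grep -Eq \
        -e '/(xen-)?linux-[a-z]+-2\.6[.0-9]*-[-[:alnum:]]*(openvz|vserver|xen)[-[:alnum:]]*_' \
        -e '(k/kde|g/gnome|o/openoffice\.org).*/.*_(armel|hppa|s390)\.deb'
}

# Hypothetical file names, for illustration only.
for path in \
    pool/main/l/linux-2.6/linux-image-2.6.26-2-xen-686_2.6.26-19_i386.deb \
    pool/main/l/linux-2.6/linux-image-2.6.26-2-686_2.6.26-19_i386.deb \
    pool/main/k/kdebase/kdebase-bin_3.5.9_armel.deb \
    pool/main/k/kdebase/kdebase-bin_3.5.9_i386.deb
do
    if is_excluded "$path"; then echo "skip  $path"; else echo "keep  $path"; fi
done
```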
Posted Sat Oct 3 18:09:08 2009
I've just uploaded version 2.2 of debmirror, which introduces yet another
new feature: mirroring the i18n/Translation files that contain translations
of package descriptions. Many thanks to Joerg Jaspert for his quick response
to my request to include those files in the Release file.
Joerg also implemented the change needed to use the diffs for Contents files
but that requires a fairly big code restructuring in debmirror.
The package has jumped from version 1.0 to 2.2 in just three weeks (closing 28 bug reports in the process), but I think the changes justify that. Here's an overview.
Automatic creation and update of suite->codename symlinks (1.0)
This also means it no longer makes any difference whether you tell debmirror to mirror sid or unstable.
Option to cache the mirror state between runs (2.0)
This significantly reduces the thrashing of the hard disk during mirror updates and cleanup, and improves the efficiency of individual runs.
The disk thrashing has always been the main reason I did not want to do more than one update per day for my local mirror. Now it hardly matters how many runs I do: almost everything is done based on the cache data.
To ensure the mirror stays consistent the cache has a (configurable) maximum lifetime after which a full check of the mirror will be done, if desired including an md5sum check of all files.
Significant speed increase for parsing Packages/Sources files (2.0)
For my mirror that stage now takes seconds rather than minutes. Additional speed increases should be possible in the stage that fetches the Packages and Sources files.
Mirroring of "current" Debian Installer images (2.0)
Which architectures and suites should be mirrored can be specified independently from the rest of the mirror.
Mirroring additional files from specific directories (2.1)
This allows mirroring of "trace files", of the contents of the ./doc and ./tools directories (which are needed if you want to create CD images using debian-cd), and of the ./indices directory.
The transfer method used for this is always rsync, independent of the transfer method used for the rest of the archive. This is a restriction, but rsync is also the only usable option for files for which no real index or checksums are available.
Mirroring translation files (2.2)
As debmirror is primarily intended to be used for local, often partial, mirrors, it is of course possible to mirror only selected languages. Interested only in German and French translations? Simple, just use:
--i18n --exclude='/Translation-.*\.bz2$' --include='/Translation-(de|fr).*\.bz2$'
I've used '(de|fr).*' so that also country-specific variants (e.g. fr_FR) will be included.
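The exclude/include pair above relies on an include match taking precedence over an exclude match. The sketch below models that rule with grep -E so the effect on individual file names can be checked. The helper is_mirrored and the dists/sid/main/i18n path are illustrative assumptions, not debmirror code.

```shell
#!/bin/sh
# Patterns copied from the command line above.
exclude='/Translation-.*\.bz2$'
include='/Translation-(de|fr).*\.bz2$'

# is_mirrored PATH
# Succeeds if PATH would be kept: either it matches the include
# pattern, or it does not match the exclude pattern.
is_mirrored() {
    if printf '%s\n' "$1" | grep -Eq -e "$include"; then
        return 0   # include wins over exclude
    fi
    ! printf '%s\n' "$1" | grep -Eq -e "$exclude"
}

for f in Translation-de.bz2 Translation-fr_FR.bz2 Translation-nl.bz2; do
    path="dists/sid/main/i18n/$f"
    if is_mirrored "$path"; then echo "mirror $path"; else echo "skip   $path"; fi
done
```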
If you're currently using the Lenny version of debmirror and would like to
use the new features: the package from unstable can be installed on Lenny
without any problems. The changes have been well tested, but I would advise
using --dry-run after the upgrade to check that there are no unexpected problems.
One area where you may experience problems is when using debmirror for
other archives than the official Debian mirrors. If you do encounter issues
then please file a bug report.
Note that debmirror is not intended to be used for official mirrors.
There are different scripts available for that from the Debian mirror team.
(Yeah, I've already had one post about debmirror, so one could argue this should be II, but a II without a I is also strange.)
In my second upload I've changed the package versioning from date-based to a more regular major.minor.bugfix, so, after a third bugfix upload we're now at 1.0.1. And with a nice set of changes too.
A nice bonus is a new script that gives a quite detailed overview of our archive size, at least a lot more detailed than this. It currently requires manual formatting, but if there is interest that could be coded and the script could be run from cron, on merkel for example.
Main changes in version 1.0
- Automatically create and update suite->codename symlinks based on info in the Release file. Directories for dists will always have the codename of the release. Conversion of existing mirrors that use suites for directories is supported.
- No longer keep uncompressed Packages files on the local mirror, similar to the official mirrors.
- Don't fetch (the huge) Contents files if they're unchanged. This is a significant improvement, but hopefully debmirror can soon support the diffs (#436027).
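One way to decide whether a file such as Contents is unchanged is to compare the checksum of the local copy with the checksum listed for it. The sketch below shows that idea only; it is not taken from the debmirror source, and the helper needs_fetch and the sample data are hypothetical.

```shell
#!/bin/sh
# needs_fetch FILE EXPECTED_MD5
# Succeeds when FILE is missing or its md5sum differs from
# EXPECTED_MD5, i.e. when a download would be required.
needs_fetch() {
    [ -f "$1" ] || return 0
    actual=$(md5sum < "$1" | cut -d' ' -f1)
    [ "$actual" != "$2" ]
}

# Illustration with a temporary stand-in for a Contents file.
tmp=$(mktemp)
echo "usr/bin/debmirror net/debmirror" > "$tmp"
listed=$(md5sum < "$tmp" | cut -d' ' -f1)   # pretend this came from Release

if needs_fetch "$tmp" "$listed"; then echo "fetch"; else echo "unchanged, skip"; fi
rm -f "$tmp"
```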
Work in progress
I've also started work on a few new features.
Caching the state of the mirror
Debmirror has two places where it's quite slow and where it thrashes the hard disk a lot:
1) when it checks md5sums for all files listed in Packages and Sources files;
2) when it cleans up obsolete files.
This wishlist BR (#483922) has the solution:
to cache the state of the archive. After all, other than for meta data, 1) is
not really needed as nothing is normally going to change packages and source
files that have been downloaded. And 2) can be done much more efficiently if
you already know what you need to do than when you have to run find over
the whole archive.
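To illustrate why a cached file list makes cleanup cheap: once the list of files from the previous run is known, the obsolete files are simply those that no longer appear in the current list, and no directory walk is needed. This is a minimal sketch of that comparison using sort and comm. The helper obsolete_files and the list format (one path per line) are assumptions, not the format debmirror uses.

```shell
#!/bin/sh
# obsolete_files OLD_LIST NEW_LIST
# Prints the paths present in OLD_LIST but absent from NEW_LIST.
obsolete_files() {
    sort "$1" > "$1.sorted"
    sort "$2" > "$2.sorted"
    comm -23 "$1.sorted" "$2.sorted"
    rm -f "$1.sorted" "$2.sorted"
}

# Hypothetical file lists from the previous and the current run.
dir=$(mktemp -d)
printf '%s\n' pool/main/a/a_1.deb pool/main/b/b_1.deb > "$dir/previous"
printf '%s\n' pool/main/a/a_1.deb pool/main/b/b_2.deb > "$dir/current"

obsolete=$(obsolete_files "$dir/previous" "$dir/current")
echo "$obsolete"
rm -rf "$dir"
```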
After thinking about it a bit, the implementation turned out to be quite easy, and I now have a version that I'm ready to try on my own local mirror. After that I'll just need to add a few bells and whistles, so expect this in version 1.1.
Activating the cache will be through --state-cache-days=<N>. It seems
wise to periodically do a full check of the mirror (the current mode of
operation). The <N> does just that. Whether to do it every 7 or 28 or
350 days is up to the user (I would suggest 7 or 14).
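The decision behind <N> boils down to comparing the age of the cache with a maximum. How debmirror stores and checks that age is not described here, so the following is only a sketch of the arithmetic, with a hypothetical helper cache_expired that takes timestamps in seconds since the epoch.

```shell
#!/bin/sh
# cache_expired CREATED NOW MAX_DAYS
# CREATED and NOW are seconds since the epoch. Succeeds when the cache
# is older than MAX_DAYS days, meaning a full check is due.
cache_expired() {
    age=$(( $2 - $1 ))
    [ "$age" -gt $(( $3 * 86400 )) ]
}

now=$(date +%s)
created=$(( now - 10 * 86400 ))   # pretend the cache is 10 days old

for days in 7 14; do
    if cache_expired "$created" "$now" "$days"; then
        echo "N=$days: do a full check"
    else
        echo "N=$days: use the cache"
    fi
done
```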
Mirroring Debian Installer images
This is a very old wishlist item (#154966), but actually quite straightforward to implement as D-I does include index files with md5sums with its images. So I'll give that a shot soon.
Mirroring the tools/ and doc/ directories will be harder as they currently
lack index files with md5sums.
Various
I also have a couple of branches that need further thought and work.
- Improved accounting of download size. Quite complex.
- More generic support for subsections (such as main/debian-installer).
A few weeks ago I volunteered to adopt debmirror. It's a package I've been using for a long time myself and I expect maintaining it will give my (currently very basic) Perl skills a boost.
The package is not in bad shape. It mostly just does what it needs to do, but there are a few interesting (wishlist) bug reports.
I've already done one upload (just migrated to testing) with mostly minor changes, mainly modernizing the packaging (using the magic 'dh' command). I'm now preparing another upload with the results of initial bug triaging and some general improvements.
As there already was an alioth project for debmirror, I've started using that. My changes can be seen in subversion, although I have some work-in-progress that I keep in a local git-svn checkout.
One wishlist bug concerned adding support to download the translations of package descriptions. Problem there is that debmirror currently only really supports mirroring files that are listed in the Release file (either directly or indirectly) and the i18n files are currently not listed.
But that may soon be fixed if the FTP-masters accept my patch to dak which adds an Index file for the translation files and lists that file in Release. After that supporting the package translation files in debmirror should be fairly straightforward.
So with that I've doubled the number of packages I maintain: from 1 to 2 