Some comments about the sources of data stored in the Ultimate Debian Database
The Ultimate Debian Database
Israel Herraiz
Davis, CA, July 26th 2012
Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database
1 / 25
Outline
1. Debian: what is it and sources of data
2. The UDD: what is it and where to get it
3. What has been done and what we can do
2 / 25
1. Debian: what is it and sources of data
3 / 25
Debian
• GNU/Linux software distribution
• Goal: to deliver an entirely and exclusively free distribution
• Maintained by volunteers
• Bureaucratic organization (policies, constitution, social contract)
• Released when ready
• > 10 years of history
• > 500 MSLOC (millions of source lines of code)
• > 15k packages
4 / 25
Debian Releases
5 / 25
6 / 25
Debian Source Packages
7 / 25
Source and Binary Packages
• A source package generates one or more binary packages
octave
octave-core
octave-doc
liboctave
liboctave-dev
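As a sketch of the source/binary relationship: in a binary Packages index, a stanza carries a Source field when the source package's name differs from the binary's (the field may also append a version in parentheses). The Python below groups binaries by source under that assumption; the stanzas are hand-written examples reusing names from the slide, not real archive data.

```python
from collections import defaultdict


def source_of(stanza):
    """Return the source package a binary stanza was built from.

    The Source field is omitted when source and binary names are equal,
    and may carry a version in parentheses, e.g. "octave (3.6.1-1)".
    """
    src = stanza.get("Source", stanza["Package"])
    return src.split(" ", 1)[0]


def group_by_source(stanzas):
    """Map each source package name to the sorted list of its binaries."""
    groups = defaultdict(list)
    for stanza in stanzas:
        groups[source_of(stanza)].append(stanza["Package"])
    return {src: sorted(bins) for src, bins in groups.items()}


# Hypothetical stanzas (only the fields this sketch needs).
binaries = [
    {"Package": "octave"},
    {"Package": "octave-doc", "Source": "octave"},
    {"Package": "liboctave-dev", "Source": "octave"},
    {"Package": "gnuplot"},
]

print(group_by_source(binaries))
```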
8 / 25
Package uploads
• There is no version control repository to mine, unlike in other software projects
• Although developers may privately use version control systems
• When a bug is fixed, a new version is uploaded
• Uploads == commits
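Since uploads play the role of commits, a package's upload history can be read much like a commit log. A minimal sketch, assuming the package's debian/changelog file is at hand (each entry there normally corresponds to one upload); the sample entries, names, and addresses are hypothetical, and the parser ignores many details of the changelog format.

```python
import re

# Entry header, e.g. "octave (3.6.2-1) unstable; urgency=low"
HEADER = re.compile(r"^(?P<source>\S+) \((?P<version>[^)]+)\) (?P<dists>[^;]+);")
# Entry trailer, e.g. " -- Jane Doe <jane@example.org>  Mon, 04 Jun 2012 10:00:00 +0200"
TRAILER = re.compile(r"^ -- (?P<who>.+?) <(?P<email>[^>]+)>  (?P<date>.+)$")


def parse_uploads(changelog_text):
    """Return one dict per changelog entry, newest first (the 'commit log')."""
    uploads, current = [], None
    for line in changelog_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.groupdict()
            continue
        trailer = TRAILER.match(line)
        if trailer and current is not None:
            current.update(trailer.groupdict())
            uploads.append(current)
            current = None
    return uploads


# Hypothetical changelog with two uploads.
SAMPLE = """\
octave (3.6.2-1) unstable; urgency=low

  * New upstream release.

 -- Jane Doe <jane@example.org>  Mon, 04 Jun 2012 10:00:00 +0200

octave (3.6.1-1) unstable; urgency=low

  * Fix a hypothetical bug.

 -- John Roe <john@example.org>  Tue, 20 Mar 2012 09:30:00 +0100
"""

for up in parse_uploads(SAMPLE):
    print(up["version"], up["dists"], up["who"], up["date"])
```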
9 / 25
Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <[email protected]>
Uploaders: Thomas Weber <[email protected]>, Sébastien Villemot
DM-Upload-Allowed: yes
Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo ….
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git
Source Packages metadata
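Both the source stanza above and the binary stanza that follows use an e-mail-header-like "Field: value" syntax, where a line starting with whitespace continues the previous field and blank lines separate stanzas. A minimal parser sketch (it does not handle comments, case-insensitive field names, or other corner cases of the real format), applied to a subset of the fields shown above:

```python
def parse_stanzas(text):
    """Parse control-style text into a list of {field: value} dicts.

    Stanzas are separated by blank lines; a line starting with a space
    or tab continues the value of the previous field.
    """
    stanzas, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current stanza, if any.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field is not None:
            # Continuation line: append to the previous field's value.
            current[last_field] += "\n" + line.strip()
        else:
            # "Field: value"; split at the first colon only (URLs contain colons).
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


SAMPLE = """\
Source: octave
Section: math
Priority: extra
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
"""

print(parse_stanzas(SAMPLE)[0]["Vcs-Git"])
```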
10 / 25
Package: octave
Priority: extra
Section: math
Installed-Size: 4760
Maintainer: Ubuntu Developers <[email protected]>
Architecture: amd64
Version: 3.6.1-1ubuntu1ppa1~precise1
Recommends: gnuplot, libatlas3gf-base
Replaces: octave3.2
Suggests: octave-info, octave-doc, octave-htmldoc
Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …
Conflicts: octave3.2
Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb
Size: 1746050
MD5sum: 2c431556d6cf98fd8a341e865ac63058
SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7
Description: GNU Octave language for numerical computations…
Binary Packages metadata
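Relationship fields such as Depends pack structure into a single value: relations are comma-separated, alternatives are separated by "|", and an optional version constraint follows in parentheses. A sketch of a parser for that common subset, assuming no architecture or multiarch qualifiers; the first example reuses the two dependencies visible on the slide, the second is a hypothetical alternative.

```python
import re

# One relation such as "libamd2.2.0 (>= 1:3.4.0)". Architecture qualifiers
# like "[amd64]" and multiarch suffixes like ":any" are NOT handled here.
RELATION = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9.+-]*)\s*"
    r"(?:\((?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_depends(value):
    """Split a Depends-style value into a list of alternative groups.

    Each group is a list of (package, operator, version) tuples; operator
    and version are None when no version constraint is given.
    """
    groups = []
    for group in value.split(","):
        alternatives = []
        for alt in group.split("|"):
            m = RELATION.match(alt)
            if m is None:
                raise ValueError(f"unparsed relation: {alt!r}")
            alternatives.append((m["name"], m["op"], m["version"]))
        groups.append(alternatives)
    return groups


print(parse_depends("libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1)"))
print(parse_depends("gnuplot | gnuplot-nox"))
```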
11 / 25
12 / 25
Debian Popcon: Tracking Installations
• Popularity: total install counts
• Recent use (< 30 days)
• Old use (beyond 30 days)
• Data collected daily
• Users voluntarily opt in
• Opt-in participation is a source of bias
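To make the recent/old distinction concrete, here is a sketch that aggregates per-package counts with a 30-day cut-off. The report format (one mapping from package to last-use date per participating machine) is a hypothetical simplification, not popcon's actual submission format or category names, and the dates are made up.

```python
from datetime import datetime, timedelta

THRESHOLD = timedelta(days=30)


def classify(last_used, now):
    """Label one installation as 'recent' or 'old' use (30-day cut-off)."""
    return "recent" if now - last_used < THRESHOLD else "old"


def summarize(reports, now):
    """Aggregate per-package counts from opt-in reports.

    `reports` is a list of {package: last_used_datetime} dicts, one per
    participating machine (hypothetical format).
    """
    counts = {}
    for report in reports:
        for package, last_used in report.items():
            c = counts.setdefault(package, {"installed": 0, "recent": 0, "old": 0})
            c["installed"] += 1
            c[classify(last_used, now)] += 1
    return counts


now = datetime(2012, 7, 26)
reports = [
    {"octave": datetime(2012, 7, 20), "gnuplot": datetime(2012, 1, 5)},
    {"octave": datetime(2012, 3, 1)},
]
print(summarize(reports, now))
```

Only machines whose owners opted in appear in `reports`, which is exactly why the resulting counts can be biased.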
13 / 25
Debian Bugs
• People find bugs in binary packages
• ~500 bugs per month
• But bugs are linked to source packages
• Bugs can be
  • Accepted and solved in Debian
  • Rejected
  • Forwarded to upstream
• Everything else is similar to other bug tracking systems
  • Life cycle, comments, severity levels…
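Because users file bugs against binary packages while the bug tracker attaches them to source packages, per-package bug counts have to be rolled up through a binary-to-source map. A sketch with a hypothetical map and hypothetical bug reports:

```python
from collections import Counter

# Hypothetical binary -> source map; in practice it would be derived from
# the Source field of the binary packages' metadata.
BINARY_TO_SOURCE = {
    "octave": "octave",
    "octave-doc": "octave",
    "liboctave-dev": "octave",
    "gnuplot-nox": "gnuplot",
}


def bugs_per_source(bug_reports, binary_to_source):
    """Count bug reports per source package.

    Each report names the binary package it was filed against; binaries
    missing from the map are assumed to share their source's name.
    """
    return Counter(
        binary_to_source.get(b["package"], b["package"]) for b in bug_reports
    )


# Hypothetical bug reports.
reports = [
    {"id": 1, "package": "octave"},
    {"id": 2, "package": "liboctave-dev"},
    {"id": 3, "package": "gnuplot-nox"},
    {"id": 4, "package": "octave-doc"},
]
print(bugs_per_source(reports, BINARY_TO_SOURCE))
```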
14 / 25
2. The UDD: what is it and where to get it
15 / 25
Research work: main paper (at MSR 2010)
16 / 25
Other papers at MSR 2010
17 / 25
What is the UDD?
• PostgreSQL database gathering all the data sources described so far
• http://udd.debian.org
• New dumps available every two days
• ~500 MB (bzip2-compressed)
• Used by some internal Debian services
• Schema too complex and too big to fit on a slide
• Technical detail: you need a Debian-based system to load the UDD dump
18 / 25
Debian sources of data
• Sources / Packages metadata
• Bugs
  • including *all* archived bugs
  • 1995-96-97
• Carnivore
• Debtags
• Popularity Contest
• DEHS
• Lintian
• Migrations to testing
• Uploads
  • All the way back to 1998!
• New packages queue
• Translations status
• Orphaned packages
• Screenshots
19 / 25
20 / 25
Bear in mind!
• You can also obtain the source code of the packages
  • Easy to automate
• And the modifications made by the Debian maintainers
• So product metrics can be added to the set of data sources
• But the source code is not included in the UDD
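Product metrics can be as simple as counting lines of code. A crude sketch, assuming the source package has already been fetched and unpacked (for example with apt-get source): it counts non-blank lines per file extension without stripping comments, and the extension list is an arbitrary assumption. The demo builds a tiny temporary tree instead of using a real package.

```python
import os
import tempfile
from collections import Counter


def count_lines(root, extensions=(".c", ".cc", ".cpp", ".h", ".f", ".m")):
    """Count non-blank lines per file extension under an unpacked source tree.

    A crude size metric: comments are not stripped, unlike dedicated SLOC tools.
    """
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in extensions:
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                totals[ext] += sum(1 for line in fh if line.strip())
    return totals


# Demo on a hypothetical two-file tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src"))
    with open(os.path.join(tmp, "src", "main.c"), "w") as fh:
        fh.write("int main(void)\n{\n\n  return 0;\n}\n")
    with open(os.path.join(tmp, "README"), "w") as fh:
        fh.write("not counted\n")
    demo_totals = count_lines(tmp)

print(demo_totals)
```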
21 / 25
3. What has been done and what we can do
22 / 25
What kind of questions does Debian answer with the UDD?
• High-priority packages that have release-critical (RC) bugs, which block the release
• Developers with very buggy and/or outdated packages
• Who uploaded this package to unstable?
• Who reported the RC bugs since the last release?
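Questions like these become SQL queries once the data sits in one database. The sketch below uses Python's built-in sqlite3 with two hypothetical, heavily simplified tables (they are not the UDD schema, which lives in PostgreSQL) to count open release-critical bugs per maintainer; in Debian the RC severities are critical, grave, and serious. All rows are made up.

```python
import sqlite3

# Hypothetical, simplified tables; NOT the real UDD schema.
SCHEMA = """
CREATE TABLE sources (source TEXT PRIMARY KEY, maintainer TEXT);
CREATE TABLE bugs (id INTEGER PRIMARY KEY, source TEXT, severity TEXT, done INTEGER);
"""

# Severities that make a bug release-critical in Debian.
RC_SEVERITIES = ("critical", "grave", "serious")
PLACEHOLDERS = ", ".join("?" for _ in RC_SEVERITIES)

QUERY = f"""
SELECT s.maintainer, COUNT(*) AS open_rc
FROM bugs b JOIN sources s ON s.source = b.source
WHERE b.done = 0 AND b.severity IN ({PLACEHOLDERS})
GROUP BY s.maintainer
ORDER BY open_rc DESC, s.maintainer
"""


def open_rc_by_maintainer(conn):
    """Return (maintainer, number of open RC bugs) pairs, most first."""
    return conn.execute(QUERY, RC_SEVERITIES).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sources VALUES (?, ?)", [
    ("pkg-a", "alice@example.org"),
    ("pkg-b", "bob@example.org"),
    ("pkg-c", "alice@example.org"),
])
conn.executemany("INSERT INTO bugs VALUES (?, ?, ?, ?)", [
    (1, "pkg-a", "serious", 0),
    (2, "pkg-a", "normal", 0),    # not RC
    (3, "pkg-b", "grave", 1),     # RC but already closed
    (4, "pkg-c", "critical", 0),
    (5, "pkg-b", "serious", 0),
])
print(open_rc_by_maintainer(conn))
```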
23 / 25
Some questions solved in the literature
• The popularity bias
  • http://oa.upm.es/9585/
• Open source projects get more bug reports if they are popular
• The actual number of bugs is not related to the number of bugs reported
• So more bug reports actually mean more quality
  • Well, at least more people who decide to use the software
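One way to quantify the relation in the popularity-bias plot is a correlation between log(installations) and log(bug reports) across packages. The sketch below uses Pearson's coefficient on log1p-transformed counts (so packages with zero bugs are kept); that choice and the toy numbers are assumptions for illustration, not necessarily the analysis of the cited paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def log_log_correlation(installations, bugs):
    """Correlate log(installations) with log(bug reports).

    log1p (log(1 + x)) is used so that packages with zero bugs are kept.
    """
    return pearson([math.log1p(i) for i in installations],
                   [math.log1p(b) for b in bugs])


# Hypothetical per-package counts.
installs = [10, 100, 1000, 10000]
bug_reports = [1, 10, 100, 1000]
print(log_log_correlation(installs, bug_reports))
```

A strong positive value on real data would be consistent with popular packages simply attracting more reports, not necessarily containing more defects.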
24 / 25
The popularity bias
[Figure: Log(Bugs) vs. Log(installations), required packages]
25 / 25
Summary
• Packages and sources metadata
  • And source code
• Bugs
  • All the way back to 1995-96-97!
• Popularity contest
• Maintainers' activity (uploads)
  • All the way back to 1998!
• And much more…
• Now, what do you think we can do with this?