Trove Design Document: Introduction

1.1 Why Trove?

The `classical' model of the Internet software archive (WWW frosting on an FTP cake, as exemplified by Sunsite) is no longer adequate for the growing size and evolutionary speed of the open-source community. It eats too much maintainer time, its classification and search mechanisms are woefully weak, and its package namespace has no collision detection.

One of us (Eric Raymond) had been Sunsite's principal maintainer for more than a year before Trove got started. Eric wrote the keeper tool, which does about as good a job as possible of automating away the scutwork under the present system. It's not good enough. The amount of maintainer time Sunsite requires is rising to the point where the archive will soon be unsustainable. On present trends, Eric expects Sunsite's system (or its maintainers) to collapse by the end of 1998.

Some prominent Python people (including Ken Manheimer, Andrew Kuchling, and Guido van Rossum) had realized for some time that the Python archive faced similar problems, and had begun discussing a redesign they thought of as the `locator' project.

The concept of the Trove project was originally floated by Eric Raymond in early April 1998. Within a week, Guido van Rossum approached him about joining forces. By the end of April, when the project and the Trove web pages were officially launched, the principals included Ken Manheimer and Andrew Kuchling of the Python Software Activity. Ken Manheimer proposed the name `Trove'. John Cowan provided valuable expertise in database design and information-retrieval (IR) pragmatics.

1.2 Terminology

For the purposes of this document, a resource is a file such as a source or binary archive, an RPM or Debian installable package, a document, etc. A resource may have associated metadata (such as a description of the resource).

Related resources will be grouped into a package, which will have associated metadata of its own (including, but not limited to, the author's name and the location of the project home page).

The metadata exists to provide a handle on packages and resources, making them discoverable through searching and browsing facilities. Resources may have associated metadata of their own.
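
The terms above suggest a simple two-level data model: packages that contain resources, with metadata attached at both levels. The Python sketch below is one possible way to represent it, assuming (hypothetically) that metadata is a flat map of string fields. The class names, field names, and sample package are illustrative only and are not taken from the Trove design.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A single archived file (source or binary archive, RPM or Debian
    package, document, ...) with optional metadata of its own."""
    filename: str
    # Hypothetical flat metadata, e.g. {"description": "..."}.
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Package:
    """A group of related resources plus package-level metadata,
    such as the author's name and the project home page."""
    name: str
    resources: list[Resource] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


# Illustrative package; every name and value here is made up.
pkg = Package(
    name="frobnicate",
    resources=[
        Resource("frobnicate-1.0.tar.gz", {"description": "source archive"}),
        Resource("frobnicate-1.0-1.i386.rpm", {"description": "RPM package"}),
    ],
    metadata={
        "author": "J. Random Hacker",
        "homepage": "http://www.example.org/frobnicate/",
    },
)
```

Using `default_factory` gives each object its own empty metadata map, so resources created without a description do not share state.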

A search is any selection operation that returns a subset of the archive metadata.
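
Because a search is defined only as a selection that returns a subset of the metadata, it can be modelled as a predicate applied to every metadata record. The sketch below assumes a hypothetical flat list of per-package records; the `search` function, the field names, and the sample packages are illustrative and do not describe an actual Trove interface.

```python
from typing import Callable

# Hypothetical flat view of archive metadata: one record per package.
Record = dict[str, str]


def search(metadata: list[Record], predicate: Callable[[Record], bool]) -> list[Record]:
    """Return the subset of metadata records selected by predicate,
    preserving their original order."""
    return [record for record in metadata if predicate(record)]


archive = [
    {"package": "frobnicate", "language": "Python", "license": "GPL"},
    {"package": "widgetizer", "language": "C", "license": "BSD"},
    {"package": "snarfle", "language": "Python", "license": "BSD"},
]

python_pkgs = search(archive, lambda r: r.get("language") == "Python")
print([r["package"] for r in python_pkgs])  # ['frobnicate', 'snarfle']
```

Keyword lookup and category browsing both fit this shape; they differ only in the predicate passed in.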

A site ring is a collection of Trove sites that mirror each other's metadata (so that a search of any is effectively a search of all).
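
The site-ring property (a search of any site is effectively a search of all) holds once every site carries the union of the ring's metadata. This sketch assumes, hypothetically, that each site's metadata is a dictionary keyed by package name and that mirroring is a naive full copy. The `Site` class, the `mirror_ring` function, and the site names are invented for illustration; real replication would need incremental updates and conflict handling.

```python
class Site:
    """One Trove site holding metadata records keyed by package name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.metadata: dict[str, dict[str, str]] = {}

    def submit(self, package: str, record: dict[str, str]) -> None:
        """Register a package's metadata locally at this site."""
        self.metadata[package] = record

    def search(self, keyword: str) -> list[str]:
        """Return sorted names of packages whose metadata mentions keyword."""
        return sorted(
            pkg for pkg, record in self.metadata.items()
            if any(keyword in value for value in record.values())
        )


def mirror_ring(ring: list[Site]) -> None:
    """Give every site in the ring the union of all sites' metadata.

    If two sites hold a record for the same package, the later site in
    ring order wins: a naive policy chosen only to keep the sketch short.
    """
    merged: dict[str, dict[str, str]] = {}
    for site in ring:
        merged.update(site.metadata)
    for site in ring:
        site.metadata = dict(merged)


us = Site("us-mirror")  # hypothetical site names
eu = Site("eu-mirror")
us.submit("frobnicate", {"description": "frobnicates widgets"})
eu.submit("snarfle", {"description": "snarfles widgets"})

mirror_ring([us, eu])
print(us.search("widgets"))  # ['frobnicate', 'snarfle']
print(eu.search("widgets"))  # ['frobnicate', 'snarfle']
```

The "later site wins" rule in `mirror_ring` silently hides name clashes between sites, which is exactly the kind of missing collision detection a real ring would have to address.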
