[ http://www.w3.org/Search/9605-Indexing-Workshop/Papers/Akscyn@KS.html ]


The PetaPlex Project, funded by the US Intelligence Community, aims
to develop feasible architectures for very large-scale digital
libraries -- to meet the future needs of the community and those of
large-scale commercial applications.  The specific goal of the
current phase of the project is to develop an architecture capable
of scaling to 20 petabytes on-line, with subsecond response time for
access to random, fine-grained, URN-specified objects, at a sustained
rate in excess of 30 million transactions per second.

The current statement of work calls for integrating one million 20 GB
disks into a coherent system that can attain these performance
objectives -- at acceptable cost.  To achieve this level of throughput,
the current prototype resolves URNs -- finds, fetches, and
displays/executes -- in a single packet round-trip and a single seek.
To achieve cost feasibility, the architecture is "massively simple" -- it
consists only of simple, commodity-cost, COTS technologies that
enable near-automatic construction and maintenance of the system.
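
The figures above can be cross-checked with a little arithmetic. The
sketch below is illustrative only: it reads the disk size as 20
gigabytes in decimal units and assumes the load is spread evenly
across all disks, neither of which the abstract states outright. The
function names are hypothetical.

```python
# Back-of-envelope check of the stated targets.
# Assumptions (not from the abstract): decimal units (1 PB = 10**6 GB)
# and a perfectly even spread of transactions across disks.

DISK_COUNT = 1_000_000        # "one million" disks
DISK_CAPACITY_GB = 20         # per-disk capacity, read as gigabytes
TARGET_TPS = 30_000_000       # "in excess of 30 million" per second


def total_capacity_pb(disks: int, gb_per_disk: float) -> float:
    """Aggregate capacity in petabytes."""
    return disks * gb_per_disk / 1_000_000


def per_disk_tps(total_tps: float, disks: int) -> float:
    """Transactions each disk must serve per second under even load."""
    return total_tps / disks


def per_transaction_budget_ms(tps_per_disk: float) -> float:
    """Time available per transaction on one disk, in milliseconds."""
    return 1000.0 / tps_per_disk


if __name__ == "__main__":
    rate = per_disk_tps(TARGET_TPS, DISK_COUNT)
    print(total_capacity_pb(DISK_COUNT, DISK_CAPACITY_GB), "PB total")
    print(rate, "transactions/s per disk")
    print(round(per_transaction_budget_ms(rate), 1), "ms per transaction")
```

Under these assumptions the disks add up to the 20-petabyte target,
and each disk has roughly 33 ms per transaction, which suggests why
the design limits each resolution to a single seek.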

A principal part of the architecture is full-text search of the
hypermedia-structured database, supporting many concurrent searches:
on the order of 100,000 ongoing searches at any time.  The scheme
being explored is highly parallelized in all three of its tasks:
incrementally maintaining the indexes, conducting searches, and
storing results in persistent and accessible form.
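
The abstract does not describe the search scheme itself. As a purely
illustrative sketch of the three tasks it names, the code below
partitions an inverted index into shards, updates one shard per new
document, fans each query out to all shards in parallel, and keeps
results under a search identifier. The class names, the CRC-based
shard choice, and the in-memory result store (a stand-in for
persistent storage) are all assumptions, not the project's design.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib


class Shard:
    """One partition of a toy inverted index: term -> set of URNs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, urn, text):
        # Incremental maintenance: only this shard's postings change.
        for term in text.lower().split():
            self.postings[term].add(urn)

    def search(self, term):
        # .get() avoids creating empty entries for unseen terms.
        return set(self.postings.get(term.lower(), ()))


class ShardedIndex:
    """Hypothetical scatter-gather index over several shards."""

    def __init__(self, shard_count=4):
        self.shards = [Shard() for _ in range(shard_count)]
        self.results = {}  # stand-in for a persistent result store

    def _shard_for(self, urn):
        # Deterministic shard choice (assumed; any stable hash works).
        return self.shards[zlib.crc32(urn.encode("utf-8")) % len(self.shards)]

    def add(self, urn, text):
        self._shard_for(urn).add(urn, text)

    def search(self, search_id, term):
        # Query every shard in parallel, then merge the partial hits.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(pool.map(lambda s: s.search(term), self.shards))
        hits = sorted(set().union(*partials))
        self.results[search_id] = hits  # kept for later retrieval
        return hits


if __name__ == "__main__":
    index = ShardedIndex()
    index.add("urn:example:doc1", "petabyte digital library")
    index.add("urn:example:doc2", "digital search index")
    print(index.search("q1", "digital"))
```

Because each document lives in exactly one shard, an index update
touches a single partition, while a query costs one lookup per shard;
that trade-off is one common way to parallelize both operations.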

This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.