Keeping up with data in 2002
Bob Bankay, Jeff Cobb, Eric Person

Examples of large-scale database modifications in 2002
To keep up with such a fast rate of data accumulation, SETI@home
monitors and updates its data storage capacity on a regular basis.
For example, last year our Informix database underwent major updates
to handle our huge volume of spikes.
The graph below shows that we have processed and stored almost 5
billion spikes (as of January 2003).
The following are examples of large-scale database modifications
SETI@home performed in 2002:
Modification of Spike ID Field
Early in 2002 our database
table for spikes had run out of assignable IDs; we hit the limit of
2 billion IDs used to identify each row in our spike table. This
limit was due to the fact that Informix uses a 32-bit integer by
default for a serial ID type. The solution was to define ID as
serial8 (64 bits long, or long integer) in a new spike table. A
major side effect of this change was an increase in row size, since
Informix fills out row lengths to 8-byte boundaries. Consequently,
we had to place the new spike table into a new, larger storage area.
To insulate the tables from on-line activity during the data
migration process, we stored all incoming results into temporary
storage for the duration (see Improved Flow Control Processing
below). In approximately 2 weeks all of the spike data was migrated
to the new spike format. For the following week the new spike table
was renamed and tested against the old spike table to ensure that
the data had been moved correctly. Once testing was completed, the
new spike table was put into service.
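The arithmetic behind this change can be sketched in a few lines of Python. This is only an illustration: the 62-byte row length is a made-up figure (the real spike row layout is not given here), and the ID widths of 4 and 8 bytes are assumptions taken from the "32-bit" and "64 bits long" descriptions above, so the actual on-disk sizes may differ.

```python
# Largest values that signed 32-bit and 64-bit integers can hold.
INT32_MAX = 2**31 - 1   # 2,147,483,647: the "2 billion" ceiling
INT64_MAX = 2**63 - 1


def padded_row_size(raw_bytes, boundary=8):
    """Round a row length up to the next multiple of `boundary` bytes."""
    return -(-raw_bytes // boundary) * boundary


def widened_row_sizes(old_row_bytes, old_id_bytes, new_id_bytes, boundary=8):
    """Return (old, new) padded row sizes when the ID column is widened.

    All three byte counts are hypothetical inputs, not measured values.
    """
    new_row_bytes = old_row_bytes - old_id_bytes + new_id_bytes
    return (padded_row_size(old_row_bytes, boundary),
            padded_row_size(new_row_bytes, boundary))


if __name__ == "__main__":
    print("32-bit ID limit:", INT32_MAX)
    # Hypothetical 62-byte row whose 4-byte ID grows to 8 bytes.
    print("padded row sizes (old, new):", widened_row_sizes(62, 4, 8))
```

With these placeholder numbers, a 4-byte increase in the ID pushes the row across an 8-byte boundary, so the stored row grows by 8 bytes rather than 4. That kind of jump, multiplied over billions of rows, is why a larger storage area was needed.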
Improved Flow Control Processing
Flow control processing
involves storing incoming results into temporary storage during
database maintenance tasks, then moving these results into permanent
storage once maintenance is completed. Early in the project we could
simultaneously process both real-time returned results and
temporarily stored results without difficulty. However, in 2002 the
volume of incoming results grew so large that we could no longer
process results from temporary storage without significantly slowing
down on-line users. Over a period of a few weeks we determined that a
fixed number of results should be accepted from temporary storage
every 5 minutes, and only if the queue for the on-line results was
sufficiently short. This solution resolved the overloading problem.
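The throttling rule described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name, the in-memory queues, the `store` callback, and the batch size and threshold values are all hypothetical.

```python
from collections import deque


def drain_temp_storage(temp_queue, online_queue, store,
                       batch_size, online_threshold):
    """Move at most `batch_size` results out of temporary storage.

    Nothing is moved if the on-line queue already holds
    `online_threshold` or more results. Returns the number moved.
    """
    if len(online_queue) >= online_threshold:
        return 0  # on-line users take priority; try again next cycle
    moved = 0
    while temp_queue and moved < batch_size:
        store(temp_queue.popleft())  # write one result to permanent storage
        moved += 1
    return moved


if __name__ == "__main__":
    temp = deque(range(10))   # results parked during maintenance
    online = deque()          # real-time results waiting to be stored
    permanent = []
    # A scheduler would call this once every 5 minutes.
    print(drain_temp_storage(temp, online, permanent.append,
                             batch_size=4, online_threshold=3))
```

Capping the batch size bounds the extra load each cycle can add, and checking the on-line queue first means the backlog is only worked off when real-time traffic leaves room for it.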
Increase of Database Storage Capacity
In 2001 we distributed
the spike table among 8 Informix dbspaces. (A dbspace is a storage
allocation that can store up to a fixed number of data rows.) Later
in 2002 we found that growth of the number of stored spikes had
exceeded the capacity of the 8 dbspaces we had allocated. To solve
this problem we replaced the 8 dbspace spike table with one spread
across 16 dbspaces. We kept the old spike table because we need it
for data integrity testing and signal archiving.
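As a rough planning aid, the relationship between row count and dbspace count can be written down directly. The per-dbspace capacity used below is a hypothetical figure chosen only for illustration; the real limits of our dbspaces are not stated here.

```python
def dbspaces_needed(total_rows, rows_per_dbspace):
    """Smallest number of dbspaces that can hold `total_rows` rows."""
    return -(-total_rows // rows_per_dbspace)  # integer ceiling division


def table_capacity(num_dbspaces, rows_per_dbspace):
    """Total rows a table spread over `num_dbspaces` dbspaces can hold."""
    return num_dbspaces * rows_per_dbspace


if __name__ == "__main__":
    ROWS_PER_DBSPACE = 400_000_000  # hypothetical capacity, not a measured value
    print(dbspaces_needed(5_000_000_000, ROWS_PER_DBSPACE))
    print(table_capacity(16, ROWS_PER_DBSPACE))
```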
Improved Signal Archival
Near the end of 2002, we improved
signal archiving by introducing a new table linking all results to
their source tapes. (Previously we needed to perform an expensive
query to access tape information for individual results.) This
modification accelerated our rate of archiving signals to tape and
deleting them from the online tables, keeping our database lean and
efficient.
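The idea of a linking table can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than Informix, and every table name, column name, and data value is hypothetical; it only shows how a direct result-to-tape link turns the tape lookup into a simple indexed query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical linking table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tape (tape_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
        CREATE TABLE result (result_id INTEGER PRIMARY KEY, payload TEXT);
        -- Linking table: one row per result, pointing at its source tape.
        CREATE TABLE result_tape (
            result_id INTEGER PRIMARY KEY REFERENCES result(result_id),
            tape_id   INTEGER NOT NULL REFERENCES tape(tape_id)
        );
        CREATE INDEX result_tape_by_tape ON result_tape(tape_id);
    """)
    conn.executemany("INSERT INTO tape VALUES (?, ?)",
                     [(1, "tape-A"), (2, "tape-B")])
    conn.executemany("INSERT INTO result VALUES (?, ?)",
                     [(10, "r10"), (11, "r11"), (12, "r12")])
    conn.executemany("INSERT INTO result_tape VALUES (?, ?)",
                     [(10, 1), (11, 1), (12, 2)])
    conn.commit()
    return conn


def results_for_tape(conn, tape_id):
    """Return the IDs of all results that came from one source tape."""
    rows = conn.execute(
        "SELECT result_id FROM result_tape WHERE tape_id = ? ORDER BY result_id",
        (tape_id,))
    return [row[0] for row in rows]


def archive_tape(conn, tape_id):
    """Remove one tape's results from the online tables; return their IDs.

    Writing the results to the archive medium is outside this sketch.
    """
    ids = results_for_tape(conn, tape_id)
    with conn:  # commit both deletes as a single transaction
        conn.execute("DELETE FROM result_tape WHERE tape_id = ?", (tape_id,))
        conn.executemany("DELETE FROM result WHERE result_id = ?",
                         [(i,) for i in ids])
    return ids


if __name__ == "__main__":
    db = build_demo_db()
    print(archive_tape(db, 1))
```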
For SETI@home, 2002 was a challenging year
for data storage and processing, and we expect new challenges to
arise in 2003 and beyond. As more data gets crunched, our ability to
discover signs of extraterrestrial intelligence gets better. Thanks
to everyone for their continued participation in this scientific
endeavor.