Because SETI@home is run by millions of users using many different types of computers, we
often get asked how we know that everyone is getting the right answers when they process
data for us. There are several reasons why a result returned by a SETI@home volunteer
might be incorrect. The most common reason we get incorrect results is processor
malfunction. If a processor overheats, perhaps because there is dust buildup inside the
machine, or maybe it's just a really hot day, the first part of the chip to fail will be
the most complex part, the floating-point unit. A failure of the floating-point unit,
which is responsible for most of the calculations performed by SETI@home, will usually not
cause a computer to crash. Instead, it will cause the computer to generate incorrect results. These
innocent failures are responsible for most of the incorrect results we see. The most
common symptom of this problem is that every result from a malfunctioning computer contains
hundreds of potential signals. Of course, some valid results also contain hundreds of
signals.
Fortunately, most incorrect results of this type contain values that could not result
from a correct SETI@home calculation. By checking that the parameters of a signal are
within the allowed bounds we can exclude most signals of this type before they cause any
confusion.
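As a rough illustration, a bounds check of this kind could look like the following Python sketch. The parameter names and limits are invented for the example and are not the project's actual values; the point is only that a value outside the range a correct calculation can produce (or a NaN or infinity) marks the signal as bad.

```python
import math

# Hypothetical limits, chosen only for illustration. The real allowed
# ranges are not given in this text.
BOUNDS = {
    "power": (0.0, 1.0e6),
    "chirp_rate_hz_per_s": (-50.0, 50.0),
    "frequency_offset_hz": (0.0, 10_000.0),
}


def out_of_bounds(signal):
    """Return the names of parameters that are missing, non-finite, or out of range."""
    bad = []
    for name, (lo, hi) in BOUNDS.items():
        value = signal.get(name)
        if value is None or not math.isfinite(value) or not (lo <= value <= hi):
            bad.append(name)
    return bad


# A plausible signal passes; one with a NaN power and an impossible
# chirp rate is rejected.
good = {"power": 12.5, "chirp_rate_hz_per_s": 3.0, "frequency_offset_hz": 1200.0}
broken = {"power": float("nan"), "chirp_rate_hz_per_s": 900.0, "frequency_offset_hz": 1200.0}
print(out_of_bounds(good))
print(out_of_bounds(broken))
```

A check like this is cheap, so it can run on every returned signal before any comparison between computers takes place.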
There are also a few irresponsible people running hacked versions of the
SETI@home client that send back bad results. Usually these results have no detected
signals at all. If someone sends back thousands of results, but never finds anything, even
test signals, we get suspicious.
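The suspicion test described above can be sketched in a few lines of Python. The threshold of 1000 results and the function name are assumptions made for the example; the text only says "thousands of results" with nothing ever found.

```python
def is_suspicious(signals_per_result, min_results=1000):
    """Flag a computer that returned many results, none containing any signal.

    signals_per_result: number of signals reported in each returned result.
    min_results: hypothetical cutoff for "many" results (an assumption).
    """
    return len(signals_per_result) >= min_results and not any(signals_per_result)


print(is_suspicious([0] * 5000))        # many results, never a signal
print(is_suspicious([0] * 4999 + [2]))  # found something at least once
print(is_suspicious([0] * 10))          # too few results to judge
```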
There is also a third type of incorrect result. Sometimes, very rarely, a
computer will get the wrong answer to a calculation for no apparent reason. This appears
to happen about once in every 3,000,000,000,000,000,000 calculations. If you let your
computer run SETI@home for a thousand years, it would get the wrong answer about once. (Of
course, by then your computer would have failed for some other reason.) But since SETI@home
gets a thousand years of CPU time every day, we see one or more of these failures per day.
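The arithmetic behind these figures can be checked with a short Python calculation. The only inputs are the numbers quoted above; the length of a year in seconds is the one added constant, and the per-second rate is a derived estimate rather than a measured value.

```python
# Figures taken from the text above.
CALCULATIONS_PER_ERROR = 3_000_000_000_000_000_000
CPU_YEARS_PER_ERROR = 1000          # one wrong answer per thousand years on one computer
PROJECT_CPU_YEARS_PER_DAY = 1000    # CPU time the whole project receives each day

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Calculation rate implied for a single computer, in calculations per second.
implied_rate = CALCULATIONS_PER_ERROR / (CPU_YEARS_PER_ERROR * SECONDS_PER_YEAR)

# Expected number of these rare failures across the whole project per day.
expected_errors_per_day = PROJECT_CPU_YEARS_PER_DAY / CPU_YEARS_PER_ERROR

print(f"implied rate: {implied_rate:.2e} calculations per second")
print(f"expected failures per day: {expected_errors_per_day}")
```

The two quoted numbers are consistent with a single computer performing a little under a hundred million calculations per second, and with roughly one such failure per day across the project.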
Because these errors happen, it's good to have a check of the results. Fortunately,
SETI@home has enough volunteers that we can process each piece of data more than once and
compare the potential signals detected by different computers to each other. We use the
result of the comparison to rank our results by how confident we are that they were
processed correctly. The possible outcomes of the comparison of a signal are:
1. We mark the signal as fully verified if 60% or more of the results for this work
unit contain a matching signal.
2. If the signal cannot be verified we mark the signal as unverified. This can happen
for two reasons. Early in the project, when we had fewer users, we were unable to process
every work unit multiple times, so some early work units cannot be verified. There are
also many work units that were processed by more than one version of the SETI@home client.
More recent versions include analysis that was not present in the early versions, so
certain signals will only be found with new versions.
3. If a signal is present in more than one of the compared results, but in fewer than
60% of them, we mark it as questionable.
4. If a signal is present in only one result, but should have been detected in
others, we mark it as an incorrect signal.
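The four outcomes above can be expressed as a small decision function. This is a minimal sketch under stated assumptions, not the verifier itself: the function and argument names are hypothetical, and "cannot be verified" is modeled simply as having fewer than two results that were capable of detecting the signal.

```python
def classify_signal(matching, comparable):
    """Classify one signal from the comparison of results for a work unit.

    matching:   number of results that contain a matching signal
                (at least 1, the result that reported it).
    comparable: number of results that could have detected the signal,
                for example those produced by a client version that
                performs the relevant analysis (includes the reporter).
    """
    if comparable < 2:
        # Nothing to compare against: processed once, or only one
        # client version was able to find this kind of signal.
        return "unverified"
    if matching * 100 >= 60 * comparable:
        return "fully verified"
    if matching > 1:
        return "questionable"
    return "incorrect"


print(classify_signal(1, 1))  # unverified
print(classify_signal(3, 5))  # fully verified (exactly 60%)
print(classify_signal(2, 5))  # questionable
print(classify_signal(1, 3))  # incorrect
```

The 60% test uses integer arithmetic so that a signal sitting exactly on the threshold is not misclassified by floating-point rounding.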
Using the results of this comparison we assign each result for a work unit a numerical
score. Based upon this score, we choose the best one and copy it to our master database,
where it will be examined in further stages of the SETI@home data analysis. (Don't worry,
everyone who processed the work unit still receives credit for having processed it, and
will share in the credit should we discover E.T.)
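To make the selection step concrete, here is a purely illustrative Python sketch of scoring results and choosing the best one. The text does not say how the numerical score is computed, so the weights, the averaging, and all names below are assumptions invented for the example.

```python
# Hypothetical weights per comparison outcome (not the project's values).
OUTCOME_WEIGHTS = {
    "fully verified": 1.0,
    "unverified": 0.0,
    "questionable": -0.5,
    "incorrect": -1.0,
}


def result_score(outcomes):
    """Average the outcome weights of every signal in one result."""
    if not outcomes:
        return 0.0
    return sum(OUTCOME_WEIGHTS[o] for o in outcomes) / len(outcomes)


def best_result(results):
    """Return the id of the highest-scoring result for a work unit.

    results: mapping of result id to the list of outcomes of its signals.
    """
    return max(results, key=lambda rid: result_score(results[rid]))


results = {
    "result_a": ["fully verified", "incorrect"],
    "result_b": ["fully verified", "fully verified"],
}
print(best_result(results))  # result_b
```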
So far we've run the results from 327 tapes through the result verifier. This
represents 47.5% of the SETI@home database.
The verification scores will be used in later processing when choosing potential
candidate signals. Those that are fully verified will be given higher priority than those
that cannot be verified. Those that are marked as incorrect will not be considered further.