Supplement on Parity and Error Correcting RAM




Parity Memory vs. ECC Memory

Sometimes, when you write a byte of data to RAM and later read it back, the eight bits that come back are not all identical to those you wrote. RAM is more reliable now than a few years ago, but when you have a multi-user system, it pays to avoid system hangs and crashes.

If the probability of an error in any one bit during any one memory cycle is one in a hundred quadrillion, and if the memory system is running at 10 MHz (one cycle every 100 nanoseconds), and if you have 125 megabytes of RAM (1 billion bits), then you would expect on average to see one single-bit error every ten seconds and one double-bit error every thousand quadrillion seconds (somewhat more than the age of the universe). That is why ECC memory is worth using, and why it is designed to detect but not correct double-bit errors.
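The arithmetic behind these figures can be checked with a few lines of Python. This is only a rough sketch: the variable and function names are illustrative, and the double-bit formula (squaring the per-bit probability and keeping the same bit count and cycle rate) is an assumption about how the thousand-quadrillion-second figure was reached, not something the text states.

```python
# Rough reproduction of the error-rate estimate in the text.
P_ERROR = 1e-17            # one in a hundred quadrillion, per bit per cycle
CYCLES_PER_SECOND = 10e6   # 10 MHz, one cycle every 100 nanoseconds
BITS = 125e6 * 8           # 125 megabytes = 1 billion bits


def seconds_between_single_bit_errors() -> float:
    """Average wait for one single-bit error anywhere in RAM."""
    errors_per_second = P_ERROR * CYCLES_PER_SECOND * BITS
    return 1.0 / errors_per_second


def seconds_between_double_bit_errors() -> float:
    """Average wait for a double-bit error (crude estimate).

    Assumption: two independent errors coincide with probability
    P_ERROR squared, scaled by the same bit count and cycle rate.
    """
    errors_per_second = P_ERROR ** 2 * CYCLES_PER_SECOND * BITS
    return 1.0 / errors_per_second


if __name__ == "__main__":
    print(f"single-bit: about {seconds_between_single_bit_errors():.3g} s")
    print(f"double-bit: about {seconds_between_double_bit_errors():.3g} s")
```

Under these assumptions the results come out at roughly 10 seconds and roughly 10^18 seconds, matching the figures quoted above.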

The above calculation of the probability of double-bit errors is optimistic, in that it assumes that the errors are all "statistically independent," that is, that there will not be single events that cause simultaneous multiple-bit errors. For example, a failure of the tiny wires (inside the integrated circuit chip's carrier) that connect the DC power from the circuit board to the chip itself will cause all of the bits stored on that chip to fail. By allocating the various bits of each byte to different chips, it is possible to reduce the vulnerability of the RAM to such errors.

Modern RAM chips store each bit as a small electric charge (or the absence of a small electric charge). Ionizing radiation resulting from cosmic rays or the radioactive decay of trace contaminants in the chip or its surrounding carrier can alter the stored value. High voltages, whether resulting from static electricity during improper handling or from transient events such as lightning strikes nearby, can also damage integrated circuits, either permanently or temporarily.


Error Correction Codes

Repeating each bit four times is the simplest error correction code to describe that can both detect and correct single-bit errors and detect double-bit errors without mistaking them for single-bit errors. For example:

Data:      0110
Encoded:   0000111111110000

If any one bit changes within a group of four, the other three still agree, so there is no question as to the original value, and it is possible to report the correct value for each bit automatically. If, on the other hand, two bits change within the same group of four, then you cannot tell by inspection which two have changed, and so you cannot tell what the correct value is, but you can tell that something is wrong.
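The four-fold repetition scheme described above can be sketched in Python as follows. The function names and the status strings ("ok", "corrected", "uncorrectable") are illustrative choices, not part of any real memory controller interface.

```python
from typing import Optional, Tuple


def encode(data: str) -> str:
    """Repeat every data bit four times, e.g. '01' -> '00001111'."""
    return "".join(bit * 4 for bit in data)


def decode(coded: str) -> Tuple[Optional[str], str]:
    """Decode a four-fold repetition code.

    Returns (data, status). A group with a 3-to-1 split is corrected
    by majority vote; a 2-to-2 split is reported as uncorrectable
    because there is no way to tell which two bits changed.
    """
    if len(coded) % 4 != 0:
        raise ValueError("coded length must be a multiple of 4")
    data = []
    corrected = False
    for i in range(0, len(coded), 4):
        ones = coded[i:i + 4].count("1")
        if ones == 2:
            return None, "uncorrectable"  # double-bit error detected
        if ones in (1, 3):
            corrected = True              # single-bit error fixed
        data.append("1" if ones >= 3 else "0")
    return "".join(data), "corrected" if corrected else "ok"


if __name__ == "__main__":
    word = encode("0110")
    print(word)                        # the encoded form shown in the text
    print(decode(word))                # no errors
    print(decode("0100" + word[4:]))   # one bit flipped in the first group
    print(decode("0110" + word[4:]))   # two bits flipped in the first group
```

The three `decode` calls exercise the three cases from the text: a clean word, a single-bit error that is corrected, and a double-bit error that is detected but not corrected.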

This is a very expensive coding, requiring four times as much physical RAM as the data itself. Sophisticated mathematical analysis demonstrates that much cheaper approaches are possible. Real ECC memory uses a much less expensive encoding, using 39 bits to encode 32, to provide just enough redundancy to detect double-bit errors and to correct single-bit errors.
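The 39-for-32 figure can be reproduced with a short calculation, assuming the encoding is a Hamming code extended with one overall parity bit. That is a common way to build a code that corrects single-bit errors and detects double-bit errors, but the text does not name the code actually used, so treat this as an illustrative sketch. The function name is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by an extended Hamming code (assumed scheme).

    A Hamming code needs the smallest r with 2**r >= data_bits + r + 1,
    so that r check bits can name every bit position plus "no error".
    One more overall parity bit separates single from double errors.
    """
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


if __name__ == "__main__":
    for width in (8, 32, 64):
        total = width + secded_check_bits(width)
        print(f"{width} data bits -> {total} stored bits, "
              f"versus {4 * width} with four-fold repetition")
```

For 32 data bits this gives 7 check bits and 39 stored bits, compared with 128 stored bits for the repetition code. The relative overhead also shrinks as the word gets wider, which is one reason wider memory words make this kind of protection cheaper per data bit.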



Dick Piccard revised this file (http://oak.cats.ohiou.edu/~piccard/mis300/eccram.htm) on October 27, 1998.

Please E-Mail comments or suggestions to "piccard@ohio.edu".