MicroVAX CPU parity error
I have a MicroVAX in a BA23 case. It has an RD54 disc with 160 MB and a TK50 tape drive. A raster video card is also built in. This is the card cage:
The case label says “MicroVAX II/GPX”, but since the CPU is a KA650, it is a uVAX 3200/3400/3600 (reference). DTJ 7 states that the KA650 is used in the MicroVAX 3500/3600.
Other stickers say: Model: 6300V-B3, SN: AY 81901335.
Bad documentation, but plenty of other KA6xxx CPUs
The KA650 CPU board carries a CVAX 78034 VAX CPU chip, the one Bob Supnik developed during his time at DEC.
Although I searched a lot, I did not find any technical description of the KA650 CPU. vt100.net lists “EK-180AB-MG KA650 CPU SYS MAINTENANCE GUIDE” and “EK-KA650-UG KA650 GUIDE”, but has neither of them. So I had to rely on documentation for similar CPUs, such as the KA640, KA655, KA660 and KA680. This puts my further conclusions on shaky ground.
Luckily, I found all the schematics for the KA650 in a document called “MP02538 650QS Pedestal BA213 Field Maintenance Print Set”.
The KA650 CPU and its cache are described in “Digital Technical Journal Number 7” (DTJ 7).
Self test on boot:
Did I tell you? My MicroVAX has an error:
At boot, it displays:
Performing normal system tests.
?05.50 2 0C FE 04 0000
10000000 10012000 00002000 00000000 00000000
00000000 00000000 00000000 1000B4F8 00000000
1000B500 55555555 55555555 AAAAAAAA AAAAAAAA
00000960 10000000 AAAAAAAA 00002000 80C00040
Normal operation not possible.
I decoded this error as follows, according to :
The first line “?05.50 2 0C FE 04 0000” means this:
- "05.50" is the number of the test that bombed.
A list of tests is printed with “>>>test 9e”. This lists “05.50” as
“05 50 6760 Cach2_integrty start_addr end_addr addr_step *******”
So the failing test is a cache integrity test, which points to the cache as the problem.
- "2" is the severity factor.
"2" causes the register dumps to be displayed and the autoboot prohibited.
"1" just prints this error message line and does not disable the autoboot functionality.
- "0c" "error" is a number that, in conjunction with the listing files, isolates the point where the diagnostic detected the error to within a few instructions. This field is also called subtestlog.
- "FE" "de_error" is the code of the error found.
FF: normal error exit from diagnostic,
FE: unanticipated interrupt,
FD: interrupt in cleanup mode,
FC: interrupt in interrupt handler,
FB: test script requirements not met,
FA: no such diagnostics,
EF: unanticipated exception in executive.
- "04" "vector" is the SCB vector (if non-zero) through which an unexpected exception or interrupt trapped, when the de_error field indicates an unexpected exception or interrupt (FE or EF).
- "0000" "count" is the number of previous errors encountered.
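To make the field layout above concrete, here is a minimal Python sketch that splits the first error line into its six fields and looks up the de_error code in the table above. The names `parse_error_line`, `ErrorSummary` and `DE_ERROR_CODES` are hypothetical helpers, not part of any DEC tool. Treating every field (including the count) as hexadecimal is my assumption.

```python
from dataclasses import dataclass

# de_error codes as listed above (taken from the KA655 documentation).
DE_ERROR_CODES = {
    0xFF: "normal error exit from diagnostic",
    0xFE: "unanticipated interrupt",
    0xFD: "interrupt in cleanup mode",
    0xFC: "interrupt in interrupt handler",
    0xFB: "test script requirements not met",
    0xFA: "no such diagnostic",
    0xEF: "unanticipated exception in executive",
}


@dataclass
class ErrorSummary:
    test: str        # e.g. "05.50", the test number that bombed
    severity: int    # 1 = message only, 2 = register dump + no autoboot
    subtestlog: int  # the "error" field, locates the failing instructions
    de_error: int    # error code, see DE_ERROR_CODES
    vector: int      # SCB vector for unexpected interrupts/exceptions
    count: int       # number of previous errors

    @property
    def de_error_text(self) -> str:
        return DE_ERROR_CODES.get(self.de_error, "unknown code")

    @property
    def autoboot_inhibited(self) -> bool:
        # Only severities 1 and 2 are described; treating anything >= 2
        # as blocking autoboot is an assumption.
        return self.severity >= 2


def parse_error_line(line: str) -> ErrorSummary:
    """Parse a line like '?05.50 2 0C FE 04 0000' (all fields assumed hex)."""
    fields = line.strip().lstrip("?").split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    test, sev, err, de_err, vec, cnt = fields
    return ErrorSummary(
        test=test,
        severity=int(sev, 16),
        subtestlog=int(err, 16),
        de_error=int(de_err, 16),
        vector=int(vec, 16),
        count=int(cnt, 16),
    )


if __name__ == "__main__":
    summary = parse_error_line("?05.50 2 0C FE 04 0000")
    print(summary.test, summary.de_error_text, summary.autoboot_inhibited)
```

For my machine this reports test 05.50, an unanticipated interrupt through SCB vector 04, and autoboot blocked.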
Line (2): P1..P5 are the first five longwords of the diagnostic state.
This is internal information that is used by repair personnel.
Line (3): P6..P10 are the last five longwords of the diagnostic state.
Line (4): R0..R4 are the first five GPRs at the moment the error was detected.
Line (5): R5..R8 are additional GPRs, and ERF is a diagnostic summary longword.
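The four dump lines hold exactly 20 longwords: P1..P10, R0..R8 and ERF. A small Python sketch (the function name `parse_register_dump` is hypothetical) can label each value so that ERF and the registers are not misread by position:

```python
def parse_register_dump(lines: list[str]) -> dict[str, int]:
    """Map the four dump lines (five hex longwords each) to their names."""
    names = (
        [f"P{i}" for i in range(1, 11)]   # lines 2 and 3: diagnostic state
        + [f"R{i}" for i in range(0, 9)]  # line 4: R0..R4, line 5: R5..R8
        + ["ERF"]                         # last longword: summary
    )
    words = [int(tok, 16) for line in lines for tok in line.split()]
    if len(words) != len(names):
        raise ValueError(f"expected {len(names)} longwords, got {len(words)}")
    return dict(zip(names, words))


DUMP = [
    "10000000 10012000 00002000 00000000 00000000",
    "00000000 00000000 00000000 1000B4F8 00000000",
    "1000B500 55555555 55555555 AAAAAAAA AAAAAAAA",
    "00000960 10000000 AAAAAAAA 00002000 80C00040",
]

if __name__ == "__main__":
    regs = parse_register_dump(DUMP)
    for name, value in regs.items():
        print(f"{name:>3} = {value:08X}")
```

Labeled this way, it is easy to see that R1..R4 hold 55555555/AAAAAAAA, which look like alternating-bit test patterns, while the last value is ERF.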
The last 32-bit value is ERF, and it is very important. I used the KA655 documentation, “EK-306A-MG-001 KA655 CPU System Maintenance”, page 4-33. The KA655 has an “SOC” chip, which combines a CVAX 78034 CPU, a CFPA floating-point processor, a clock and an 8 KB second-level cache. I hope its ROM-based diagnostics are also close enough to those of my KA650.
Here ERF = 80C00040; I have also seen 82000180 and 80C00000.
This seems to indicate a CDAL parity error on the KA650 CPU. “CDAL” stands for “CVAX Data and Address Lines”, the multiplexed front-end bus of the CPU. The interface to the 22-bit Q-bus goes through the “QBIC” chip, and the interface to the memory boards goes through the “MEMCTL” chip. The second-level cache is built from discrete memory and 74Fxxx chips. The interface between the CDAL and the second-level cache is a port of five bidirectional 74F544 latches. Also connected to the CDAL are some small on-board peripherals, such as serial ports, LED registers, etc.
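To compare the ERF values quoted above against the bit table in the KA655 manual, it helps to list which bit positions are set in each one. The Python sketch below does only that arithmetic; the helpers `set_bits` and `common_bits` are hypothetical, and the meaning of each bit must still be looked up on page 4-33 of the KA655 manual, which I do not reproduce here.

```python
ERF_SAMPLES = [0x80C00040, 0x82000180, 0x80C00000]


def set_bits(value: int) -> list[int]:
    """Return the set bit positions (0 = LSB) of a 32-bit longword."""
    return [bit for bit in range(32) if (value >> bit) & 1]


def common_bits(values: list[int]) -> list[int]:
    """Bit positions that are set in every one of the given longwords."""
    return sorted(set.intersection(*(set(set_bits(v)) for v in values)))


if __name__ == "__main__":
    for erf in ERF_SAMPLES:
        print(f"{erf:08X} -> bits {set_bits(erf)}")
    # 80C00040 -> bits [6, 22, 23, 31]
    # 82000180 -> bits [7, 8, 25, 31]
    # 80C00000 -> bits [22, 23, 31]
    print("common:", common_bits(ERF_SAMPLES))  # common: [31]
```

Only bit 31 is set in all three values, so that is the first bit worth checking in the ERF table.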
Trying to repair
I had no clue what to do. I replaced a few cache driver chips, but that did not change the fault. I even built a comparator adapter for running a test memory chip piggybacked on top of the built-in chips. My idea was: if a cache memory chip is defective, I would see differing signals between the outputs of the original chip and the reference chip. Let's call it the "Run-Reference-Chip-Parallel-Adapter" ("RRCPA")!
But in practice the signals were far too complex to compare, and I did not trust my RRCPA at these high operating frequencies of more than 10 MHz.
Later I read in DEC Technical Journal 7 that the uVAX2 CPU design is very compact for cost reasons. They explicitly state that the source of a local bus parity error cannot be traced to a single component.
As usual, their repair strategy is "change part and throw it away".
The 2nd KA650 is good
Just as I needed it, I found a KA650 on eBay.com. It was just $50 + $20 for shipping. It arrived after four weeks, and it was completely working. So once more, a big problem could be solved by a small deal.
Good for the VAX, but bad for my pride!