CRAY-1 Hardware Reference Manual

The source manual was loaned by DigiBarn. It comprises the first three chapters of the CRAY RESEARCH CRAY-1 Hardware Reference Manual 2240004, Revision C. Conversion to HTML was done by Ed Thelen.

A PDF version of the complete manual (204 pages, 5.3 megabytes) is now available from www.bitsavers.org (January 2007).


CRAY-1
COMPUTER SYSTEM®

HARDWARE REFERENCE MANUAL
2240004

Copyright©1977 by CRAY RESEARCH, INC. This manual or parts thereof may
not be reproduced in any form without permission of CRAY RESEARCH, INC.

RECORD OF REVISION
PUBLICATION NUMBER 2240004

Revision   Print date   Description
           1/76         Original printing
A          5/76         Reprint with revision
A-01       9/76         Corrections to pages 3-20, 3-27, 4-9, 4-10, 4-28, 4-36, 4-43, 4-55, and 4-57
B          10/76        Reprint with revision. Addition of floating point range error detection, vector floating point error, and error correction
B-01       2/77         Changes to exchange package (p. 3-36); additions to instructions 152 and 153 (p. 4-53); corrections to syndrome bit description (p. 5-5); corrections to instruction summary, appendix D
B-02       7/77         Corrections and changes to pages xi, 2-3, 3-19 through 3-28.1, 3-31, 3-34, 3-36, 3-38, 4-14 through 4-17, 4-54, 4-68, 5-1, 5-3, 5-4, 5-6, 6-2, A-4, and D-1 through D-4
C          11/77        This printing obsoletes revision B. Features added include 8-bank phasing and I/O master clear procedure. Chart tape reflects only changes introduced with this revision.

Each time this manual is revised and reprinted, all changes issued against the previous version in the form of change packets are incorporated into the new version, and the new version is assigned an alphabetic level. Between reprints, changes may be issued against the current version in the form of change packets. Each change packet is assigned a numeric designator, starting with 01 for each new revision level. Every page changed by a reprint or by a change packet has the revision level and change packet number in the lower right-hand corner. All changes are noted by a change bar along the margin of the page.
Requests for copies of CRAY RESEARCH, INC. publications should be directed to:
CRAY RESEARCH, INC.
7850 Metro Parkway
Suite 213
Bloomington, MN 55420

CONTENTS

1. INTRODUCTION
   COMPUTATION SECTION
   MEMORY SECTION
   INPUT/OUTPUT SECTION
   VECTOR PROCESSING
2. PHYSICAL ORGANIZATION
   INTRODUCTION
   MAINFRAME
      Modules
      Printed circuit board
      Module assembly
      Integrated circuit packages
      IC high-speed logic gate
      IC slow-speed logic gate
      16x1 register chip
      1024x1 memory chip
      Resistors
      Connector strips
      Clock
      Power supplies
   PRIMARY POWER SYSTEM
   COOLING
   MAINTENANCE CONTROL UNIT
   FRONT-END COMPUTER
   EXTERNAL INTERFACE
   MASS STORAGE SUBSYSTEM
3. COMPUTATION SECTION
   INTRODUCTION
   REGISTER CONVENTIONS
   OPERATING REGISTERS
      V registers
      V register reservations
      Vector control registers
      VL register
      VM register
      S registers
      T registers
      A registers
      B registers
   FUNCTIONAL UNITS
      Address functional units
      Address add unit
      Address multiply unit
      Scalar functional units
      Scalar add unit
      Scalar shift unit
      Scalar logical unit
      Population/leading zero count unit
      Vector functional units
      Vector functional unit reservation
      Recursive characteristic of vector functional units
      Vector add unit
      Vector shift unit
      Vector logical unit
      Floating point functional units
      Floating point add unit
      Floating point multiply unit
      Reciprocal approximation unit
   ARITHMETIC OPERATIONS
      Integer arithmetic
      Floating point arithmetic
      Normalized floating point
      Floating point range errors
      Floating point add unit
      Floating point multiply unit
      Floating point reciprocal approximation unit
      Double precision numbers
      Addition algorithm
      Multiplication algorithm
      Division algorithm
   LOGICAL OPERATIONS
   INSTRUCTION ISSUE AND CONTROL
      P register
      CIP register
      NIP register
      LIP register
      Instruction buffers
   EXCHANGE MECHANISM
      XA register
      M register
      F register
      Exchange package
      Active exchange package
      Exchange sequence
      Initiated by dead start sequence
      Initiated by interrupt flag set
      Initiated by program exit
      Exchange sequence issue conditions
      Exchange package management
   MEMORY FIELD PROTECTION
      BA register
      LA register
   DEAD START SEQUENCE
4. INSTRUCTIONS
   INSTRUCTION FORMAT
      Arithmetic, logical format
      Shift, mask format
      Immediate constant format
      Memory transfer format
      Branch format
   SPECIAL REGISTER VALUES
   INSTRUCTION ISSUE
   INSTRUCTION DESCRIPTIONS
      000000 Error exit
      001ijk Monitor functions
      0020xk Transmit (Ak) to VL
      0021xx Set the floating point mode flag in the M register
      0022xx Clear the floating point mode flag in the M register
      003xjx Transmit (Sj) to vector mask
      004xxx Normal exit
      005xjk Branch to (Bjk)
      006ijkm Branch to ijkm
      007ijkm Return jump to ijkm; set B00 to (P)
      010ijkm Branch to ijkm if (A0) = 0
      011ijkm Branch to ijkm if (A0) ≠ 0
      012ijkm Branch to ijkm if (A0) positive
      013ijkm Branch to ijkm if (A0) negative
      014ijkm Branch to ijkm if (S0) = 0
      015ijkm Branch to ijkm if (S0) ≠ 0
      016ijkm Branch to ijkm if (S0) positive
      017ijkm Branch to ijkm if (S0) negative
      020ijkm Transmit jkm to Ai
      021ijkm Transmit complement of jkm to Ai
      022ijk Transmit jk to Ai
      023ijx Transmit (Sj) to Ai
      024ijk Transmit (Bjk) to Ai
      025ijk Transmit (Ai) to Bjk
      026ijx Population count of (Sj) to Ai
      027ijx Leading zero count of (Sj) to Ai
      030ijk Integer sum of (Aj) and (Ak) to Ai
      031ijk Integer difference of (Aj) and (Ak) to Ai
      032ijk Integer product of (Aj) and (Ak) to Ai
      033ijk Transmit I/O status to Ai
      034ijk Block transfer (Ai) words from memory starting at address (A0) to B registers starting at register jk
      035ijk Block transfer (Ai) words from B registers starting at register jk to memory starting at address (A0)
      036ijk Block transfer (Ai) words from memory starting at address (A0) to T registers starting at register jk
      037ijk Block transfer (Ai) words from T registers starting at register jk to memory starting at address (A0)
      040ijkm Transmit jkm to Si
      041ijkm Transmit complement of jkm to Si
      042ijk Form 64-jk bits of one's mask in Si from right
      043ijk Form jk bits of one's mask in Si from left
      044ijk Logical product of (Sj) and (Sk) to Si
      045ijk Logical product of (Sj) and complement of (Sk) to Si
      046ijk Logical difference of (Sj) and (Sk) to Si
      047ijk Logical difference of (Sj) and complement of (Sk) to Si
      050ijk Scalar merge
      051ijk Logical sum of (Sj) and (Sk) to Si
      052ijk Shift (Si) left jk places to S0
      053ijk Shift (Si) right 64-jk places to S0
      054ijk Shift (Si) left jk places to Si
      055ijk Shift (Si) right 64-jk places to Si
      056ijk Shift (Si) and (Sj) left by (Ak) places to Si
      057ijk Shift (Sj) and (Si) right by (Ak) places to Si
      060ijk Integer sum of (Sj) and (Sk) to Si
      061ijk Integer difference of (Sj) and (Sk) to Si
      062ijk Floating sum of (Sj) and (Sk) to Si
      063ijk Floating difference of (Sj) and (Sk) to Si
      064ijk Floating product of (Sj) and (Sk) to Si
      065ijk Half-precision rounded floating product of (Sj) and (Sk) to Si
      066ijk Rounded floating product of (Sj) and (Sk) to Si
      067ijk Reciprocal iteration; 2-(Sj)*(Sk) to Si
      070ijx Floating reciprocal approximation of (Sj) to Si
      071ijk Transmit (Ak) or normalized floating point constant to Si
      072ixx Transmit (RTC) to Si
      073ixx Transmit (VM) to Si
      074ijk Transmit (Tjk) to Si
      075ijk Transmit (Si) to Tjk
      076ijk Transmit (Vj element (Ak)) to Si
      077ijk Transmit (Sj) to Vi element (Ak)
      10hijkm Read from ((Ah) + jkm) to Ai
      11hijkm Store (Ai) to (Ah) + jkm
      12hijkm Read from ((Ah) + jkm) to Si
      13hijkm Store (Si) to (Ah) + jkm
      140ijk Logical products of (Sj) and (Vk elements) to Vi elements
      141ijk Logical products of (Vj elements) and (Vk elements) to Vi elements
      142ijk Logical sums of (Sj) and (Vk elements) to Vi elements
      143ijk Logical sums of (Vj elements) and (Vk elements) to Vi elements
      144ijk Logical differences of (Sj) and (Vk elements) to Vi elements
      145ijk Logical differences of (Vj elements) and (Vk elements) to Vi elements
      146ijk If VM bit = 1, transmit (Sj) to Vi elements; if VM bit = 0, transmit (Vk elements) to Vi elements
      147ijk If VM bit = 1, transmit (Vj elements) to Vi elements; if VM bit = 0, transmit (Vk elements) to Vi elements
      150ijk Single shift of (Vj elements) left by (Ak) places to Vi elements
      151ijk Single shift of (Vj elements) right by (Ak) places to Vi elements
      152ijk Double shifts of (Vj elements) left (Ak) places to Vi elements
      153ijk Double shifts of (Vj elements) right (Ak) places to Vi elements
      154ijk Integer sums of (Sj) and (Vk elements) to Vi elements
      155ijk Integer sums of (Vj elements) and (Vk elements) to Vi elements
      156ijk Integer differences of (Sj) and (Vk elements) to Vi elements
      157ijk Integer differences of (Vj elements) and (Vk elements) to Vi elements
APPENDIXES

A TIMING SUMMARY
B MODULE TYPES
C SOFTWARE CONSIDERATIONS
D INSTRUCTION SUMMARY

FIGURES

1-1 Basic computer system
2-1 Physical organization of the mainframe
2-2 General chassis layout
2-3 Clock pulse waveform
3-1 Computation section
3-2 Integer data formats
3-3 Floating point data format
3-4 49-bit floating point addition
3-5 Floating point multiply pyramid
3-6 Relationship of instruction buffers and registers
3-7 Instruction buffers
3-8 Exchange package
4-1 General format for instructions
4-2 Format for arithmetic and logical instructions
4-3 Format for shift and mask instructions
4-4 Format for immediate constant instructions
4-5 Format for memory transfer instructions
4-6 Two-parcel format for branch instructions
5-1 Memory organization
5-2 Memory address
6-1 Channel I/O control

TABLES

1-1 Characteristics of CRAY-1 Computer System
2-1 Characteristics of a DD-19 Disk Storage Unit

SECTION 1
INTRODUCTION

TECHNICAL COMMUNICATIONS
7850 Metro Parkway, Suite 213, Minneapolis, MN 55420, (612) 854-7472

PUBLICATION CHANGE NOTICE
November 4, 1977

TITLE: CRAY-1 Hardware Reference Manual
PUBLICATION NO. 2240004 REV. C
This printing obsoletes version B and applies to CRAY-1 Computer Systems starting with Serial No. 3. Revision A remains relevant for Serial No. 1.


The CRAY-1 Computer System is a powerful general-purpose computer capable of extremely high processing rates. These rates are achieved by combining scalar and vector capabilities into a single central processor which is joined to a large, fast, bipolar memory. Vector processing, which performs iterative operations on sets of ordered data, provides results at rates greatly exceeding the result rates of conventional scalar processing. Scalar operations complement the vector capability by providing solutions to problems not readily adapted to vector techniques.
Figure 1-1 represents the basic organization of a CRAY-1 system. The central processor unit (CPU) is a single integrated processing unit consisting of a computation section, a memory section, and an input/output section. The memory is expandable from 0.25 million 64-bit words to a maximum of 1.0 million words. The 12 input channels and 12 output channels in the input/output section connect to a maintenance control unit (MCU), a mass storage subsystem, and a variety of front-end systems or peripheral equipment. The MCU provides for system initialization and for monitoring system performance. The mass storage subsystem provides secondary storage and consists of one to eight Cray Research DCU-2 Disk Controllers, each with one to four DD-19 Disk Storage Units. Each DD-19 has a capacity of 2.424 × 10^9 bits, so a maximum mass storage configuration could hold 9.7 × 10^9 8-bit characters.
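The maximum-capacity figure follows directly from the configuration limits in the paragraph above. The short Python sketch below just multiplies them out; the constant and function names are illustrative and do not come from the manual.

```python
# Maximum mass storage: up to 8 DCU-2 controllers, each with up to 4 DD-19 units.
BITS_PER_DD19 = 2.424e9        # capacity of one DD-19 Disk Storage Unit, in bits
MAX_CONTROLLERS = 8            # one to eight DCU-2 Disk Controllers
MAX_UNITS_PER_CONTROLLER = 4   # one to four DD-19 units per controller
BITS_PER_CHARACTER = 8


def max_mass_storage_chars() -> float:
    """Number of 8-bit characters held by the largest configuration."""
    units = MAX_CONTROLLERS * MAX_UNITS_PER_CONTROLLER   # 32 DD-19 units
    total_bits = units * BITS_PER_DD19                   # 7.7568e10 bits
    return total_bits / BITS_PER_CHARACTER


print(f"{max_mass_storage_chars():.2e} characters")  # prints 9.70e+09 characters
```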
I/O channels can be connected to independent processors, referred to as front-end computers or I/O stations, or to peripheral equipment, according to the requirements of the individual installation. At least one front-end system is considered standard; it collects data and presents it to the CRAY-1 for processing, and it receives output from the CRAY-1 for distribution to slower devices.
Table 1-1 summarizes the characteristics of the system. The following paragraphs provide an additional introduction to the three sections of the CPU; later sections of this manual describe the features in detail.
Table 1-1. Characteristics of the CRAY-1 Computer System

COMPUTATION SECTION

64-bit word
12.5 nanosecond clock period
2's complement arithmetic
Scalar and vector processing modes
Twelve fully segmented functional units
Eight 24-bit address (A) registers
Sixty-four 24-bit intermediate address (B) registers
Eight 64-bit scalar (S) registers
Sixty-four 64-bit intermediate scalar (T) registers
Eight 64-element vector (V) registers, 64 bits per element
Four instruction buffers of 64 16-bit parcels each
Integer and floating point arithmetic
128 instruction codes
MEMORY SECTION

Up to 1,048,576 words of bipolar memory
(64 data bits and eight error correction bits)
Eight or sixteen banks of 65,536 words each
Four-clock-period bank cycle time
One word per clock period transfer rate to B, T, and V registers
One word per two clock periods transfer rate to A and S registers
Four words per clock period transfer rate to instruction buffers
Single error correction - double error detection (SEC-DED)
INPUT/OUTPUT SECTION

Twelve input channels and twelve output channels
Channel groups contain either six input or six output channels
Channel groups served equally by memory (scanned every four clock periods)
Channel priority resolved within channel groups
Sixteen data bits, three control bits, and four parity bits per channel
Lost data detection

COMPUTATION SECTION
The computation section contains instruction buffers, registers and functional units which operate together to execute a program of instructions stored in memory.
Arithmetic operations are either integer or floating point. Integer arithmetic is performed in two's complement mode. Floating point quantities have signed-magnitude representation.
The CRAY-1 executes 128 operation codes as either 16-bit (one parcel) or 32-bit (two-parcel) instructions. Operation codes provide for both scalar and vector processing.
Floating point instructions provide for addition, subtraction, multiplication, and reciprocal approximation. The reciprocal approximation instruction allows for the computation of a floating divide operation using a multiple instruction sequence.
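A divide built from a reciprocal approximation usually pairs the approximation with one Newton-Raphson refinement step. The instruction list includes a reciprocal iteration instruction that forms 2 - (Sj)*(Sk), which is the correction factor such a step needs. The Python sketch below shows only the idea: `approx_reciprocal` is a hypothetical stand-in that fakes the hardware by truncating the exact reciprocal to 30 mantissa bits (an assumption, not the CRAY-1 unit's documented precision), and Python's IEEE doubles are not CRAY-1 floating point.

```python
import math


def approx_reciprocal(a: float, bits: int = 30) -> float:
    """Hypothetical stand-in for a hardware reciprocal approximation:
    the exact reciprocal with its mantissa truncated to `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / a)   # 1/a = mantissa * 2**exponent
    scale = 1 << bits
    return math.ldexp(math.trunc(mantissa * scale) / scale, exponent)


def divide(b: float, a: float) -> float:
    """Form b / a as b times a refined reciprocal of a."""
    x0 = approx_reciprocal(a)          # roughly `bits` correct bits
    correction = 2.0 - a * x0          # the "2 - (Sj)*(Sk)" reciprocal iteration
    x1 = x0 * correction               # one Newton-Raphson step roughly doubles the correct bits
    return b * x1


print(divide(10.0, 4.0))   # 2.5
```

The refinement step is what lets a fast, low-precision approximation serve a full-precision divide: the error of x1 is roughly the square of the relative error of x0.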
Integer or fixed point operations are provided as follows: integer addition, integer subtraction, and integer multiplication. An integer multiply operation produces a 24-bit result; additions and subtractions produce either 24-bit or 64-bit results. No integer divide instruction is provided and the operation is accomplished through a software algorithm using floating point hardware.
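Without an integer divide instruction, a quotient can be estimated in floating point and then corrected with exact integer arithmetic. The Python sketch below is one generic way to do that; it is not the CRAY-1 library routine, it handles only non-negative dividends and positive divisors, and it assumes both operands fit exactly in a double.

```python
def int_divide(n: int, d: int) -> int:
    """Truncating quotient n // d for n >= 0, d > 0, via a floating estimate."""
    if n < 0 or d <= 0:
        raise ValueError("this sketch handles n >= 0 and d > 0 only")
    if n >= 2 ** 53 or d >= 2 ** 53:
        raise ValueError("operands must be exactly representable as doubles")
    q = int(float(n) * (1.0 / float(d)))   # floating estimate; may be off by one
    while q * d > n:                       # estimate too large: step down
        q -= 1
    while (q + 1) * d <= n:                # estimate too small: step up
        q += 1
    return q


print(int_divide(100, 7))   # 14
```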
The instruction set includes Boolean operations for OR, AND, and exclusive OR and for a mask-controlled merge operation. Shift operations allow the manipulation of either 64-bit or 128-bit operands to produce 64-bit results. With the exception of 24-bit integer arithmetic, all operations are implemented in vector as well as scalar instructions. The integer product is a scalar instruction designed for index calculation. Full indexing capability allows the programmer to index throughout memory in either scalar or vector modes. The index may be positive or negative in either mode. This allows matrix operations in vector mode to be performed on rows or the diagonal as well as conventional column-oriented operations.
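The mask-controlled merge and the 128-bit shift can be modelled with Python integers standing in for 64-bit words. This is a generic sketch: the function names are hypothetical, and which registers supply the mask and the two halves of the double-width operand is defined per instruction in Section 4, not here.

```python
MASK64 = (1 << 64) - 1   # all 64 bits of a word set


def merge(a: int, b: int, mask: int) -> int:
    """Mask-controlled merge: bits of `a` where mask is 1, bits of `b` where mask is 0."""
    return ((a & mask) | (b & ~mask)) & MASK64


def double_shift_left(high: int, low: int, places: int) -> int:
    """Shift the 128-bit operand high:low left and keep the upper 64 bits."""
    if not 0 <= places <= 128:
        raise ValueError("places must be 0..128")
    operand = ((high & MASK64) << 64) | (low & MASK64)
    return ((operand << places) >> 64) & MASK64


def double_shift_right(high: int, low: int, places: int) -> int:
    """Shift the 128-bit operand high:low right and keep the lower 64 bits."""
    if not 0 <= places <= 128:
        raise ValueError("places must be 0..128")
    operand = ((high & MASK64) << 64) | (low & MASK64)
    return (operand >> places) & MASK64
```

In a double shift, bits leaving one word enter the other, so `double_shift_left(1, 0x8000000000000000, 1)` yields 3: the top bit of the low word moves into the result.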
Each functional unit implements an algorithm or a portion of the instruction set. Units are independent and are fully segmented. This means that a new set of operands for unrelated computation may enter a functional unit each clock period.
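Full segmentation means that streaming n independent operations through one unit costs its latency once plus one clock period per additional operation. The sketch below compares that with a hypothetical unsegmented unit; the 6-clock-period latency in the example is an assumed value, and actual unit times are given in the timing summary (Appendix A).

```python
def segmented_time(n_ops: int, latency_cp: int) -> int:
    """Clock periods until the last of n_ops results leaves a fully segmented unit
    that accepts a new, unrelated operand set every clock period."""
    return 0 if n_ops <= 0 else latency_cp + (n_ops - 1)


def unsegmented_time(n_ops: int, latency_cp: int) -> int:
    """Same work on a hypothetical unit that must finish one operation before starting the next."""
    return max(n_ops, 0) * latency_cp


print(segmented_time(64, 6), unsegmented_time(64, 6))   # 69 384
```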
MEMORY SECTION
The memory for the CRAY-1 normally consists of 16 banks† of bipolar 1024-bit LSI memory. Three memory size options are available: 262,144 words, 524,288 words, or 1,048,576 words. Each word is 72 bits long and consists of 64 data bits and 8 check bits. The banks are independent of each other.
Sequentially addressed words reside in sequential banks. The memory cycle time is four clock periods (50 nsec). The access time, that is, the time required to fetch an operand from memory to a scalar register, is 11 clock periods (137.5 nsec). There is no inherent memory degradation for 16-bank memories of less than one million words.
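Because consecutive addresses fall in consecutive banks and a bank stays busy for four clock periods after each reference, the address increment of a stream decides whether its references collide. The Python sketch below is a deliberately simplified model: it assumes one reference may start per clock period, that word address a lives in bank a mod 16, and it ignores other memory traffic such as I/O and instruction fetch. It shows the effect of the increment, not exact CRAY-1 timing.

```python
def issue_times(first_addr, increment, length, banks=16, bank_cycle_cp=4):
    """Clock period at which each reference of a strided stream can start.

    Simplified model: at most one reference starts per clock period, address a
    is in bank a % banks, and a bank is busy for bank_cycle_cp periods after use.
    """
    bank_free_at = [0] * banks
    times = []
    cp = 0
    for i in range(length):
        bank = (first_addr + i * increment) % banks
        cp = max(cp, bank_free_at[bank])    # wait until the bank's cycle is over
        times.append(cp)
        bank_free_at[bank] = cp + bank_cycle_cp
        cp += 1                             # next reference starts a period later at the earliest
    return times


for inc in (1, 2, 8, 16):
    print(inc, issue_times(0, inc, 64)[-1])
# last start times: 63, 63, 125, 252
```

Under this model, increments of 1 or 2 never revisit a bank within four clock periods, while an increment of 16 sends every reference to the same bank and runs four times slower.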
The maximum transfer rate for B, T, and V registers is one word per clock period. For A and S registers, it is one word per two clock periods. Instructions are transferred to the instruction buffers at a rate of 16 parcels (four words) per clock period. Thus, the high speed of memory supports the requirements of scientific applications while its low cycle time is well suited to random access applications. The phased memory banks allow high communication rates through the I/O section and provide low read/store times for vector registers.
INPUT/OUTPUT SECTION
Input and output communication with the CRAY-1 is over 12 full duplex 16-bit channels. Associated with each channel are control lines that indicate the presence of data on the channel (ready), data received (resume), or transfer complete (disconnect).
The channels are divided into four channel groups. A channel group consists of either six input channels or six output channels. The four channel groups are scanned sequentially for I/O requests at a rate of one channel group per clock period, so each channel group is reinterrogated four clock periods later whether or not any I/O request is pending in it. If more than one channel of the channel group is active, the requests are resolved on a priority basis: the request from the lowest-numbered channel is serviced first.
† See 8-Bank Phasing Option, section 5.
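The channel group scan and priority rule can be simulated in a few lines. In this Python sketch the channel numbers are hypothetical labels, and it assumes that one request is serviced each time a group is interrogated; the manual does not state that here, so treat the service rate as illustration only.

```python
def scan_channel_groups(pending, n_cp, n_groups=4):
    """Return (clock_period, group, channel) for each request serviced.

    pending maps a channel group number to the channel numbers with requests.
    One group is interrogated per clock period, so each group comes round
    again n_groups periods later whether or not it has work.
    """
    queues = {group: sorted(channels) for group, channels in pending.items()}
    serviced = []
    for cp in range(n_cp):
        group = cp % n_groups                    # groups are scanned in sequence
        waiting = queues.get(group, [])
        if waiting:
            channel = waiting.pop(0)             # lowest-numbered channel first
            serviced.append((cp, group, channel))
    return serviced


print(scan_channel_groups({0: [3, 1], 2: [5]}, n_cp=8))
# [(0, 0, 1), (2, 2, 5), (4, 0, 3)]
```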
VECTOR PROCESSING
All operands processed by the CRAY-1 are held in registers prior to their being processed by the functional units and are received by registers after processing. In general, the sequence of operations is to load one or more vector registers from memory and pass them to functional units. Results from this operation are received by another vector register and may be processed additionally in another operation or returned to memory if the results are to be retained.
The contents of a V register are transferred to or from memory by specifying a first word address in memory, an increment for the memory address, and a length. The transfer proceeds beginning with the first element of the V register and incrementing by one in the V register at a rate of up to one word per clock period depending on memory conflicts.
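A strided transfer is fully defined by the three quantities named above. The Python sketch below models only the element ordering (element 0 first, memory address advancing by the increment), not timing; `vector_load` and the flat `memory` list are hypothetical. With a matrix stored column by column, an increment of 1 walks a column, an increment of n walks a row, and n + 1 walks the diagonal, which is how the row and diagonal operations mentioned in the computation section become possible.

```python
VL_MAX = 64  # elements in one V register


def vector_load(memory, first_addr, increment, length):
    """Return a V-register image: element i comes from first_addr + i * increment."""
    if not 0 <= length <= VL_MAX:
        raise ValueError("length must be between 0 and 64")
    elements = []
    for i in range(length):
        addr = first_addr + i * increment       # increment may be negative
        if not 0 <= addr < len(memory):
            raise IndexError(f"address {addr} is outside memory")
        elements.append(memory[addr])
    return elements


# A 4 x 4 matrix stored column by column: element (row, col) is at col * 4 + row
# and holds the value 10 * row + col.
n = 4
memory = [10 * row + col for col in range(n) for row in range(n)]
print(vector_load(memory, 1 * n, 1, n))       # column 1: [1, 11, 21, 31]
print(vector_load(memory, 2, n, n))           # row 2:    [20, 21, 22, 23]
print(vector_load(memory, 0, n + 1, n))       # diagonal: [0, 11, 22, 33]
print(vector_load(memory, 15, -(n + 1), n))   # diagonal reversed: [33, 22, 11, 0]
```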
A result may be received by a V register and re-entered as an operand to another vector computation in the same clock period. This mechanism allows for "chaining" two or more vector operations together. Chain operation allows the CRAY-1 to produce more than one result per clock period. Chain operation is detected automatically by the CRAY-1 and is not explicitly specified by the programmer, although the programmer may reorder certain code segments in order to enable chain operation.
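A rough timing model shows why chaining can deliver more than one result per clock period. The Python sketch assumes each unit returns its first result after a fixed latency and one result per clock period thereafter, and that a chained operation starts on each element as soon as that element arrives. It ignores the finer conditions under which chaining is actually possible, and the latencies of 6 and 7 clock periods are hypothetical. Results are counted from both operations.

```python
def unchained_time(n, latency_a, latency_b):
    """Second operation starts only after the first has delivered all n results."""
    first_done = latency_a + n - 1
    return first_done + latency_b + n - 1


def chained_time(n, latency_a, latency_b):
    """Element i enters the second unit as soon as it leaves the first."""
    return latency_a + latency_b + n - 1


n = 64
for name, t in (("unchained", unchained_time(n, 6, 7)), ("chained", chained_time(n, 6, 7))):
    print(name, t, "CP,", round(2 * n / t, 2), "results per CP")
# unchained 139 CP, 0.92 results per CP
# chained 76 CP, 1.68 results per CP
```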
There may be a conflict between scalar and vector operations only for the floating point operations and storage access. With the exception of these operations, the functional units are always available for scalar operations. A vector operation will occupy the selected functional unit until the vector has been processed.
Parallel vector operations may be processed in two ways:

Using different functional units and all different V registers.
Chain mode, using the result stream from one vector register simultaneously as the operand to another operation using a different functional unit.

Parallel operations on vectors allow the generation of two or more results per clock period. Most vector operations use two vector registers as operands or one scalar and one vector register as operands. Exceptions are vector shifts, vector reciprocal, and the load or store instructions.
Since many vectors exceed 64 elements, a long vector is processed as one or more 64-element segments and a possible remainder of fewer than 64 elements. Generally, it is convenient to compute the remainder and process this short segment before processing the remaining 64-element segments; however, a programmer may choose to construct the vector loop code in any of a number of ways. The processing of long vectors in FORTRAN is handled by the compiler and is transparent to the programmer.
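The remainder-first strategy can be sketched as follows. `segments` and `vector_add` are hypothetical names; each (start, length) pair stands for one setting of the VL register and one pass of vector instructions, and Python lists stand in for V registers.

```python
VL_MAX = 64  # elements per V register


def segments(n, vl_max=VL_MAX):
    """Split n elements into (start, length) pieces: remainder first, then full segments."""
    pieces = []
    start = 0
    remainder = n % vl_max
    if remainder:
        pieces.append((start, remainder))   # short segment handled first
        start += remainder
    while start < n:
        pieces.append((start, vl_max))      # remaining full 64-element segments
        start += vl_max
    return pieces


def vector_add(x, y):
    """Elementwise x + y, processed one segment at a time."""
    result = [0] * len(x)
    for start, length in segments(len(x)):
        # On the CRAY-1 the VL register would be set to `length` here.
        stop = start + length
        result[start:stop] = [a + b for a, b in zip(x[start:stop], y[start:stop])]
    return result


print(segments(200))   # [(0, 8), (8, 64), (72, 64), (136, 64)]
```

Handling the remainder first means every later pass uses the same full vector length, so the loop body after the first pass never has to change VL.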

SECTION 2
PHYSICAL ORGANIZATION


INTRODUCTION
The CRAY-1 computer system consists of the following:
- The CPU mainframe
- A power cabinet
- Two condensing units
- Two motor generators and control cabinets
- A maintenance control unit (MCU)
- One or more disk systems
- An interface to a front-end computer
MAINFRAME
The CRAY-1 mainframe, figure 2-1, is composed of 24 logic chassis. The chassis are arranged two per column in a 270° arc which is 56.5 inches in diameter. The twelve columns are 77 inches high. At the base of the columns, 19 inches high and extending outward 30 inches, are cabinets for power supplies and cooling distribution systems.
Viewing the cabinet from the top, the chassis of the upper circle are labeled A through L proceeding in a counter-clockwise direction from the opening. The chassis of the lower circle are labeled M through X. The assignment of modules to chassis is illustrated in figure 2-2.
MODULES
The CRAY-1 computer system uses only one basic module construction throughout the entire machine. The module consists of two 6 x 8 inch printed circuit boards mounted on opposite sides of a heavy copper heat transfer plate. Each printed circuit board has capacity for a maximum of 144 integrated circuit (IC) packages and approximately 300 resistor packages.

- Dimensions
Base - 103 1/2 inches diameter by 19 inches high
Columns - 56 1/2 inches diameter by 77 inches high including height of base
- 24 chassis
- 1662 modules (16 banks); 113 module types
- Up to 288 IC packages per module
- Power consumption approximately 115 kW input for maximum memory size
- Freon cooled with Freon/water heat exchange
- Three memory options
- Weight 10,500 lbs (maximum memory size)
- Three basic chip types
5/4 NAND gates
Memory chips
Register chips
Figure 2-1. Physical organization of mainframe


There are 1662 modules in a standard CRAY-1 with a 16-bank† memory. Modules are arranged 72 per chassis as illustrated in figure 2-2. There are 113 module types. Usage varies from 1 to 708 modules per type. Module types and usage are summarized in Appendix B. Each module type is identified by two letters. The first indicates the module series (A, D, F, G, H, J, M, R, S, T, V, X, and Z). The second letter identifies types of modules within a series.
The computation and I/O modules are on the eight chassis forming the center four columns. Each of the eight chassis on either side of the four center columns contains one of the 16 memory banks.
Modules are cooled by transferring heat via the heat transfer plate to cooling bars which in turn transfer the heat to Freon. Power dissipation depends on module density. The maximum module power dissipation by type is approximately 65 watts. The average module dissipation by usage is approximately 49 watts.
Two supply voltages are used for each module: -5.2 volts for IC power; -2.0 volts for line termination.
Each module has 96 pin pairs available for interconnecting to other modules. All interconnections are via twisted pair wire. The average utilization of pins is approximately 60 per cent.
Each module has 144 available test points which can be used for troubleshooting. Test points are driven by circuits which do not drive other loads.
Printed circuit board
The printed circuit board used in the CRAY-1 computer system is a 5-layer board. The two outer surfaces of the PC board are used for signal runs; the inner three layers are used for the -5.2 V, -2.0 V, and ground supplies. Signal foil runs are a nominal 0.0075 inch. The spacing of the signal layer to the adjacent voltage layer is a nominal 0.008 inch. The dimensions used provide signal lines with an impedance of 50 to 60 ohms.
Conventional PC techniques are used in the construction of the PC board.
Holes are drilled in the PC board for component mounting, interconnecting signal layers, and supplying signal and voltages to components. All holes are plated. The two signal layers are tin-lead plated before etching. The finished PC board is reflowed to eliminate slivers caused by the etching process.
Module assembly
The individual boards of the module are arranged, flow soldered, and inspected prior to being assembled as a module. Logic testing is done at the module level.
Integrated circuit packages
All integrated circuit devices used in the CRAY-1 are packaged in a common package type. The package is a 16-pin hermetically sealed flat pack. Gold or tin-lead plated leads are used depending on the vendor. The 16-pin flat pack was chosen for its reliability and compactness.
IC high-speed logic gate
With minor exceptions, one type of logic gate is used for the central processing unit. This is an ECL circuit with either four or five inputs and with both normal and inverted outputs available to drive loads. One four-input gate and one five-input gate are packaged in a 16-pin flat pack (5/4 gate). All latches, adders, subtracters, etc., are made of this basic gate. The high-speed logic gate has a minimum propagation delay of 0.5 nsec and a maximum propagation delay of 1 nsec. Edge speeds are 1 nsec or less.
IC slow-speed logic gate
The slow-speed gate is a MECL 10K version of the high-speed gate and is used in the memory module for address fanout. The speed is adequate for this application and the lower power requirement is an advantage.
16x1 register chip
The 16x1 register chip provides very fast temporary storage for scalar and vector functional units. The chips are used for instruction buffers and for B, T, and V registers. The chips have a 6 nsec read/write time, well within the 12.5 nsec clock period.
1024x1 memory chip
The bipolar 1024x1 LSI chip is the basic building block around which the CRAY-1 memory is built. The chip was developed by Fairchild using the isoplanar technology. The memory chip has a maximum 50 nsec read/write cycle time. Address decoding is internal to the package and is compatible with standard ECL logic levels.
Resistors
Only two resistor types are used throughout the entire CRAY-1 computer system. They are a center-tapped 120-ohm resistor providing two 60-ohm resistors per package, and a 300-ohm resistor tapped to provide a 120-ohm and a 180-ohm resistor. The basic resistor package is a three-lead device on a ceramic substrate. The resistance film is tantalum nitride. The lead frame is thermal pulse bonded. An epoxy covering is used to protect the film from mechanical damage.
All printed circuit board lines are treated as transmission lines. To provide proper termination of the transmission lines, each line is parallel-terminated to the -2.0 volt supply. A 60-ohm resistor is used to match the transmission line impedance. To minimize noise on the -2.0 V supply, all used logic gate inputs and outputs are terminated with a 60-ohm resistor to -2.0 volts.
The 16x1 register chip and the 1024x1 memory chip provide only a normal signal output (logic gates provide the normal and inverted output signals). To minimize the noise that could be introduced on the -2.0 volt bus by an unbalanced load, these two devices are terminated with a Thevenin equivalent to the -5.2 volt supply. The 300-ohm resistor is used for the Thevenin equivalent termination.
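The Thevenin equivalent of a tapped resistor is a simple divider calculation. The orientation is an assumption, since the manual does not say which tap goes where: this Python sketch places the 120-ohm section to ground and the 180-ohm section to the -5.2 volt supply, which gives a source of about -2.08 V behind 72 ohms, close to the -2.0 V, 60-ohm parallel termination described above.

```python
def thevenin(v_supply, r_to_ground, r_to_supply):
    """Thevenin equivalent (voltage, resistance) of a divider between 0 V and v_supply."""
    total = r_to_ground + r_to_supply
    v_th = v_supply * r_to_ground / total          # open-circuit voltage at the tap
    r_th = r_to_ground * r_to_supply / total       # the two sections in parallel
    return v_th, r_th


# Assumed orientation: 120-ohm section to ground, 180-ohm section to the -5.2 V supply.
v, r = thevenin(-5.2, 120.0, 180.0)
print(f"{v:.2f} V behind {r:.0f} ohms")   # -2.08 V behind 72 ohms
```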
Connector strips
The module connector strip uses 96 individual sockets molded in plastic. The chassis connector strip uses 96 mating pins molded in plastic. Individual pins and sockets when assembled are mounted on 0.050-inch centers with mounting holes provided in the assembled plastic strip. Each board has 96 holes provided for connecting signals to the module connector strip. The chassis connector strip is assembled with an 18-inch wire crimped to each pin. Wire pairs are twisted after assembly to provide the twisted pair wire transmission lines. The interconnection of twisted pair wires is made in the center of the line using a solder sleeve.
CLOCK
All timing within the mainframe cabinet is controlled by a single-phase synchronous clock network. This clock has a period of 12.5 nsec. The lines that carry the clock signal from the central clock source to the individual modules of the CPU are all made of uniform length so that the leading edge of a clock signal arrives at all parts of the CPU cabinet at the same time. A 3-nanosecond pulse (figure 2-3) is formed on each module.

References to clock periods in this manual are often given in the form CPn where n indicates the number of the clock period during which an event occurs. Clock periods are numbered beginning with CP0. Thus, the third clock period would be referred to as CP2.
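Converting between clock periods and time is a single multiplication by the 12.5 nsec period. This small Python helper (hypothetical names) also shows the zero-based CPn numbering.

```python
CLOCK_PERIOD_NS = 12.5   # one CRAY-1 clock period


def cp_to_ns(periods: int) -> float:
    """Duration of a number of clock periods, in nanoseconds."""
    return periods * CLOCK_PERIOD_NS


def cp_name(index: int) -> str:
    """Name of a clock period counted from zero: the third period is CP2."""
    return f"CP{index}"


print(cp_to_ns(4))             # 50.0  (four clock periods)
print(cp_to_ns(11))            # 137.5 (eleven clock periods)
print(1000 / CLOCK_PERIOD_NS)  # 80.0 clock periods per microsecond, i.e. an 80 MHz clock
print(cp_name(2))              # CP2
```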
POWER SUPPLIES
Thirty-six power supplies are used for the CRAY-1 computer system. There are twenty -5.2 volt supplies and sixteen -2.0 volt supplies. The supplies are divided into twelve groups of three. Each group supplies one column. The power supply design assumes a constant load. The power supplies do not have internal regulation but depend on the motor-generator to isolate and regulate incoming power. The power supplies use a twelve-phase transformer, silicon diodes, a balancing coil, and a filter choke to supply low-ripple DC voltages. The entire supply is mounted on a Freon-cooled heat sink. Power is distributed via bus bars to the load.
PRIMARY POWER SYSTEM
The primary power system consists of a 150 kW motor generator, a motor-generator control cabinet, and a power distribution cabinet. The motor generator supplies 208 V, 400-cycle, three-phase power to the power distribution cabinet, which in turn supplies it via a variac to each power supply. The power distribution cabinet also contains voltage and temperature monitoring equipment to detect power and cooling malfunctions.
COOLING
Modules in the CRAY-1 computer system are cooled by the exchange of heat from the module heat sink to a cold bar which is Freon cooled. The module heat sink is wedged along both 8-inch edges to a cold bar. Cold bars are arranged in vertical columns, with each column having capacity for 128 modules. The cold bar is a cast aluminum bar containing a stainless steel refrigerant tube.
To assure component reliability, the cooling system was designed to provide a maximum case temperature of 130°F (54°C). To meet this goal, the following temperature differentials are encountered:

IC case temperature at center of module    130°F (54°C)
IC case temperature at edge of module      118°F (48°C)
Cold plate temperature at wedge             78°F (25°C)
Cold bar temperature                        70°F (21°C)
Refrigerant tube temperature                70°F (21°C)
Two 20-ton compressors are located external to the computer room to complete the cooling system.
MAINTENANCE CONTROL UNIT
The CRAY-1 computer system is equipped with a maintenance control unit (MCU), a 16-bit minicomputer system that serves as a maintenance tool and controls system initialization. After the CRAY-1 operating system has been initialized and is operational, communication with the MCU is via a software protocol. The MCU is connected to a CRAY-1 channel pair, with additional control signals for executing the master clear, I/O master clear, dead dump, and sample parity error operations.
The maintenance control unit (MCU) includes:

A Data General ECLIPSE S-200 minicomputer or equivalent with 32K words of 16-bit memory
An 80-column card reader
A 132-column line printer
An 800 bpi 9-track tape unit
Two display terminals
A moving head disk drive
The MCU system includes a software package that enables it to serve as a local batch station during production hours. Operating as a local station, the MCU can submit diagnostic routines for execution along with other batch jobs. These diagnostics are typically stored on the local disk and are submitted to the CRAY-1 by operator command.
The system initialization procedure is referred to in this manual as the dead start sequence. This sequence is described in detail in Section 3.
Detailed information about the MCU is presented in separate publications.
FRONT-END COMPUTER
The CRAY-1 computer system may be equipped with one or more front-end computer systems that provide input data to the CRAY-1 and receive its output for distribution to a variety of slow-speed peripheral equipment. A front-end computer system is a self-contained system that executes under the control of its own operating system. Peripheral equipment attached to the front-end computer varies depending on the use to which the system is put.
A front-end computer may service the CRAY-1 in the following ways:

As a local operator station
As a local batch entry station
As a data concentrator for multiplexing several other stations into a single CRAY-1 channel
As a remote batch entry station
Detailed information about the front-end system is presented in separate publications.
EXTERNAL INTERFACE
The CRAY-1 is interfaced to front-end systems through special interface controllers that compensate for differences in channel widths, machine word size, electrical logic levels, and control protocols. The interface is a Cray Research, Inc. product implemented in ECL logic compatible with the host system. One or more interface controllers are contained in a small chassis located near the CRAY-1 mainframe. A primary goal of the interface is to maximize the utility of the front-end channel connected to the CRAY-1. Such a channel is generally slower than CRAY-1 channels. It is desirable that channel cables be limited to less than 75 feet. If site conditions require that the interconnected systems be physically located a considerable distance from each other, the effective transmission rate may be degraded.
MASS STORAGE SUBSYSTEM
Mass storage for the CRAY-1 computer system consists of two or more Cray Research, Inc. DCU-2 Disk Controllers and multiple DD-19 Disk Storage Units. The disk controller is a Cray Research, Inc. product and is implemented in flat-pack ECL logic similar to that used in the CRAY-1 mainframe. The controller operates synchronously with the mainframe over a 16-bit full-duplex channel. The controller is housed in a Freon-cooled DCC-1 cabinet located near the mainframe. Up to four controllers may be contained in one cabinet. The cabinet requires about five square feet of floor space and is 49 inches high.
Each controller may have from one to four DD-19 disk storage units attached to it. Data passes through the controller to or from one disk storage unit at a time. The controller may be connected to a 16-bit minicomputer station in addition to the CRAY-1. If this additional connection is made, the station and mainframe may share the controller operation on a function-by-function basis.
Each of the DD-19 disk storage units has two ports for controllers. A second independent data path may exist to each disk storage unit through another Cray Research controller. Reservation logic is provided to control access to each disk storage unit.
Operational characteristics of the DD-19 Disk Storage Units are summarized in Table 2-1. Further information about the mass storage subsystem is presented in separate publications.

Table 2-1. Characteristics of a DD-19 Disk Storage Unit

SECTION 3
COMPUTATION SECTION


INTRODUCTION
The computation section (figure 3-1) consists of an instruction control network, operating registers, and functional units. The instruction control network makes all decisions related to instruction issue and coordinates the activities of the three types of processing: vector, scalar, and address. Associated with each type of processing are registers and functional units that support that processing mode. For vector processing, there are a set of 64-bit multi-element registers, three functional units dedicated solely to vector applications, and three floating point functional units supporting both scalar and vector operations. For scalar processing, there are two levels of 64-bit scalar registers and four functional units dedicated solely to scalar processing, in addition to the three floating point units shared with vector operations. For address processing, there are two levels of 24-bit registers and two integer arithmetic functional units.
Vector and scalar processing is performed on data as opposed to address processing which operates on internal control information such as addresses and indexes. The flow of data in the computation section is generally from memory to registers and from registers to functional units. The flow of results is from functional units to registers and from registers to memory or back to functional units. Data flows along either the scalar or vector path depending on the mode of processing it is undergoing. An exception is that scalar registers can provide one of the operands required for vector operations performed in the vector functional units.
The flow of address information is from memory or from control registers to address registers. Information in the address registers can then be distributed to various parts of the control network for use in controlling the scalar, vector, and I/O operations. The address registers can also supply operands to two integer functional units, which generate address and index information and return the result to the address registers. Address information can also be transmitted to memory from the address registers.
REGISTER CONVENTIONS
Frequent use is made in this manual of parenthesized register names. This is shorthand notation for the expression "the contents of register ---." For example, "Branch to (P)" means "Branch to the address indicated by the contents of the program parcel counter, P."
Extensive use is also made of subscripted designations for the A, B, S, T, and V registers. For example, "Transmit (Tjk) to Si" means "Transmit the contents of the T register specified by the jk designators to the S register specified by the i designator."
In this manual, register bit positions are numbered from left to right starting with bit 0. Bit 63 of an S, V, or T register value represents the least significant bit in the operand. Bit 23 of an A or B register value represents the least significant bit in the operand. When a power of two is meant rather than a bit position, it is referred to as 2^n, where n is the power of two.
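The numbering convention can be checked with a small sketch. It assumes register values are held as nonnegative Python integers whose least significant bit has weight 2^0; `weight_exponent` and `bit_is_set` are hypothetical helper names, not anything defined for the CRAY-1.

```python
# Map this manual's bit numbering (bit 0 = leftmost, most significant)
# to the power of two each bit carries.

def weight_exponent(bit: int, width: int) -> int:
    """Return n such that manual bit `bit` of a width-bit register has weight 2^n."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} is outside a {width}-bit register")
    return width - 1 - bit


def bit_is_set(value: int, bit: int, width: int) -> bool:
    """True if manual-numbered bit `bit` is 1 in a width-bit register value."""
    return ((value >> weight_exponent(bit, width)) & 1) == 1


# S, V, and T registers are 64 bits wide; A and B registers are 24 bits wide.
print(weight_exponent(63, 64))  # 0  (least significant bit of an S register)
print(weight_exponent(0, 64))   # 63 (most significant bit)
print(weight_exponent(23, 24))  # 0  (least significant bit of an A register)
print(bit_is_set(1, 23, 24))    # True
```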
OPERATING REGISTERS
Operating registers are a primary programmable resource of the CRAY-1. They enhance the speed of the system by satisfying the heavy demands for data that are made by the functional units. A single functional unit may require one to three operands per clock period and may deliver results at a rate of one per clock period. Moreover, multiple functional units can be in use concurrently. To meet these requirements, the CRAY-1 has five sets of registers: three primary sets and two intermediate sets. The three primary sets of registers are vector, scalar, and address, designated in this manual as V, S, and A, respectively. These registers are considered primary because functional units can access them directly. For the scalar and address registers, an intermediate level of registers exists which is not accessible to the functional units. These registers act as buffers for the primary registers. Block transfers are possible between these registers and memory so that the number of memory references required for scalar and address operands is greatly reduced. The intermediate registers that support scalar registers are referred to as T registers. The intermediate registers that support the address registers are referred to as B registers.
V REGISTERS
Eight V registers, each with 64 elements, are the major computational registers of the CRAY-1. Each element of a V register has 64 bits. When associated data is grouped into successive elements of a V register, the register quantity may be considered a vector. Examples of vector quantities are rows or columns of a matrix or elements of a table.
Computational efficiency is achieved by processing each element of a vector identically. Vector instructions provide for the iterative processing of successive vector register elements. A vector operation begins by obtaining operands from the first element of one or more V registers and delivering the result to the first element of a V register. Successive elements are provided each clock period and as each operation is performed, the result is delivered to successive elements of the result V register. The vector operation continues until the number of operations performed by the instruction equals a count specified by the contents of the vector length (VL) register. Vectors having lengths exceeding 64 are handled under program control in groups of 64 and a remainder.
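The "groups of 64 and a remainder" scheme (commonly called strip mining) can be sketched in Python as below. This is an illustrative model, not CRAY-1 code: `strips` and `add_long_vectors` are hypothetical names, and the inner loop merely stands in for the vector instructions that would run at each vector length. Taking the full groups first and the remainder last is an arbitrary choice here; the text does not fix the order.

```python
MAX_VL = 64  # elements in one V register


def strips(n: int, max_vl: int = MAX_VL):
    """Yield (first_element, vector_length) pairs covering n elements:
    full groups of max_vl followed by any remainder."""
    start = 0
    while start < n:
        length = min(max_vl, n - start)
        yield start, length
        start += length


def add_long_vectors(a: list, b: list) -> list:
    """Elementwise a + b for vectors of any length, one strip at a time."""
    if len(a) != len(b):
        raise ValueError("operand lengths differ")
    result = [0] * len(a)
    for start, vl in strips(len(a)):
        # Stands in for: set VL to vl, load both strips, add, store the result.
        for e in range(vl):
            result[start + e] = a[start + e] + b[start + e]
    return result


print(list(strips(150)))  # [(0, 64), (64, 64), (128, 22)]
```

For a 150-element vector, the program would therefore set the vector length three times: to 64, 64, and 22.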
A result may be received by a V register and retransmitted as an operand to a subsequent operation in the same clock period. This use of a register as both a result and operand register allows for the "chaining" of two or more vector operations together. In this mode, two or more results may be produced per clock period.
The contents of a V register are transferred to or from memory in a block mode by specifying a first word address in memory, an increment for the memory address, and a vector length. The transfer then proceeds beginning with the first element of the V register at a maximum rate of one word per clock period, depending upon bank conflicts. Single-word data transfers are possible between an S register and an element of a V register.
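The address sequence of a block transfer follows directly from the first word address, the increment, and the vector length. The sketch below also maps each address to a memory bank under two assumptions that are not taken from this section: a bank count of 16 (`NUM_BANKS`, hypothetical here) and low-order interleaving (address modulo the bank count). Under those assumptions it shows why the transfer rate depends on bank conflicts: an increment that is a multiple of the bank count sends every reference to the same bank.

```python
NUM_BANKS = 16  # hypothetical bank count for illustration; not stated in this section


def transfer_addresses(first: int, increment: int, vl: int) -> list[int]:
    """Word addresses referenced by a block transfer of vl elements."""
    return [first + e * increment for e in range(vl)]


def bank_of(address: int, num_banks: int = NUM_BANKS) -> int:
    """Bank holding a word, assuming low-order interleaving (an assumption)."""
    return address % num_banks


print(transfer_addresses(100, 3, 5))  # [100, 103, 106, 109, 112]
# Unit increment touches every bank; an increment of 16 hits one bank repeatedly.
print(len({bank_of(a) for a in transfer_addresses(0, 1, 64)}))     # 16
print(sorted({bank_of(a) for a in transfer_addresses(0, 16, 64)}))  # [0]
```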
In this manual, the V registers are individually referred to by the letter V and a numeric subscript in the range 0 through 7. Vector instructions reference V registers by allowing specification of the subscript as the i, j, or k designator as described in section 4 of this manual. Individual elements of a V register are designated in this manual by decimal numbers in the range 00 through 63.
V register reservations
The term "reservation" describes the register condition when a register is in use and therefore not available for use as a result or as an operand register for another operation. During execution of a vector instruction, reservations are placed on the operand V registers and on the result V register. These reservations are placed on the registers themselves, not on individual elements of the V register.
A reservation for a result register is lifted during "chain slot" time. Chain slot time is the clock period that occurs at functional unit time plus two clock periods. During this clock period, the result is available for use as an operand in another vector operation. Chain slot time has no effect on the reservation placed on operand V registers. A V register may serve only one vector operation as the source of one or both operands.
No reservation is placed on the VL register during vector processing. If a vector instruction employs an S register, no reservation is placed on the S register. It may be modified in the next instruction after vector issue. The length of each vector operation is maintained apart from the VL register. Vector operations employing different lengths may proceed concurrently.
The A0 and Ak registers in a vector memory reference are treated in a similar fashion. They are available for modification immediately after use. The vector store instruction (177) is blocked from chain slot execution. The vector read instruction (176) is blocked from chain slot execution if the memory increment is a multiple of eight.
VECTOR CONTROL REGISTERS
Two registers are associated with the vector registers and provide control information needed in the performance of vector operations. They are the vector length (VL) register and the vector mask (VM) register.
VL register
The 7-bit vector length register can be set to 0 through 100 (octal), that is, 0 through 64 decimal, and specifies the length of all vector operations performed by vector instructions and the length of the vectors held by the V registers. It controls the number of operations performed for instructions 140 through 177. The VL register may be set to an A register value through use of the 0020 instruction.
VM register
The vector mask register may be set from an S register through the 003 instruction or may be created by testing a vector register for a condition using the 175 instruction. The mask controls element selection in the vector merge instructions (146 and 147).
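A rough model of mask creation and merging follows. Two details are assumptions made for illustration, not statements from this section: that element e of a V register corresponds to bit e of VM (bit 0 leftmost, per this manual's numbering convention), and that a merge takes the first operand where the mask bit is 1. Section 4 defines the actual semantics of instructions 146, 147, and 175; `make_mask` and `merge` are hypothetical names.

```python
def make_mask(elements: list, condition) -> int:
    """Build a 64-bit VM word with one bit per element that meets `condition`.

    Assumption: element e corresponds to manual bit e, i.e. weight 2^(63 - e).
    """
    vm = 0
    for e, x in enumerate(elements[:64]):
        if condition(x):
            vm |= 1 << (63 - e)
    return vm


def merge(vm: int, first: list, second: list) -> list:
    """Take first[e] where mask bit e is 1, otherwise second[e] (assumed polarity)."""
    return [f if (vm >> (63 - e)) & 1 else s
            for e, (f, s) in enumerate(zip(first, second))]


v = [3, 0, -2, 0]
vm = make_mask(v, lambda x: x == 0)  # bits 1 and 3 set
print(merge(vm, [99] * 4, v))        # [3, 99, -2, 99]
```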

S REGISTERS
The eight 64-bit S registers are the principal scalar registers for the CPU. These registers serve as the source and destination for operands in the execution of scalar arithmetic and logical instructions. The related functional units perform both integer and floating point arithmetic operations.
S registers may furnish one operand in vector instructions. Single-word transmissions of data between an S register and an element of a V register are also possible.
Data can move directly between memory and S registers or can be placed in T registers as an intermediate step. This allows buffering of scalar operands between S registers and memory.
Data can also be transferred between A and S registers. Another use of the S registers is for setting or reading the vector mask (VM) register or the real-time clock register.
At most, one S register can be entered with data during each clock period. Issue of an instruction is delayed if it would cause data to arrive at the S registers at the same time as data already being processed which is scheduled to arrive from another source.
When an instruction issues that will deliver new data to an S register, a reservation is set for that register to prevent issue of instructions that read the register until the new data has been delivered.
In this manual, the S registers are individually referred to by the letter S and a numeric subscript in the range 0 through 7. Instructions reference S registers by allowing specification of the subscript as the i, j, or k designator as described in section 4 of this manual. The only register to which an implicit reference is made is the S0 register. The use of this register is implied in the following branch instructions: 014 through 017. Refer to section 4 for additional information concerning the use of S registers by instructions.
T REGISTERS
There are sixty-four 64-bit T registers in the computation section. The T registers are used as intermediate storage for the S registers.
Data may be transferred between T and S registers and between T registers and memory. The transfer of a value between a T register and an S register requires only one clock period. T registers reference memory through block read and block write instructions. Block transfers occur at a maximum rate of one word per clock period. No reservations are made for T registers and no instructions can issue during block transfers to and from T registers.
In this manual, T registers are referred to by the letter T and a 2-digit octal subscript in the range 00 through 77. Instructions reference T registers by allowing specification of the octal subscript as the jk designator as described in section 4 of this manual.
A REGISTERS
The eight 24-bit A registers serve a variety of applications. They are primarily used as address registers for memory references and as index registers, but they are also used to provide values for shift counts, loop control, and channel I/O operations. In address applications, they index the base address for scalar memory references and provide both a base address and an index address for vector memory references.
The address functional units support address and index generation by performing 24-bit integer arithmetic on operands obtained from A registers and delivering the results to A registers.
Data can move directly between memory and A registers or can be placed in B registers as an intermediate step. This allows buffering of the data between A registers and memory.
Data can also be transferred between A and S registers. The vector length register is set by transmitting a value to it from an A register.

At most, one A register can be entered with data during each clock period. Issue of an instruction is delayed if it would cause data to arrive at the A registers at the same time as data already being processed which is scheduled to arrive from another source.
When an instruction issues that will deliver new data to an A register, a reservation is set for that register to prevent issue of instructions that read the register until the new data has been delivered.
In this manual, the A registers are individually referred to by the letter A and a numeric subscript in the range 0 through 7. Instructions reference A registers by allowing specification of the subscript as the h, i, j, or k designator as described in section 4 of this manual. The only register to which an implicit reference is made is the A0 register. The use of this register is implied in the following instructions:
010 through 013
034 through 037
176 and 177
Refer to section 4 for additional information concerning the use of A registers by instructions.
B REGISTERS
There are sixty-four 24-bit B registers in the computation section. The B registers are used as intermediate storage for the A registers. Typically, the B registers will contain data to be referenced repeatedly over a sufficiently long span that it would not be desirable to retain the data in either A registers or in memory. Examples of uses are loop counts, variable array base addresses, and dimensions.
The transfer of a value between an A register and a B register requires only one clock period. A block of B registers may be transferred to or from memory at the maximum rate of one 24-bit value per clock period. No reservations are made for B registers and no instructions can issue during block transfers to and from B registers.
In this manual, B registers are individually referred to by the letter B and a 2-digit octal subscript in the range 00 through 77. Instructions reference B registers by allowing specification of the octal subscript as the jk designator as described in section 4 of this manual. The only B register to which an implicit reference is made is the B00 register. On execution of the return jump instruction (007), register B00 is set to the next instruction parcel address (P) and a branch to an address specified by ijkm occurs. Upon receiving control, the called routine will conventionally save (B00) so that the B00 register will be free for the called routine to initiate return jumps of its own. When a called routine wishes to return to its caller, it restores the saved address and executes a 005 instruction. This instruction, which is a branch to (Bjk), causes the address saved in Bjk to be entered into P as the address of the next instruction parcel to be executed.
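The return jump convention can be modeled with a toy machine that holds only P and the B registers. The class and method names are hypothetical; the model assumes that P already holds the address of the parcel following the 007 instruction when it executes, as the text's "(P)" suggests, and uses B05 as an arbitrary save register. It shows why a called routine must save (B00) before making return jumps of its own.

```python
class ReturnJumpModel:
    """Toy model holding only P and the 64 B registers (B00 through B77 octal)."""

    def __init__(self) -> None:
        self.P = 0
        self.B = [0] * 64

    def return_jump(self, target: int) -> None:
        """007: B00 <- (P), then branch to target.

        Assumes P already holds the address of the parcel after the 007.
        """
        self.B[0] = self.P
        self.P = target

    def branch_to_b(self, jk: int) -> None:
        """005: P <- (Bjk)."""
        self.P = self.B[jk]


m = ReturnJumpModel()
m.P = 0o12              # caller: next parcel after its 007
m.return_jump(0o100)    # B00 = 0o12
m.B[0o05] = m.B[0]      # called routine saves (B00); B05 is an arbitrary choice
m.P = 0o122             # called routine: next parcel after its own 007
m.return_jump(0o200)    # B00 = 0o122, overwriting the caller's return address
m.branch_to_b(0)        # innermost routine returns: P = 0o122
m.B[0] = m.B[0o05]      # restore the saved address
m.branch_to_b(0)        # return to the original caller
print(oct(m.P))         # 0o12
```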
FUNCTIONAL UNITS
Instructions other than simple transmits or control operations are performed by hardware organizations known as functional units. Each unit implements an algorithm or a portion of the instruction set. Units are independent; a number of functional units can be in operation at the same time.
A functional unit receives operands from registers and delivers the result to a register when the function has been performed. The units operate essentially in three-address mode with source and destination addressing limited to register designators.
All functional units perform their algorithms in a fixed amount of time; no delays are possible once the operands have been delivered to the unit. The amount of time required from delivery of the operands to the unit to the completion of the calculation is termed the "functional unit time" and is measured in 12.5 nsec clock periods.
The functional units are all fully segmented. This means that a new set of operands for unrelated computation may enter a functional unit each clock period even though the functional unit time may be more than one clock period. This segmentation is made possible by capturing and holding the information arriving at the unit or moving within the unit at the end of every clock period.
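Segmentation means the functional unit time is a latency rather than a per-operation cost. The idealized sketch below assumes one independent operand set enters the unit on each of CP0, CP1, and so on, and it ignores issue and register transit delays; the function names are hypothetical.

```python
CLOCK_PERIOD_NS = 12.5  # CRAY-1 clock period


def completion_periods(fu_time: int, n: int) -> list[int]:
    """Clock period (counted from the first entry, CP0) at which each of n
    results is complete when one operand set enters the unit per clock period."""
    return [i + fu_time for i in range(n)]


def elapsed_ns(fu_time: int, n: int) -> float:
    """Time from the first entry until the last of n results is complete."""
    return (n - 1 + fu_time) * CLOCK_PERIOD_NS


# 64 independent floating point additions (functional unit time 6 CP):
print(completion_periods(6, 64)[:3], completion_periods(6, 64)[-1])  # [6, 7, 8] 69
print(elapsed_ns(6, 64))  # 862.5
```

Under this model, an unsegmented unit would instead need 64 * 6 = 384 clock periods for the same 64 additions.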
Twelve functional units are identified in this manual and are arbitrarily described in four groups: address, scalar, vector, and floating point. The first three groups each act in conjunction with one of the three primary register types, A, S, and V, to support the address, scalar, and vector modes of processing available in the CRAY-1. The fourth group, floating point, can support either scalar or vector operations and will accept operands from or deliver results to S or V registers accordingly.
ADDRESS FUNCTIONAL UNITS
The address functional units perform 24-bit integer arithmetic on operands obtained from A registers and deliver the results to an A register. The arithmetic is two's complement.
Address add unit
The address add unit performs 24-bit integer addition and subtraction. The unit executes instructions 030 and 031. The addition and subtraction are performed in a similar manner. However, the two's complement subtraction for the 031 instruction occurs as follows. The one's complement of the Ak operand is added to the Aj operand. Then a one is added in the low order bit position of the result.
No overflow is detected in the functional unit.
The functional unit time is two clock periods.
Address multiply unit
The address multiply unit executes instruction 032 which forms a 24-bit integer product from two 24-bit operands. No rounding is performed.
The functional unit does not detect overflow of the product.
The functional unit time is six clock periods.
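The address unit arithmetic can be sketched with Python integers masked to 24 bits. The `width` parameter and all function names are illustrative assumptions. Subtraction follows the one's-complement-plus-one procedure described for instruction 031, and neither operation reports overflow. With `width=64`, the same add and subtract construction matches the description of the scalar add unit below.

```python
def mask(width: int) -> int:
    return (1 << width) - 1


def add(x: int, y: int, width: int = 24) -> int:
    """030: sum truncated to width bits; overflow is not detected."""
    return (x + y) & mask(width)


def subtract(x: int, y: int, width: int = 24) -> int:
    """031: one's complement of y added to x, then 1 added at the low-order end."""
    ones_complement = ~y & mask(width)
    return (x + ones_complement + 1) & mask(width)


def multiply(x: int, y: int, width: int = 24) -> int:
    """032: low-order width bits of the product; no rounding, no overflow check."""
    return (x * y) & mask(width)


def to_signed(x: int, width: int = 24) -> int:
    """Read a width-bit register value as a two's complement integer."""
    return x - (1 << width) if x >> (width - 1) else x


print(to_signed(subtract(5, 7)))                       # -2
print(add(mask(24), 1))                                # 0 (carry discarded)
print(to_signed(multiply(mask(24), 3)))                # -3, i.e. (-1) * 3
print(to_signed(subtract(5, 7, width=64), width=64))   # -2
```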
SCALAR FUNCTIONAL UNITS
The scalar functional units perform operations on 64-bit operands obtained from S registers and in most cases deliver the 64-bit results to an S register. The exception is the population/leading zero count unit which delivers its 7-bit result to an A register.
Four functional units are exclusively associated with scalar operations and are described here. Three functional units are used for both scalar and vector operations and are described under the section entitled Floating Point Functional Units.
Scalar add unit
The scalar add unit performs 64-bit integer addition and subtraction. It executes instructions 060 and 061. The addition and subtraction are performed in a similar manner. However, the two's complement subtraction for the 061 instruction occurs as follows. The one's complement of the Sk operand is added to the Sj operand. Then a one is added in the low order bit position of the result.
No overflow is detected in the unit.
The functional unit time is three clock periods.
Scalar shift unit
The scalar shift unit shifts the entire 64-bit contents of an S register or shifts the double 128-bit contents of two concatenated S registers. Shift counts are obtained from an A register or from the jk portion of the instruction. Shifts are end-off with zero fill. For a double shift, a circular shift is effected if the shift count does not exceed 64 and the i and j designators are equal and nonzero.
The scalar shift unit executes instructions 052 through 057. Single-register shift instructions, 052 through 055, are executed in two clock periods. Double-register shift instructions, 056 and 057, are executed in three clock periods.
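The circular-shift special case follows from the end-off double shift. The sketch below models only a left double shift of a 128-bit value formed from an upper and a lower 64-bit word; which of Si and Sj supplies each half in instructions 056 and 057 is defined in section 4 and is not modeled. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def double_shift_left(upper: int, lower: int, count: int) -> int:
    """Shift the 128-bit value upper:lower left, end-off with zero fill,
    and return the upper 64 bits."""
    value = ((upper << 64) | lower) << count
    return (value >> 64) & MASK64


def rotate_left(x: int, count: int) -> int:
    """Reference 64-bit circular left shift."""
    count %= 64
    return ((x << count) | (x >> (64 - count))) & MASK64


x = 0x8000_0000_0000_0001
print(double_shift_left(x, x, 4) == rotate_left(x, 4))  # True: same word in both halves
print(double_shift_left(x, x, 64) == x)                 # True: a count of 64 returns the word
print(hex(double_shift_left(x, x, 65)))                 # 0x2: beyond 64, zeros enter
```

With the same word in both halves, bits shifted out of the upper half are replaced by the same bits arriving from the lower half, which is a rotation as long as the count does not exceed 64.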
Scalar logical unit
The scalar logical unit performs bit-by-bit manipulation of 64-bit quantities obtained from S registers. It executes instructions 042 through 051, the mask and Boolean instructions.
The scalar logical unit is an integral part of the modules containing the S registers. Since data does not have to leave the modules for the function to be performed, operations require only one clock period.
Population/leading zero count unit
This functional unit executes instructions 026 and 027. Instruction 026, which counts the number of bits having a value of one in the operand, executes in four clock periods. Instruction 027, which counts the number of bits of zero preceding a one bit in the operand, executes in three clock periods. For either instruction, the 64-bit operand is obtained from an S register and the 7-bit result is delivered to an A register.
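Both counts can be sketched directly on Python integers. The assumption that an all-zero operand yields a leading zero count of 64 is ours, not stated here. Either result ranges from 0 through 64, and 64 needs seven bits, which is consistent with the 7-bit result delivered to an A register.

```python
MASK64 = (1 << 64) - 1


def population_count(s: int) -> int:
    """026: number of one bits in the 64-bit operand."""
    return bin(s & MASK64).count("1")


def leading_zero_count(s: int) -> int:
    """027: zero bits preceding the first one bit, counting from bit 0 (leftmost).

    An all-zero operand is assumed to give 64.
    """
    return 64 - (s & MASK64).bit_length()


print(population_count(0b1011))                         # 3
print(leading_zero_count(1))                            # 63
print(leading_zero_count(1 << 63))                      # 0
print(population_count(MASK64), leading_zero_count(0))  # 64 64
```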
VECTOR FUNCTIONAL UNITS
Most vector functional units perform operations on operands obtained from one or two V registers or from a V register and an S register. The reciprocal unit, which requires only one operand, is an exception. Results from a vector functional unit are delivered to a V register.
Successive operand pairs are transmitted to a functional unit each clock period. The corresponding result emerges from the functional unit n clock periods later where n is the functional unit time and is constant for a given functional unit. The vector length determines the number of operand pairs to be processed by a functional unit.
Three functional units are exclusively associated with vector operations and are described in this subsection. Three functional units are associated with both vector operations and scalar operations and are described in the subsection entitled Floating Point Functional Units. When a floating point unit is used for a vector operation, the general description of vector functional units given in this subsection applies.
Vector functional unit reservation
A functional unit engaged in a vector operation remains busy during each clock period and may not participate in other operations. In this state, the functional unit is said to be reserved. Other instructions that require the same functional unit will not issue until the previous operation is completed. Only one functional unit of each type is available to the vector instruction hardware. When the vector operation completes, the reservation is dropped and the functional unit is then available for another operation.
Recursive characteristic of vector functional units
In a vector operation, the result register (designated by i in the instruction) is not normally the same V register as the source of either of the operands (designated by j or k). However, turning the output stream of a vector functional unit back into the input stream by setting i to the same register designator as j or k may be desirable under certain circumstances, since it provides a facility for reducing 64 elements down to just a few. The number of terms generated by the partial reduction is determined by the number of values that can be in process in a functional unit at one time (i.e., functional unit time + 2 CP).
When the i designator is the same as the j or k designator, a recursive characteristic is introduced into the vector processing because of the way in which element counters are handled. At the beginning of an operation for which i is the same as j or k, the element counters for both the operand register and the operand/result register are set to zero. The element counter for the operand/result register is held at zero and does not begin incrementing until the first result arrives from the functional unit at functional unit time + 2 CP. This counter then begins to advance by one each clock period. Note that until f.u. + 2, the initial contents of element zero of the operand/result register are repeatedly sent to the functional unit. The element counter for the other operand register, however, immediately begins advancing by one on each successive clock period, thus sending the contents of elements 0, 1, 2, ... on successive clock periods. Thus, the first f.u. + 2 elements of the operand/result register contain results based on the contents of element 0 of the operand/result register and on successive elements of the other operand register. These f.u. + 2 elements then provide one of the operands used in calculating the results for the next f.u. + 2 elements. The third group of f.u. + 2 elements of the operand/result register contains results based on the results delivered to the second group of f.u. + 2 elements, and so on until the final group of f.u. + 2 elements is generated as determined by the vector length.
As an example, consider the summation of a vector of floating point numbers where the initial conditions for the vector operation are the following:

All elements of register V1 contain floating point values.
Register V2 will provide one set of operands and will receive the results. Element 0 of this register contains a 0 value.
The vector length register (VL) contains 64.
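A floating point add instruction (171212) is then executed using register V1 for one operand and using register V2 as an operand/result register. This instruction uses the floating point add unit, which has a functional unit time of 6 CP, causing sums to be generated in groups of eight (f.u. + 2 = 8). The final eight partial sums of the 64 elements of V1 are contained in elements 56 through 63 of V2. Specifically, elements of V2 contain the following sums:

V2 element 56 = sum of V1 elements 0, 8, 16, 24, 32, 40, 48, 56
V2 element 57 = sum of V1 elements 1, 9, 17, 25, 33, 41, 49, 57
V2 element 58 = sum of V1 elements 2, 10, 18, 26, 34, 42, 50, 58
V2 element 59 = sum of V1 elements 3, 11, 19, 27, 35, 43, 51, 59
V2 element 60 = sum of V1 elements 4, 12, 20, 28, 36, 44, 52, 60
V2 element 61 = sum of V1 elements 5, 13, 21, 29, 37, 45, 53, 61
V2 element 62 = sum of V1 elements 6, 14, 22, 30, 38, 46, 54, 62
V2 element 63 = sum of V1 elements 7, 15, 23, 31, 39, 47, 55, 63
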
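Note that if an integer summation were performed instead of a floating point summation, five partial sums would be generated and placed in elements 59 through 63, since the functional unit time for the integer add unit is 3 CP (f.u. + 2 = 5). Assuming that the same registers are used as for the previous example but that the registers now contain integer values, the last five elements of V2 would contain the following values:

V2 element 59 = sum of V1 elements 4, 9, 14, ..., 59 (12 elements)
V2 element 60 = sum of V1 elements 0, 5, 10, ..., 60 (13 elements)
V2 element 61 = sum of V1 elements 1, 6, 11, ..., 61 (13 elements)
V2 element 62 = sum of V1 elements 2, 7, 12, ..., 62 (13 elements)
V2 element 63 = sum of V1 elements 3, 8, 13, ..., 63 (13 elements)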
This recursive characteristic of vector processing is applicable to any vector operation, arithmetic or logical. The value initially placed in element 0 of the operand/result register will depend on the operation being performed. For example, when using the floating point multiply unit, element 0 of the operand/result register will usually be set to an initial value of 1.0.
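The element-counter behavior described above can be modeled in a few lines. This is a behavioral sketch, not a timing-accurate simulation: it assumes only that step t reads the initial element 0 while t < f.u. + 2 and otherwise reads the result written f.u. + 2 steps earlier. `recursive_vector_op` is a hypothetical name, and the order of the two operands is not modeled (it does not matter for addition or multiplication).

```python
import operator


def recursive_vector_op(op, other: list, result_reg: list, vl: int, fu_time: int) -> list:
    """Model a vector operation whose result register is also an operand.

    For the first fu_time + 2 steps, the operand/result register supplies its
    initial element 0; from then on, step t reads the result written at step
    t - (fu_time + 2). `other` supplies element t at step t.
    """
    lag = fu_time + 2
    initial0 = result_reg[0]
    out = list(result_reg)
    for t in range(vl):
        x = initial0 if t < lag else out[t - lag]
        out[t] = op(x, other[t])
    return out


# Floating point summation from the example: V1 holds 0.0 .. 63.0, V2 element 0 is 0.
v1 = [float(i) for i in range(64)]
v2 = recursive_vector_op(operator.add, v1, [0.0] * 64, vl=64, fu_time=6)
print(v2[56:])       # eight partial sums; v2[56] is V1 elements 0 + 8 + ... + 56
print(sum(v2[56:]))  # 2016.0, the sum of all 64 elements

# Integer summation on the vector add unit (f.u. time 3 CP): five partial sums.
w2 = recursive_vector_op(operator.add, list(range(64)), [0] * 64, vl=64, fu_time=3)
print(w2[59:], sum(w2[59:]))  # [378, 390, 403, 416, 429] 2016
```

A short scalar loop over the remaining partial sums would complete the reduction.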
Vector add unit
The vector add unit performs 64-bit integer addition and subtraction for a vector operation and delivers the results to elements of a V register. The unit executes instructions 154 through 157. The addition and subtraction are performed in a similar manner. However, for the subtraction operations, 156 and 157, the Vk operand is complemented prior to addition and during the addition a one is added into the low order bit position of the result.
No overflow is detected by the unit.
The functional unit time for the vector add unit is three clock periods.
Vector shift unit
The vector shift unit shifts the entire 64-bit contents of a V register element or the 128-bit value formed from two consecutive elements of a V register. Shift counts are obtained from an A register. Shifts are end-off with zero fill.
The vector shift unit executes instructions 150 through 153. Functional unit time is four clock periods.
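The end-off, zero-fill behavior can be sketched as follows. For the 128-bit form, the sketch assumes that the first element supplies the upper 64 bits, that a left shift returns the upper word, and that a right shift returns the lower word; those assignments and the function names are assumptions, not a description of instructions 150 through 153.

```python
MASK64 = (1 << 64) - 1


def shift_left(x, count):
    """End-off left shift of one 64-bit element with zero fill."""
    return (x << count) & MASK64


def shift_right(x, count):
    """End-off right shift of one 64-bit element with zero fill."""
    return (x & MASK64) >> count


def double_shift_left(upper, lower, count):
    """Shift the 128-bit value upper:lower left; return the upper 64 bits (assumed)."""
    value = ((upper & MASK64) << 64) | (lower & MASK64)
    return ((value << count) >> 64) & MASK64


def double_shift_right(upper, lower, count):
    """Shift the 128-bit value upper:lower right; return the lower 64 bits (assumed)."""
    value = ((upper & MASK64) << 64) | (lower & MASK64)
    return (value >> count) & MASK64


print(shift_left(1, 63) == 1 << 63, shift_left(1, 64), double_shift_left(0, 1 << 63, 1))
# True 0 1
```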
Vector logical unit
The vector logical unit performs bit-by-bit manipulation of 64-bit quantities for instructions 140 through 147. The unit also performs the logical operations associated with the vector mask instruction, 175. Because the 175 instruction uses the same functional unit as instructions 140 through 147, it cannot be chained with these logical operations.
Functional unit time is two clock periods.
FLOATING POINT FUNCTIONAL UNITS
The three floating point functional units perform floating point arithmetic for both scalar and vector operations. When executing a scalar instruction, operands are obtained from S registers and the result is delivered to an S register. When executing most vector instructions, operands are obtained from pairs of V registers or from a V register and an S register and the results are delivered to a V register. The reciprocal instruction, which has only one input operand, is an exception.
A floating point unit is reserved during execution of a vector instruction.
Information on floating point out-of-range conditions is contained in the subsection entitled Floating Point Arithmetic.
Floating point add unit
The floating point add unit performs addition or subtraction of 64-bit operands in floating point format. The unit executes instructions 062, 063, and 170 through 173. Functional unit time is six clock periods.
A result is normalized even if the operands are unnormalized.
Out-of-range exponents are detected as described under Floating Point Arithmetic.
Floating point multiply unit
The floating point multiply unit executes instructions 060 through 067 and 160 through 167. These instructions provide for full and half precision multiplication of 64-bit operands in floating point format and for computing two minus a floating point product for reciprocal iterations.
The half-precision product is rounded; the full-precision product is either rounded or unrounded. Input operands are assumed to be normalized. The unit delivers a normalized result except that the result is not guaranteed to be normalized if the input operands are not normalized.
Out-of-range exponents are detected as described under Floating Point Arithmetic. However, if both operands have zero exponents, the result is considered as an integer product and is not normalized. Functional unit time is seven clock periods.
Reciprocal approximation unit
The reciprocal approximation unit finds the approximate reciprocal of a 64-bit operand in floating point format. The unit executes instructions 070 and 174. Functional unit time is 14 clock periods.
The result is normalized. The input operand is assumed to be normalized; the uppermost bit of the coefficient is not tested but is assumed to be set in the computation.
ARITHMETIC OPERATIONS
Functional units in the CRAY-1 either perform two's complement integer arithmetic or perform floating point arithmetic.
INTEGER ARITHMETIC
All integer arithmetic, whether 24 bits or 64 bits, is two's complement and is so represented in the registers as illustrated in figure 3-2. The address add unit and address multiply unit perform 24-bit arithmetic. The scalar add unit and the vector add unit perform 64-bit arithmetic.

Multiplication of two fractional operands may be accomplished using the floating point multiply instruction. The floating point multiply unit recognizes the condition in which both operands have zero exponents as a special case and returns the upper 48 bits of the product of the coefficients as the coefficient of the result, leaving the exponent field zero.
Division of integers would require that they first be converted to floating point format and then divided using the floating point units.
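The zero-exponent special case can be sketched as an operation on packed words. The sketch assumes the sign in the most significant bit and the coefficient in the low-order 48 bits (figure 3-3 is not reproduced here). It computes the exact upper 48 bits of the 96-bit coefficient product and ignores the truncation performed by the multiply pyramid. The function name is hypothetical.

```python
SIGN_BIT = 1 << 63
COEFFICIENT_BITS = 48
COEFFICIENT_MASK = (1 << COEFFICIENT_BITS) - 1


def fractional_multiply(a, b):
    """Multiply two packed words whose exponent fields are both zero.

    The 48-bit coefficients are multiplied as fractions; the upper 48 bits of
    the 96-bit product become the result coefficient and the exponent stays
    zero. Exact arithmetic is assumed (pyramid truncation is ignored).
    """
    coefficient = ((a & COEFFICIENT_MASK) * (b & COEFFICIENT_MASK)) >> COEFFICIENT_BITS
    if coefficient == 0:
        return 0                      # no negative zero is produced
    sign = (a ^ b) & SIGN_BIT         # signed magnitude: signs combine by XOR
    return sign | coefficient


half = 1 << 47                        # 0.5 as a 48-bit fraction
print(fractional_multiply(half, half) == 1 << 46)   # 0.5 * 0.5 = 0.25 -> True
```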
FLOATING POINT ARITHMETIC
Floating point numbers are represented in a standard format throughout the CPU. This format is a packed representation of a binary coefficient and an exponent or power of two. The coefficient is a 48-bit signed fraction. The sign of the coefficient is separated from the rest of the coefficient as shown in figure 3-3. Since the coefficient is signed magnitude, it is not complemented for negative values.

A zero value or an underflow result is not biased and is represented as a word of all zeros. A negative zero is not generated by any functional unit.
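A packed word can be decoded as in the following sketch. The field layout (sign bit, 15-bit exponent, 48-bit coefficient) and the exponent bias of 40000₈ are assumptions. They are consistent with the integer exponent 40060₈ and with the 60000₈ and 17777₈ range limits given in this section. Exact rational arithmetic is used so that no host floating point rounding intrudes, and the function names are hypothetical.

```python
from fractions import Fraction

EXPONENT_BIAS = 0o40000        # assumed bias
COEFFICIENT_BITS = 48
COEFFICIENT_MASK = (1 << COEFFICIENT_BITS) - 1
EXPONENT_MASK = 0o77777        # assumed 15-bit exponent field


def pack(sign, exponent, coefficient):
    """Build a word from a sign bit, a biased exponent, and a 48-bit coefficient."""
    return (((sign & 1) << 63)
            | ((exponent & EXPONENT_MASK) << COEFFICIENT_BITS)
            | (coefficient & COEFFICIENT_MASK))


def unpack(word):
    """Return (sign, biased exponent, coefficient) from a 64-bit word."""
    return ((word >> 63) & 1,
            (word >> COEFFICIENT_BITS) & EXPONENT_MASK,
            word & COEFFICIENT_MASK)


def value(word):
    """Exact value: sign * (coefficient / 2**48) * 2**(exponent - bias)."""
    sign, exponent, coefficient = unpack(word)
    if coefficient == 0:
        return Fraction(0)
    magnitude = (Fraction(coefficient, 1 << COEFFICIENT_BITS)
                 * Fraction(2) ** (exponent - EXPONENT_BIAS))
    return -magnitude if sign else magnitude


print(value(pack(0, 0o40060, 5)), value(pack(1, 0o40001, 1 << 47)))  # 5 -1
```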
Normalized floating point
A non-zero floating point number in packed format is normalized if the most significant bit of the coefficient is non-zero. This condition implies that the coefficient has been shifted to the left as far as possible and therefore the floating point number has no leading zeros in the coefficient.
When a floating point number has been created by inserting an exponent of 40060₈ into a word containing a 48-bit integer, the result should be normalized before being used in a floating point operation. Normalization is accomplished by adding the unnormalized floating point operand to zero. Since S0 provides a 64-bit zero when used in the Sj field of an instruction, an operand in Sk can be normalized using the following instruction:
062i0k
Si contains the normalized result.
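The effect of normalization can be sketched as shifting the coefficient left until its most significant bit is set, lowering the exponent by one for each shift. This models the result of the 062i0k instruction, not the add unit's internal logic. The field layout and helper names are assumptions, and exponent underflow is not checked.

```python
COEFFICIENT_BITS = 48
COEFFICIENT_MASK = (1 << COEFFICIENT_BITS) - 1
TOP_BIT = 1 << (COEFFICIENT_BITS - 1)
EXPONENT_MASK = 0o77777        # assumed 15-bit exponent field
SIGN_BIT = 1 << 63


def from_integer(n):
    """Insert exponent 40060 (octal) above a non-negative 48-bit integer (unnormalized)."""
    return (0o40060 << COEFFICIENT_BITS) | (n & COEFFICIENT_MASK)


def normalize(word):
    """Shift the coefficient left until its top bit is set, adjusting the exponent."""
    sign = word & SIGN_BIT
    exponent = (word >> COEFFICIENT_BITS) & EXPONENT_MASK
    coefficient = word & COEFFICIENT_MASK
    if coefficient == 0:
        return 0                          # zero is a word of all zeros
    while not (coefficient & TOP_BIT):
        coefficient <<= 1
        exponent -= 1
    return sign | (exponent << COEFFICIENT_BITS) | coefficient


w = normalize(from_integer(5))
print(oct(w >> COEFFICIENT_BITS), (w & TOP_BIT) != 0)   # 0o40003 True
```

The integer 5 needs 45 left shifts to bring its top bit to bit 47, so the exponent drops from 40060₈ to 40003₈ while the value stays 5.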
Floating point range errors
Overflow of the floating point range is indicated by an exponent value of 60000₈ or greater in packed format. Underflow is indicated by an exponent value of 17777₈ or less in packed format. Detection of the overflow condition will initiate an interrupt if the floating point mode flag is set in the mode register and monitor mode is not in effect. The floating point mode flag can be set or cleared by an object program. The object program is responsible for clearing the floating point mode flag with a 0022 instruction at the beginning of each vector branch sequence and for setting it again with a 0021 instruction after the merge.
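The range limits and the overflow interrupt condition can be expressed as a small classifier. The sketch assumes that the exponent occupies the 15 bits above the 48-bit coefficient, and the function names are hypothetical. A word of all zeros is reported separately because zero is not biased.

```python
OVERFLOW_EXPONENT = 0o60000    # this value or greater: overflow
UNDERFLOW_EXPONENT = 0o17777   # this value or less: underflow


def exponent_of(word):
    """Biased exponent, assumed to sit in the 15 bits above the 48-bit coefficient."""
    return (word >> 48) & 0o77777


def range_status(word):
    if word == 0:
        return "zero"                  # zero is not biased: a word of all zeros
    exponent = exponent_of(word)
    if exponent >= OVERFLOW_EXPONENT:
        return "overflow"
    if exponent <= UNDERFLOW_EXPONENT:
        return "underflow"
    return "in range"


def overflow_interrupt(word, fp_mode_flag, monitor_mode):
    """Overflow interrupts only with the floating point mode flag set, outside monitor mode."""
    return range_status(word) == "overflow" and fp_mode_flag and not monitor_mode


print(range_status(0o60000 << 48), range_status((0o17777 << 48) | 1))  # overflow underflow
```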
Detection of floating point error conditions by the floating point units is described in the following paragraphs.
Floating point add unit - A floating point add range error condition is generated for scalar operands when the larger incoming exponent is greater than or equal to 60000₈. The floating point error flag is set and an exponent of 60000₈ is sent to the result register along with the computed coefficient, as in the following example:

Underflow is also generated when either, but not both, of the incoming exponents is zero. When both exponents are zero, the operation is treated as an integer multiply, and the result is handled normally with no normalization shift of the result allowed.
Double precision numbers
The CRAY-1 does not provide special hardware for performing double or multiple precision operations. Double precision computations with 95-bit accuracy are available through software routines provided by Cray Research.
Addition algorithm
Floating point addition or subtraction is performed in a 49-bit register. Trial subtraction of the exponents occurs to select the operand to be shifted down for aligning the operands. The larger exponent operand carries the sign and the shift is always to the right. Bits shifted out of the register are lost; no round-up takes place.
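The alignment step can be sketched for two positive operands given as (biased exponent, 48-bit coefficient) pairs. Subtraction, sign handling, and range checking are omitted, and the function name is hypothetical. The sketch shows only three points: the operand with the smaller exponent is shifted right with its low bits lost, a carry out of the 48-bit coefficient uses the 49th bit, and the result is normalized.

```python
COEFFICIENT_BITS = 48
TOP_BIT = 1 << (COEFFICIENT_BITS - 1)


def add_positive(e1, c1, e2, c2):
    """Add two positive numbers given as (biased exponent, 48-bit coefficient)."""
    if e1 < e2:                            # keep the larger exponent operand first
        e1, c1, e2, c2 = e2, c2, e1, c1
    aligned = c2 >> (e1 - e2)              # shift right; bits shifted out are lost
    total = c1 + aligned                   # at most 49 bits wide
    exponent = e1
    if total >> COEFFICIENT_BITS:          # carry into the 49th bit
        total >>= 1
        exponent += 1
    while total and not (total & TOP_BIT): # normalize the result
        total <<= 1
        exponent -= 1
    return exponent, total


one = (0o40001, 1 << 47)                   # 1.0 under an assumed bias of 40000 (octal)
half = (0o40000, 1 << 47)                  # 0.5
print(add_positive(*one, *one), add_positive(*one, *half))
```

An operand whose exponent is 60 smaller than the other's is shifted entirely out of the register and contributes nothing, because no round-up takes place.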

Multiplication algorithm
The floating point multiply unit in the CRAY-1 computer has an input of 48 bits of coefficient into a multiply pyramid (figure 3-5). The pyramid truncates part of the lower bits of the 96-bit product. To adjust for this truncation, a constant is unconditionally added above the truncation.
Note that reversing the multiplier and multiplicand operands could cause slightly different results; that is, A x B is not necessarily the same as B x A.

A few simplified examples may help to illustrate the CRAY-1 multiplication algorithm. Each of these examples uses only 6-bit arithmetic to aid understanding of this algorithm and its differences from the conventional algorithm. The multiplication is shown in the usual school presentation (intermediate additions are not shown).

Division algorithm
The CRAY-1 performs floating point division by the method of reciprocal approximation. This facilitates the hardware implementation of a fully segmented functional unit. Operands may enter the reciprocal unit each clock period because of this segmentation. In vector mode, results are produced at a one clock period rate. These results may be used in other vector operations during chaining because all functional units in the CRAY-1 have the same result rate.

The approximation is based on Newton's method. The reciprocal approximation at step 1 is correct to 30 bits. The additional Newton iteration at step 2 increases this accuracy to 47 bits. This iteration is applied as a correction factor with a full-precision multiply operation.
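The Newton correction can be checked with exact rational arithmetic. The sketch truncates 1/b to 30 fraction bits as a stand-in for the reciprocal unit (the hardware method is not modeled) and applies the factor 2 - b*r0 that the multiply unit forms for a reciprocal iteration. In exact arithmetic the error is squared. The 47-bit figure quoted above reflects the hardware's finite-precision multiply, which this sketch does not model. The names are hypothetical.

```python
from fractions import Fraction


def reciprocal_approximation(b, bits=30):
    """Stand-in for the reciprocal unit: 1/b truncated to `bits` fraction bits."""
    scale = 1 << bits
    return Fraction(int(scale / Fraction(b)), scale)


def reciprocal_iteration(b, r):
    """The correction factor 2 - b*r formed for a reciprocal iteration."""
    return 2 - Fraction(b) * r


def divide(a, b):
    r0 = reciprocal_approximation(b)              # approximate 1/b
    return Fraction(a) * r0 * reciprocal_iteration(b, r0)


b = Fraction(3)
r0 = reciprocal_approximation(b)
e0 = 1 - b * r0                                   # error of the approximation
e1 = 1 - b * r0 * reciprocal_iteration(b, r0)     # error after one iteration
print(e0 == Fraction(1, 2 ** 30), e1 == e0 ** 2)  # True True
```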
Where 31 bits of accuracy is sufficient, the reciprocal approximation instruction may be used with the half-precision multiply to produce a half-precision quotient.
The 18 low-order bits of the half-precision result are returned as zeros, with a round applied to the low-order bit of the 30-bit result.
A scalar quotient is computed in 29 clock periods since operations 2 and 3 issue in successive clock periods.
A vector quotient requires effectively three vector times since operations 1 and 3 are chained together. This hides one of the multiply operations. A vector time is one clock period for each element in the vector.
For example, two 50-element vectors are divided in about 3 * 50 clock periods. This estimate does not include overhead associated with the functional units.
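The 29 clock period figure can be reproduced from the functional unit times given earlier (14 CP for the reciprocal approximation unit, 7 CP for the floating point multiply unit). The step list itself is not reproduced here, so the sketch assumes this assignment: step 1 forms the reciprocal approximation, steps 2 and 3 are the reciprocal iteration and the multiply of the dividend by the approximation, and step 4 multiplies their results. The function names are hypothetical.

```python
RECIPROCAL_TIME = 14   # reciprocal approximation unit, clock periods
MULTIPLY_TIME = 7      # floating point multiply unit, clock periods


def scalar_quotient_time():
    """Clock periods from issue of step 1 until the quotient is available."""
    step2_issue = RECIPROCAL_TIME            # waits for the approximation
    step3_issue = step2_issue + 1            # issues in the next clock period
    step4_issue = max(step2_issue, step3_issue) + MULTIPLY_TIME
    return step4_issue + MULTIPLY_TIME


def vector_quotient_time(length):
    """About three vector times of one clock period per element (overhead ignored)."""
    return 3 * length


print(scalar_quotient_time(), vector_quotient_time(50))   # 29 150
```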
LOGICAL OPERATIONS
The scalar and vector logical units perform bit-by-bit manipulation of 64-bit quantities. Operations provide for forming logical products, differences, sums and merges.
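The four operations can be sketched on 64-bit patterns. The sketch reads the logical product as AND, the logical sum as inclusive OR, and the logical difference as exclusive OR, which is the usual interpretation. The merge convention (bits of the first operand where the mask is one) is an assumption, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def logical_product(a, b):
    """Bit-by-bit AND."""
    return a & b


def logical_sum(a, b):
    """Bit-by-bit inclusive OR."""
    return a | b


def logical_difference(a, b):
    """Bit-by-bit exclusive OR."""
    return a ^ b


def merge(a, b, mask):
    """Bits of a where mask is one, bits of b where mask is zero (assumed convention)."""
    return ((a & mask) | (b & ~mask)) & MASK64


def vector_apply(op, vj, vk):
    """Apply a two-operand logical operation element by element."""
    return [op(x, y) & MASK64 for x, y in zip(vj, vk)]


print(vector_apply(logical_difference, [0b1100, MASK64], [0b1010, MASK64]))  # [6, 0]
```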

INSTRUCTION ISSUE AND CONTROL
This section describes the instruction buffers and registers involved with instruction issue and control. Figure 3-6 illustrates the general flow of instruction parcels through the registers and buffers.

P REGISTER
The P register is a 22-bit register which indicates the next parcel of program code to enter the next instruction parcel (NIP) register in a linear program sequence. The upper 20 bits of the P register indicate the word address for the program word in memory. The lower two bits indicate the parcel within the word. The content of the P register is normally advanced as each parcel successfully enters the NIP register. The value in the P register normally corresponds to the parcel address for the parcel currently moving to the NIP register.
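The split of P into a word address and a parcel number can be sketched as follows. The text does not state which end of the word parcel 0 occupies, so that is not modeled. The 22-bit wrap-around in the hypothetical `advance` helper is an assumption.

```python
P_BITS = 22
P_MASK = (1 << P_BITS) - 1


def split_parcel_address(p):
    """Return (word address, parcel within word) for a 22-bit P value."""
    p &= P_MASK
    return p >> 2, p & 0b11


def advance(p):
    """Advance P by one parcel in a linear sequence (22-bit wrap-around assumed)."""
    return (p + 1) & P_MASK


print(split_parcel_address(13))   # (3, 1)
```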
The P register is entered with new data on an instruction branch or on an exchange sequence. It is then advanced sequentially until the next branch or exchange sequence. The value in the P register is stored directly into the terminating exchange package during an exchange sequence.
The P register is not master cleared. A noisy value is stored in the terminating exchange package at address zero during the dead start sequence.
CIP REGISTER
The CIP (current instruction parcel) register is a 16-bit register which holds the instruction waiting to issue. If this instruction is a two-parcel instruction, the CIP register holds the upper half of the instruction and the LIP holds the lower half. Once an instruction enters the CIP register, it must issue. Issue may be delayed until previous operations have been completed but then the current instruction waiting for issue must proceed. Data arrives at the CIP register from the NIP register. The indicators which make up the instruction are distributed to all modules which have mode selection requirements when the instruction issues.
The control flags associated with the CIP register are generally master cleared. The register itself is not and a noisy instruction will issue during the master clear sequence.
NIP REGISTER
The NIP (next instruction parcel) register is a 16-bit register which holds a parcel of program code prior to entering the CIP register. A parcel of program code which has entered the NIP register must be executed. There is no mechanism to discard it.
If issue of the instruction in the CIP register is delayed, the data in the NIP register is held over for the next clock period. Data entry to the NIP register is blocked for the second parcel of a two-parcel instruction. The resulting blank is then issued at the CIP register at the proper time as a do nothing instruction. A blank instruction differs from a 000 instruction only in that a true 000 instruction causes an error interrupt.
Data entry in the NIP register is also blocked when the arriving data is not valid. This occurs on crossing buffer boundaries and on branching as well as on interrupt conditions.
The NIP register is not master cleared. A noisy instruction may issue during the master clear interval before the interrupt condition blocks data entry into the NIP register.
LIP REGISTER
The LIP (lower instruction parcel) register is a 16-bit register which holds the lower half of a two-parcel instruction at the time the two-parcel instruction issues from the CIP register. This register is almost the same as the NIP register except that it contains valid data at times when the same data has been blocked from entering the NIP register.
INSTRUCTION BUFFERS
There are four instruction buffers in the CRAY-1, each of which holds 64 consecutive 16-bit instruction parcels (figure 3-7). Instruction parcels are held in the buffers prior to being delivered to the NIP or LIP registers.
The beginning instruction parcel in a buffer always has a parcel address that is an even multiple of 100₈. This allows the entire range of addresses for instructions in a buffer to be defined by the high-order 16 bits of the beginning parcel address. For each buffer, there is a 16-bit beginning address register that contains this value.
The beginning address registers are scanned each clock period. If the high-order 16 bits of the P register match one of the beginning addresses, an in-buffer condition exists and the proper instruction parcel is selected from the instruction buffer. An instruction parcel to be executed is normally sent to the NIP. However, the second half of a two-parcel instruction is blocked from entering the NIP and is sent to the LIP instead, where it is available when the upper half issues from the CIP. At the same time, a blank parcel is entered into the NIP.
On an in-buffer condition, if the instruction is in a different buffer than the previous instruction, a change of buffers occurs necessitating a two clock period delay of issue.
An out-of-buffer condition exists when the high-order 16 bits of the P register do not match any instruction buffer beginning address. When this condition occurs, instructions must be loaded into one of the instruction buffers from memory before execution can continue. The instruction buffer that receives the instructions is determined by a two-bit counter. Each occurrence of an out-of-buffer condition causes the counter to be incremented by one so that the buffers are selected in rotation.
Buffers are loaded from memory four words per clock period, an operation that fully occupies memory. The first group of 16 parcels delivered to the buffer always contains the instruction required for execution. For this reason, the branch out of buffer time is a constant 14 clock periods.† The remaining groups arrive at a rate of 16 parcels per clock period and circularly fill the buffer.
An instruction buffer is loaded with one word of instructions from each of the 16 memory banks.† The first four instruction parcels residing in an instruction buffer are always from bank 0. Figure 3-7 illustrates the organization of parcels and words in an instruction buffer.
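The load order and bank assignment described above can be sketched for parcel offsets within a 64-parcel buffer. The sketch assumes 16-bank interleaving on the low-order word address bits, which is consistent with the statement that the first four parcels always come from bank 0. The function names are hypothetical.

```python
PARCELS_PER_WORD = 4
PARCELS_PER_BUFFER = 64        # 100 (octal) parcels, 16 words
PARCELS_PER_GROUP = 16         # four words arrive per clock period
GROUPS = PARCELS_PER_BUFFER // PARCELS_PER_GROUP


def fill_order(p):
    """Groups of a buffer in arrival order: the one holding parcel p first, then circularly."""
    first = (p % PARCELS_PER_BUFFER) // PARCELS_PER_GROUP
    return [(first + i) % GROUPS for i in range(GROUPS)]


def bank_of_offset(offset):
    """Bank that supplied the parcel at this offset (0-63) within a buffer.

    Assumes 16-way interleaving on the low-order word address bits; since a
    buffer starts on a 16-word boundary, buffer word w comes from bank w.
    """
    return offset // PARCELS_PER_WORD


print(fill_order(37), bank_of_offset(0), bank_of_offset(63))   # [2, 3, 0, 1] 0 15
```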
An exchange sequence voids the instruction buffers by setting their beginning address registers to all ones. This prevents a match with the P register and causes one of the buffers to be loaded.
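The in-buffer and out-of-buffer logic, the two-bit rotation counter, and the voiding of buffers by an exchange can be sketched together. Buffers are identified by the bits of P above the 6-bit offset within a 100₈-parcel block. The class and its methods are hypothetical. Memory loading and the change-of-buffer delay are not modeled, and the sketch assumes no code resides at the all-ones block address used as the void marker.

```python
from typing import List, Optional

BLOCK_SHIFT = 6                # 100 (octal) = 64 parcels per buffer


class InstructionBuffers:
    """Hypothetical model of buffer selection for four 64-parcel buffers."""

    VOID = (1 << 16) - 1      # all ones: assumed never to match a real P

    def __init__(self) -> None:
        self.begin: List[int] = [self.VOID] * 4
        self.counter = 0       # two-bit counter selecting the next buffer to load

    def lookup(self, p: int) -> Optional[int]:
        """Index of the buffer holding parcel address p, or None if out of buffer."""
        block = p >> BLOCK_SHIFT
        for index, begin in enumerate(self.begin):
            if begin == block:
                return index
        return None

    def fetch(self, p: int) -> int:
        """Return the buffer used for p, loading one in rotation if out of buffer."""
        index = self.lookup(p)
        if index is None:
            index = self.counter
            self.begin[index] = p >> BLOCK_SHIFT   # memory load not modeled
            self.counter = (self.counter + 1) % 4
        return index

    def exchange(self) -> None:
        """An exchange sequence voids all buffers."""
        self.begin = [self.VOID] * 4


bufs = InstructionBuffers()
print([bufs.fetch(p) for p in (0o200, 0o277, 0o300, 0o400, 0o500, 0o600)])
# [0, 0, 1, 2, 3, 0]
```

The fifth out-of-buffer condition (address 600₈) wraps the counter and reloads buffer 0, discarding the block that began at 200₈.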
Both forward and backward branching are possible within the buffers. A branch does not cause reloading of an instruction buffer if the instruction being branched to is within one of the buffers. Multiple copies of instruction parcels cannot occur in the instruction buffers. Because instructions are held in instruction buffers prior to issue, no attempt should be made to dynamically modify instruction sequences: as long as the unmodified instruction is in an instruction buffer, the modified instruction in memory will not be loaded into an instruction buffer.
Although optimization of code segment lengths for instruction buffers is not a prime consideration when programming the CRAY-1, the number and size of the buffers and the capability for both forward and backward branching can be used to good advantage. Large loops containing up to 256 consecutive instruction parcels can be maintained in the four buffers or as an alternative, one could have a main program sequence in one or two of the buffers which makes repeated calls to short subroutines maintained in the other buffers. The program and subroutines remain in the buffers undisturbed as long as no out-of-buffer condition causes a buffer to be reloaded.
† Refer to 8 Bank Phasing Option, section 5.
EXCHANGE MECHANISM
Exchange mechanism refers to the technique employed in the CRAY-1 for switching instruction execution from program to program. This technique involves the use of blocks of program parameters known as exchange packages and a CPU operation referred to as an exchange sequence. Three special registers are instrumental in the exchange mechanism. These are the exchange address (XA) register, the mode (M) register, and the flag (F) register.
XA REGISTER
The XA (exchange address) register specifies the first word address of a 16-word exchange package loaded by an exchange operation. The register contains the upper eight bits of a 12-bit field that specifies the address. The lower bits of the field are always zero; an exchange package must begin on a 16-word boundary. The 12-bit limit requires that the absolute address be in the lower 4096 words of memory.
When an execution interval terminates, the exchange sequence exchanges the contents of the registers with the contents of the exchange package at (XA)*16 in memory.
M REGISTER
The M (mode) register is a four-bit register that contains part of the exchange package for the currently active program. The four bits are selectively set during an exchange sequence. Bit 37, the floating point error mode flag, can be set or cleared during the execution interval for a program through use of the 0021 and 0022 instructions. The remaining bits are not altered during the execution interval for the exchange package and can be altered only when the exchange package is inactive in storage. Bits are assigned as follows in word two of the exchange package.

Bit 36 Correctable memory error mode flag. When this bit is set, interrupts on correctable errors are enabled.
Bit 37 Floating point error mode flag. When this bit is set, interrupts on floating point errors are enabled.
Bit 38 Uncorrectable memory error mode flag. When this bit is set, interrupts on uncorrectable memory errors are enabled.
Bit 39 Monitor mode flag. When this bit is set, all interrupts other than memory errors are inhibited.

F REGISTER
The F (flag) register is a nine-bit register that contains part of the exchange package for the currently active program. This register contains nine flags which are individually identified in the exchange package shown in figure 3-8. Setting any of these flags causes interruption of the program execution. When one or more flags are set, a request interrupt signal is sent to initiate an exchange sequence. The content of the F register is stored along with the rest of the exchange package, and the monitor program can analyze the nine flags for the cause of the interruption. Before the monitor program exchanges back to the package, it may clear the flags in the F register area of the package. If any of the bits remains set, another exchange will occur immediately.
Any flag other than the memory error flag can be set in the F register only if the currently active exchange package is not in monitor mode. This means that these flags will set only if the highest order bit of the M register is zero. With the exception of the memory error flag, if the program is in monitor mode and the conditions for setting an F register flag are otherwise present, the flag remains cleared and no exchange sequence is initiated.
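The rules for when F register flags may set can be collected in a short sketch. The condition names are hypothetical labels. The sketch assumes that the memory error flag depends only on the corresponding enable bit in M (bit 36 for correctable errors, bit 38 for uncorrectable errors), as the bit descriptions above state.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    correctable_memory_error: bool = False    # bit 36
    floating_point_error: bool = False        # bit 37
    uncorrectable_memory_error: bool = False  # bit 38
    monitor: bool = False                     # bit 39


def flag_sets(condition: str, mode: Mode) -> bool:
    """Whether a detected condition sets its F register flag (and so requests an exchange)."""
    if condition == "correctable memory error":
        return mode.correctable_memory_error
    if condition == "uncorrectable memory error":
        return mode.uncorrectable_memory_error
    if mode.monitor:                          # all other flags stay clear in monitor mode
        return False
    if condition == "floating point error":
        return mode.floating_point_error
    return True


user = Mode(floating_point_error=True, uncorrectable_memory_error=True)
monitor = Mode(uncorrectable_memory_error=True, monitor=True)
print(flag_sets("floating point error", user), flag_sets("normal exit", monitor))  # True False
```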
EXCHANGE PACKAGE
An exchange package is a 16-word block of data in memory which is associated with a particular computer program. It contains the basic parameters necessary to provide continuity from one execution interval for the program to the next. These parameters consist of the following:
Program address register (P) - 22 bits
Base address register (BA) - 18 bits
Limit address register (LA) - 18 bits
Mode register (M) - 4 bits
Exchange address register (XA) - 8 bits
Vector length register (VL) - 7 bits
Flag register (F) - 9 bits
Current contents of the eight A registers
Current contents of the eight S registers
The exchange package contents are arranged in a 16-word block as shown in figure 3-8. Data is swapped from memory to the computer operating registers and back to memory by the exchange sequence. This sequence exchanges the data in a currently active exchange package, which is residing in the operating registers, with an inactive exchange package in memory. The XA address of the currently active exchange package specifies the address of the inactive exchange package to be used in the swap. The data is exchanged and a new program execution interval is initiated by the exchange sequence.
The B register, T register, and V register contents are not swapped in the exchange sequence. The data in these registers must be stored and replaced as required by specific coding in the monitor program which supervises the object program execution.
Memory error data
Two bits in the Mode (M) register determine whether or not the exchange package contains data relevant to a memory error if one occurs prior to an exchange sequence. These are bit 36, the "Interrupt on correctable memory error bit", and bit 38, the "Interrupt on uncorrectable memory error bit". The error data, consisting of four fields of information, appears in the exchange package if bit 38 is set and an uncorrectable memory error is detected, or if bit 36 is set and a correctable memory error is encountered.
Error type (E) - The type of error encountered, uncorrectable or correctable, is indicated in bits 0 and 1 of the first word of the exchange package. Bit 0 is set for an uncorrectable memory error; bit 1 is set for a correctable memory error.
Syndrome (S) - The eight syndrome bits used in detecting the error are returned in bits 2 through 9 of the first word of the exchange package. Refer to section 5 for additional information.
Read mode (R) - This field indicates the read mode in progress when the error occurred and consists of bits 10 and 11 of the first word of the exchange package. These bits assume the following values:
00 Scalar
01 I/O
10 Vector
11 Instruction fetch
Read address (RAB) - The RAB field contains the address at which the error occurred. Bits 12 through 15 (B) of the first word of the exchange package contain bits 2³ through 2⁰ of the address and may be considered as the bank address; bits 0 through 15 (RA) of the second word of the exchange package contain bits 2¹⁹ through 2⁴ of the address.
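The read address can be rebuilt from the RAB fields as in the following sketch. It takes the field values as already extracted from the package words, so bit numbering within the words is not modeled. With 16 banks, the B field is the bank number. The names are hypothetical.

```python
READ_MODES = {0b00: "scalar", 0b01: "I/O", 0b10: "vector", 0b11: "instruction fetch"}


def error_address(ra, b):
    """Rebuild the failing address from RA (address bits 2**19-2**4) and B (2**3-2**0)."""
    return ((ra & 0xFFFF) << 4) | (b & 0xF)


def bank_of(address):
    """With 16 banks, the low-order four address bits select the bank."""
    return address & 0xF


address = error_address(ra=0x1234, b=9)
print(hex(address), bank_of(address), READ_MODES[0b10])   # 0x12349 9 vector
```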
Active exchange package
An active exchange package is an exchange package which is currently residing in the computer operating registers. The interval of time in which the exchange package is active is called the execution interval for the exchange package and also for the program with which it is associated. The execution interval begins with an exchange sequence in which the subject exchange package moves from memory to the operating registers. The execution interval ends as the exchange package moves back to memory in a subsequent exchange sequence.
EXCHANGE SEQUENCE
The exchange sequence is the vehicle for moving an inactive exchange package from memory into the operating registers and at the same time moving the currently active exchange package from the operating registers back into memory. This swapping operation is done in a fixed sequence when all computational activity associated with the currently active exchange package has stopped. The same 16-word block of memory is used as the source of the inactive exchange package and the destination of the currently active exchange package. The location of this block is specified by the content of the exchange address register and is a part of the currently active exchange package.
The exchange sequence may be initiated in three different ways.
1. Dead start sequence
2. Interrupt flag set
3. Program exit
Initiated by dead start sequence
The dead start sequence forces the exchange address register content to zero and also forces a 000 code in the NIP register. These two actions cause the execution of a program error exit using memory address zero as the location of the exchange package. The inactive exchange package at address zero is then moved into the operating registers and a program is initiated using these parameters. The exchange package stored at address zero is largely noise as a result of the dead start operation and is in effect discarded by the subsequent entry of new data at these storage addresses.
Initiated by interrupt flag set
An exchange sequence can be initiated by setting any one of the nine interrupt flags in the F register. One or more flags set result in a request interrupt signal which initiates an exchange sequence.
Initiated by program exit
There are two program exit instructions that cause the initiation of an exchange sequence. The timing of the instruction execution is the same in either case and the difference is only in which of the two flags in the F register is set. The two instructions are:
Program code 000 - Error exit
Program code 004 - Normal exit
The two exits provide a means for a program to request its own termination. A non-monitor (object) program will usually use the normal exit instruction to exchange back to the monitor program. The error exit allows for termination of an object program that has branched into an unused area of memory or into a data area. The exchange address selected is the same as for a normal exit.
There is a flag in the F register for each of these instructions. The appropriate flag is set provided the currently active exchange package is not in monitor mode. The inactive exchange package called in this case is normally one that executes in monitor mode, and the flags are sensed for evaluation of the cause of program termination.
The monitor program selects an inactive exchange package for activation by setting the address of the inactive exchange package into the XA register and then executing a normal exit instruction.
Exchange sequence issue conditions
An exchange sequence initiated by other than a 000 or 004 instruction has the following hold issue conditions, execution time, and special cases. The corresponding information for the 000 and 004 instructions is provided with the instruction descriptions in Section 4 of this manual.
Hold issue conditions:
Instruction buffer data invalid
NIP not blank
Wait exchange flag not set
S, V, or A registers busy
Execution time: 49 CPs
Special cases:
Block instruction issue
Block I/O references
Block fetch
EXCHANGE PACKAGE MANAGEMENT
Each 16-word exchange package resides in an area defined during system dead start that must lie within the lower 4096 words of memory. The package at address 0 is that of the monitor program. Other packages provide for object programs and monitor tasks. These packages lie outside of the field lengths for the programs they represent as determined by the base and limit addresses for the programs. Only the monitor program has a field defined so that it can access all of memory including the exchange package areas. This allows the monitor program to define or alter all exchange packages other than its own when it is the currently active exchange package.
Proper management of exchange packages dictates that a non-monitor program always exchange back to the monitor program that exchanged to it. This assures that the program information is always swapped back into its proper exchange package.
Consider the case where exchange packages exist for programs A, B, and C. Program A is the monitor program, program B is a user program, and program C is an interrupt processing program.
The monitor program, A, begins an execution interval following dead start. No interrupts can terminate its execution interval since it is in monitor mode. The monitor program voluntarily exits by issuing a 004 exit instruction. Before doing so, however, it sets the contents of the XA register to point to B's exchange package so that B will be the next program to execute and it sets the exit address in B's exchange package to point back to the monitor.
The exchange sequence to B causes the exit address from B's exchange package to be entered in the XA register. At the same time, the exchange address in the XA register goes to B's exchange package area along with all other program parameters for the monitor program. When the exchange is complete, program B begins its execution interval.
Suppose further that while B is executing, an interrupt flag sets, initiating an exchange sequence. Since B cannot alter the XA register, the exit is back to the monitor program. Program B's parameters swap back into B's exchange package area; the monitor program parameters held in B's package during the execution interval swap back into the operating registers.
The monitor, upon resuming execution, determines that an interrupt has caused the exchange and sets the XA register to call the proper interrupt processor into execution. It does this by setting XA to point to the exchange package for program C. Then, it clears the interrupt and initiates execution of C by executing a 004 exit instruction. Depending on the design of the operating system, the interrupt processor program could execute in monitor mode or in user mode.
MEMORY FIELD PROTECTION
Each object program at execution time has a designated field of memory holding instructions and data. The field limits are specified by the monitor program when the object program is loaded and initiated. The field may begin at any word address that is a multiple of 16 and may continue to another address that is also a multiple of 16. The field limits are contained in two registers, the base address register (BA) and the limit address register (LA), which are described later in this subsection.
All memory addresses contained in the object program code are relative to the base address which begins the defined field. It is, therefore, not possible for an object program to read or alter any memory location with a lower absolute address than the base address. Each object program reference to memory is also checked against the limit address to determine if the address is within the bounds assigned. A memory reference beyond the assigned field limit is prevented from altering the memory content and, for a non-monitor mode program, creates an error condition that terminates program execution. The program or operand range flag is set to indicate the error condition. The monitor program, upon resuming execution, determines the cause of the interrupt and takes appropriate action, perhaps terminating the user program.
BA REGISTER
The 18-bit BA register holds the base address of the user field during the execution interval for each exchange package. The contents of this register are interpreted as the upper 18 bits of a 22-bit memory address. The lower four bits of the address are assumed zero. Absolute memory addresses are formed by adding (BA) * 16 to the relative address specified by the CPU instructions. The BA register always indicates a bank 0 memory address.
LA REGISTER
The 18-bit LA register holds the limit address of the user field during the execution interval for each exchange package. The contents of LA are interpreted as the upper 18 bits of a 22-bit memory address. The lower four bits of the address are assumed zero. The LA register always indicates a bank 0 memory address.
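The relationship between the 18-bit register contents and 22-bit addresses can be written out directly. The following minimal sketch encodes and decodes BA/LA values and forms an absolute address as (BA) * 16 plus the relative address. The function names are hypothetical.

```python
ADDRESS_BITS = 22
REGISTER_BITS = 18                         # width of BA and LA
LOW_BITS = ADDRESS_BITS - REGISTER_BITS    # four implied zero bits


def to_register(address: int) -> int:
    """Encode a 22-bit field boundary (a multiple of 16) as an 18-bit BA/LA value."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 22 bits")
    if address % 16:
        raise ValueError("field boundaries must be multiples of 16")
    return address >> LOW_BITS


def from_register(value: int) -> int:
    """Expand an 18-bit BA/LA value; the lower four address bits are zero."""
    return (value & ((1 << REGISTER_BITS) - 1)) << LOW_BITS


def absolute_address(ba: int, relative: int) -> int:
    """Absolute address = (BA) * 16 + relative address."""
    return ba * 16 + relative
```

For example, a field starting at word 4096 is held in BA as 256. A relative reference to word 5 then resolves to absolute word 4101.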

DEAD START SEQUENCE
The dead start sequence is that sequence of operations required to start a program running in the CPU after power has been turned off and then turned on again. All registers in the machine, all control latches, and all words in memory are assumed to be noisy after power has been turned on. The sequence of operations required to begin a program is initiated by the maintenance control unit. This unit sequences the following operations:
1. Turns on master clear signal.
2. Turns on I/O clear signal.
3. Turns off I/O clear signal.
4. Loads memory via MCU channel.
5. Turns off master clear signal.
The master clear signal stops all internal computation and forces the critical control latches to predetermined states. The I/O clear signal clears the input channel address registers to zero and sets an active status. The maintenance control unit then loads an initial exchange package and the monitor program; the exchange package must be located at address zero in memory. Turning off the master clear signal initiates the exchange sequence, which reads this package and begins execution of the monitor program. Subsequent actions are dictated by the design of the operating system.
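The ordering of the five maintenance control unit steps can be captured in a small model. This is a sketch only. The `Machine` fields, the four-channel register list, the 0o1234 stand-in for power-on noise, and the log strings are all hypothetical simplifications. The model checks only the one hard requirement stated above: the initial exchange package must be at address zero.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Machine:
    """Hypothetical, much-simplified machine state used only to illustrate dead start."""
    # Arbitrary stand-in values for the "noisy" state left after power-on.
    input_channel_address_registers: List[int] = field(default_factory=lambda: [0o1234] * 4)
    channels_active: bool = False
    master_clear: bool = False
    memory: Dict[int, str] = field(default_factory=dict)
    exchange_from: Optional[int] = None   # address of the first exchange package read
    log: List[str] = field(default_factory=list)


def dead_start(m: Machine, image: Dict[int, str]) -> None:
    """Apply the five MCU steps in order; `image` maps word addresses to contents."""
    if 0 not in image:
        raise ValueError("the initial exchange package must be at address 0")
    m.master_clear = True                          # 1. stop computation, force latches
    m.log.append("master clear on")
    m.log.append("I/O clear on")                   # 2. zero the input channel address
    m.input_channel_address_registers = [0] * len(m.input_channel_address_registers)
    m.channels_active = True                       #    registers and set active status
    m.log.append("I/O clear off")                  # 3.
    m.memory.update(image)                         # 4. load package and monitor program
    m.log.append("memory loaded")
    m.master_clear = False                         # 5. releasing master clear starts
    m.log.append("master clear off")
    m.exchange_from = 0                            #    the exchange from address 0
```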
1.	INTRODUCTION	1-1
	COMPUTATION SECTION	1-4
	MEMORY SECTION	1-5
	INPUT/OUTPUT SECTION	1-5
	VECTOR PROCESSING	1-6
2.	PHYSICAL ORGANIZATION	2-1
	INTRODUCTION	2-1
	MAINFRAME	2-1
	Modules	2-1
	Printed circuit board	2-4
	Module assembly	2-5
	Integrated circuit packages	2-5
	IC high-speed logic gate	2-5
	IC slow-speed logic gate	2-5
	16x1 register chip	2-5
	1024x1 memory chip	2-6
	Resistors	2-6
	Connector strips	2-6
	Clock	2-7
	Power supplies	2-7
	PRIMARY POWER SYSTEM	2-8
	COOLING	2-8
	MAINTENANCE CONTROL UNIT	2-9
	FRONT-END COMPUTER	2-10
	EXTERNAL INTERFACE	2-10
	MASS STORAGE SUBSYSTEM	2-11
3.	COMPUTATION SECTION	3-1
	INTRODUCTION	3-1
	REGISTER CONVENTIONS	3-3
	OPERATING REGISTERS	3-3
	V registers	3-4
	V register reservations	3-5
	Vector control registers	3-6
	VL register	3-6
	VM register	3-6
	S registers	3-7
	T registers	3-8
	A registers	3-8
	B registers	3-9
	FUNCTIONAL UNITS	3-10
	Address functional units	3-11
	Address add unit	3-11
	Address multiply unit	3-11
	Scalar functional units	3-12
	Scalar add unit	3-12
	Scalar shift unit	3-12
	Scalar logical unit	3-13
	Population/leading zero count unit	3-13
	Vector functional units	3-13
	Vector functional unit reservation	3-13
	Recursive characteristic of vector functional units	3-14
	Vector add unit	3-17
	Vector shift unit	3-17
	Vector logical unit	3-17
	Floating point functional units	3-17
	Floating point add unit	3-18
	Floating point multiply unit	3-18
	Reciprocal approximation unit	3-18
	ARITHMETIC OPERATIONS	3-19
	Integer arithmetic	3-19
	Floating point arithmetic	3-20
	Normalized floating point	3-20
	Floating point range errors	3-21
	Floating point add unit	3-21
	Floating point multiply unit	3-22
	Floating point reciprocal approximation unit	3-22
	Double precision numbers	3-23
	Addition algorithm	3-23
	Multiplication algorithm	3-24
	Division algorithm	3-28
	LOGICAL OPERATIONS	3-29
	INSTRUCTION ISSUE AND CONTROL	3-30
	P register	3-30
	CIP register	3-31
	NIP register	3-31
	LIP register	3-32
	Instruction buffers	3-32
	EXCHANGE MECHANISM	3-35
	XA register	3-35
	M register	3-35
	F register	3-36
	Exchange package	3-36
	Active exchange package	3-39
	Exchange sequence	3-39
	Initiated by dead start sequence	3-40
	Initiated by interrupt flag set	3-40
	Initiated by program exit	3-40
	Exchange sequence issue conditions	3-41
	Exchange package management	3-42
	MEMORY FIELD PROTECTION	3-43
	BA register	3-44
	LA register	3-44
	DEAD START SEQUENCE	3-44
4.	INSTRUCTIONS	4-1
	INSTRUCTION FORMAT	4-1
	Arithmetic, logical format	4-1
	Shift, mask format	4-2
	Immediate constant format	4-2
	Memory transfer format	4-3
	Branch format	4-4
	SPECIAL REGISTER VALUES	4-5
	INSTRUCTION ISSUE	4-5
	INSTRUCTION DESCRIPTIONS	4-6
	000000 Error exit	4-7
	001i jk Monitor functions	4-8
	0020xk Transmit (Ak) to VL	4-10
	0021xx Set the floating point mode flag in the M register	4-11
	0022xx Clear the floating point mode flag in the M register	4-11
	003xjx Transmit (Sj) to vector mask	4-12
	004xxx Normal exit	4-13
	005xjk Branch to (Bjk)	4-14
	006ijkm Branch to ijkm	4-15
	007ijkm Return jump to ijkm; set B00 to (P)	4-16
	010ijkm Branch to ijkm if (A0) = 0	4-17
	011ijkm Branch to ijkm if (A0) ≠ 0	4-17
	012ijkm Branch to ijkm if (A0) positive	4-17
	013ijkm Branch to ijkm if (A0) negative	4-17
	014ijkm Branch to ijkm if (S0) = 0	4-18
	015ijkm Branch to ijkm if (S0) ≠ 0	4-18
	016ijkm Branch to ijkm if (S0) positive	4-18
	017ijkm Branch to ijkm if (S0) negative	4-18
	020ijkm Transmit jkm to Ai	4-19
	021ijkm Transmit complement of jkm to Ai	4-19
	022ijk Transmit jk to Ai	4-20
	023ijx Transmit (Sj) to Ai	4-21
	024ijk Transmit (Bjk) to Ai	4-22
	025ijk Transmit (Ai) to Bjk	4-22
	026ijx Population count of (Sj) to Ai	4-23
	027ijx Leading zero count of (Sj) to Ai	4-24
	030ijk Integer sum of (Aj) and (Ak) to Ai	4-25
	031ijk Integer difference (Aj) and (Ak) to Ai	4-25
	032ijk Integer product of (Aj) and (Ak) to Ai	4-26
	033ijk Transmit I/O status to Ai	4-27
	034ijk Block transfer (Ai) words from memory starting at address (A0) to B registers starting at register jk	4-29
	035ijk Block transfer (Ai) words from B registers starting at register jk to memory starting at address (A0)	4-29
	036ijk Block transfer (Ai) words from memory starting at address (A0) to T registers starting at register jk	4-29
	037ijk Block transfer (Ai) words from T registers starting at register jk to memory starting at address (A0)	4-29
	040ijkm Transmit jkm to Si	4-31
	041ijkm Transmit complement of jkm to Si	4-31
	042ijk Form 64-jk bits of one's mask in Si from right	4-32
	043ijk Form jk bits of one's mask in Si from left	4-32
	044ijk Logical product of (Sj) and (Sk) to Si	4-33
	045ijk Logical product of (Sj) and complement of (Sk) to Si	4-33
	046ijk Logical difference of (Sj) and (Sk) to Si	4-33
	047ijk Logical difference of (Sj) and complement of (Sk) to Si	4-33
	050ijk Scalar merge	4-33
	051ijk Logical sum of (Sj) and (Sk) to Si	4-33
	052ijk Shift (Si) left jk places to S0	4-36
	053ijk Shift (Si) right 64-jk places to S0	4-36
	054ijk Shift (Si) left jk places to Si	4-36
	055ijk Shift (Si) right 64-jk places to Si	4-36
	056ijk Shift (Si) and (Sj) left by (Ak) places to Si	4-37
	057ijk Shift (Sj) and (Si) right by (Ak) places to Si	4-37
	060ijk Integer sum of (Sj) and (Sk) to Si	4-38
	061ijk Integer difference of (Sj) and (Sk) to Si	4-38
	062ijk Floating sum of (Sj) and (Sk) to Si	4-39
	063ijk Floating difference of (Sj) and (Sk) to Si	4-39
	064ijk Floating product of (Sj) and (Sk) to Si	4-40
	065ijk Half-precision rounded floating product of (Sj) and (Sk) to Si	4-40
	066ijk Rounded floating product of (Sj) and (Sk) to Si	4-40
	067ijk Reciprocal iteration; 2-(Sj)*(Sk) to Si	4-40
	070ijx Floating reciprocal approximation of (Sj) to Si	4-42
	071ijk Transmit (Ak) or normalized floating point constant to Si	4-43
	072ixx Transmit (RTC) to Si	4-45
	073ixx Transmit (VM) to Si	4-45
	074ijk Transmit (Tjk) to Si	4-45
	075ijk Transmit (Si) to Tjk	4-45
	076ijk Transmit (Vj element (Ak)) to Si	4-46
	077ijk Transmit (Sj) to Vi element (Ak)	4-46
	10hijkm Read from ((Ah) + jkm) to Ai	4-47
	11hijkm Store (Ai) to (Ah) + jkm	4-47
	12hijkm Read from ((Ah) + jkm) to Si	4-47
	13hijkm Store (Si) to (Ah) + jkm	4-47
	140ijk Logical products of (Sj) and (Vk elements) to Vi elements	4-49
	141ijk Logical products of (Vj elements) and (Vk elements) to Vi elements	4-49
	142ijk Logical sums of (Sj) and (Vk elements) to Vi elements	4-49
	143ijk Logical sums of (Vj elements) and (Vk elements) to Vi elements	4-49
	144ijk Logical differences of (Sj) and (Vk elements) to Vi elements	4-49
	145ijk Logical differences of (Vj elements) and (Vk elements) to Vi elements	4-49
	146ijk If VM bit = 1, transmit (Sj) to Vi elements; if VM bit = 0, transmit (Vk elements) to Vi elements	4-49
	147ijk If VM bit = 1, transmit (Vj elements) to Vi elements; if VM bit = 0, transmit (Vk elements) to Vi elements	4-49
	150ijk Single shift of (Vj elements) left by (Ak) places to Vi elements	4-53
	151ijk Single shift of (Vj elements) right by (Ak) places to Vi elements	4-53
	152ijk Double shifts of (Vj elements) left (Ak) places to Vi elements	4-54
	153ijk Double shifts of (Vj elements) right (Ak) places to Vi elements	4-54
	154ijk Integer sums of (Sj) and (Vk elements) to Vi elements	4-59
	155ijk Integer sums of (Vj elements) and (Vk elements) to Vi elements	4-59
	156ijk Integer differences of (Sj) and (Vk elements) to Vi elements	4-59
	157ijk Integer differences of (Vj elements) and (Vk elements) to Vi elements	4-59
	APPENDIXES
	A TIMING SUMMARY	A-1
	B MODULE TYPES	B-1
	C SOFTWARE CONSIDERATIONS	C-1
	D INSTRUCTION SUMMARY	D-1
	FIGURES
	1-1 Basic computer system	1-2
	2-1 Physical organization of the mainframe	2-2
	2-2 General chassis layout	2-3
	2-3 Clock pulse waveform	2-7
	3-1 Computation section	3-2
	3-2 Integer data formats	3-19
	3-3 Floating point data format	3-20
	3-4 49-bit floating point addition	3-23
	3-5 Floating point multiply pyramid	3-25
	3-6 Relationship of instruction buffers and registers	3-30
	3-7 Instruction buffers	3-33
	3-8 Exchange package	3-37
	4-1 General format for instructions	4-1
	4-2 Format for arithmetic and logical instructions	4-2
	4-3 Format for shift and mask instructions	4-2
	4-4 Format for immediate constant instructions	4-3
	4-5 Format for memory transfer instructions	4-4
	4-6 Two-parcel format for branch instructions	4-4
	5-1 Memory organization	5-2
	5-2 Memory address	5-3
	6-1 Channel I/O control	6-2
	TABLES
	1-1 Characteristics of CRAY-1 Computer System	1-3
	2-1 Characteristics of a DD-19 Disk Storage Unit	2-13

Measurement point	Temperature
IC case temperature at center of module	130 °F (54 °C)
IC case temperature at edge of module	118 °F (48 °C)
Cold plate temperature at wedge	78 °F (25 °C)
Cold bar temperature	70 °F (21 °C)
Refrigerant tube temperature	70 °F (21 °C)

Bit 36	Correctable memory error mode flag. When this bit is set, interrupts on correctable errors are enabled.
Bit 37	Floating point error mode flag. When this bit is set, interrupts on floating point errors are enabled.
Bit 38	Uncorrectable memory error mode flag. When this bit is set, interrupts on uncorrectable memory errors are enabled.
Bit 39	Monitor mode flag. When this bit is set, all interrupts other than memory errors are inhibited.
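The four mode flags above can be decoded mechanically. This sketch makes three stated assumptions. First, it treats the flags as bits of a hypothetical 64-bit mode word. Second, it numbers bits from the left, with bit 0 as the most significant bit; check this against the manual's register conventions and replace `63 - bit` with `bit` if the numbering runs the other way. Third, it reads "all interrupts other than memory errors are inhibited" as overriding the floating point error enable. All names are hypothetical.

```python
CORRECTABLE_MEMORY_ERROR_MODE = 36
FLOATING_POINT_ERROR_MODE = 37
UNCORRECTABLE_MEMORY_ERROR_MODE = 38
MONITOR_MODE = 39


def bit_is_set(word: int, bit: int) -> bool:
    """ASSUMPTION: bits are numbered from the left, bit 0 = most significant of 64."""
    return ((word >> (63 - bit)) & 1) == 1


def make_word(*bits: int) -> int:
    """Build a 64-bit word with the given (left-numbered) bits set."""
    word = 0
    for bit in bits:
        word |= 1 << (63 - bit)
    return word


def interrupt_enabled(word: int, kind: str) -> bool:
    """Memory-error interrupts follow their own enable bits; every other
    interrupt is inhibited while the monitor mode flag is set."""
    if kind == "correctable memory error":
        return bit_is_set(word, CORRECTABLE_MEMORY_ERROR_MODE)
    if kind == "uncorrectable memory error":
        return bit_is_set(word, UNCORRECTABLE_MEMORY_ERROR_MODE)
    if bit_is_set(word, MONITOR_MODE):
        return False
    if kind == "floating point error":
        return bit_is_set(word, FLOATING_POINT_ERROR_MODE)
    return True
```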

SECTION 1 INTRODUCTION