*** Please note: This website (comp-hist) was completed before I found out about Wikipedia in 2002.
Since then I have added material occasionally.
Items are certainly not complete, and may be inaccurate.
Your information, comments, corrections, etc. are eagerly requested.
Send e-mail to Ed Thelen. Please include the URL under discussion. Thank you ***

Cray Research

Manufacturer Cray Research
Identification, ID Cray-1A, Cray 1M/4400, Cray-2
Date of first manufacture 1976, 1978, 1985
Number produced Cray-1 - 85 - (a dead link - http://www.dg.com/about/html/cray-1.html)
Estimated price or cost-
location in museum -
donor Lawrence Livermore Laboratory, Cray Research, Lawrence Berkeley Lab

Contents of this page:

Photo
Cray 1A, Cray 2, Cray YMP-EL

Placard

Architecture

Special features
  • A vector machine, with 8 "vector registers" (a vector register is 64 words, each word is 64 bits)
  • You can load a vector register with one instruction. Loading takes anywhere from 1-8 cycles.
  • You can multiply or add(subtract) one vector register by another vector register giving a third vector register.
  • There is one product and one sum produced each clock - 12.5 nanoseconds in a Cray 1
  • The machine was also very fast in scalar operations - badly beating many competing machines in this also important function. It was about twice as fast as the 7600 in scalar (normal - not vector) operations. 64 scalar registers available for easier code sequencing and optimizing.
  • Pipelines could be linked - you could have the multiply pipeline connect to the add pipeline for the relatively common
    (vector A times vector B) plus vector C giving vector D
  • All the above concurrently, a floating point result from each pipeline every 12.5 nanoseconds
  • The above gives the sometimes quoted speed of 160 million floating point operations per second (Mflops)
  • And of course while the above computing is going on, a write is sending a previous 64 word "vector" to memory, or filling a 64 word vector with new data from memory.
  • 1 million words of Error Correcting main memory was an available option
  • Used ECL (Emitter Coupled Logic) - very fast, very power hungry
  • There was no divide instruction! If you needed to divide, you performed a "reciprocal approximation" (which makes 1/value), then multiplied by the reciprocal. (Same thing - faster in hardware - a fast divide is the bane of computer designers.)
  • Gordon Bell says the Cray 1 "is remarkably similar to the 6600... and extended for vectors"
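
The divide-by-reciprocal idea in the list above can be sketched in a few lines of Python. This is only an illustration: `reciprocal_approximation` is a hypothetical stand-in for the hardware unit (it deliberately returns a slightly wrong 1/x), and the Newton-Raphson refinement step is an assumption added here to show how a rough reciprocal can be sharpened using only multiplies and subtracts. The text above mentions only the approximation followed by a multiply.

```python
def reciprocal_approximation(x: float) -> float:
    """Hypothetical stand-in for a hardware reciprocal unit: a rough 1/x."""
    if x == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # Perturb the true reciprocal to imitate a limited-precision result.
    return (1.0 / x) * (1.0 + 1e-4)


def refine(x: float, r: float) -> float:
    """One Newton-Raphson step for 1/x: r' = r * (2 - x * r)."""
    return r * (2.0 - x * r)


def divide(a: float, b: float, steps: int = 1) -> float:
    """Compute a / b as a * (1 / b), with no divide in the main path."""
    r = reciprocal_approximation(b)
    for _ in range(steps):
        r = refine(b, r)
    return a * r


print(divide(10.0, 4.0))
```

Each refinement step roughly squares the relative error of the reciprocal, which is why one or two multiplies are enough to recover a usable quotient from a coarse starting value.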

Special features - Cray 2
  • Production model had 4 processors
  • Ran in a tank of cooling liquid called Fluorinert, a liquid also used in heart transplants. This liquid was pumped through the Cray 2, and the heat of the modules was carried away by the liquid, which was then cooled by refrigeration and recirculated.
  • This helped keep the system at a uniform, stable temperature and within the speed range of the semiconductors (their speed is temperature dependent)
  • To troubleshoot this system, you needed to drain the coolant. How could it be run at a stable temperature for troubleshooting? Turn the power on and run the machine for 2 milliseconds, then turn off the power for at least 1 second. Examine the results.

Cycle times from http://netlib2.cs.utk.edu/utk/lsi/pcwLSI/text/node9.html#SECTION00410000000000000000
Year of Introduction   Model Name       Cycle Time in Nanoseconds
1976                   CRAY 1           12.5
1982                   CRAY X-MP        9.5
1985                   CRAY 2           4.1*
1988                   CRAY Y-MP        6.5
1992                   CRAY Y-MP C-90   4.0
*Instructions could only be issued every other cycle, so the effective cycle time is 8.2 nanoseconds
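
For readers who think in clock rates rather than cycle times, the conversion is simply 1000 divided by the cycle time in nanoseconds. The sketch below uses the cycle times quoted above. The helper names are made up for this page, and the "two results per clock" figure for the Cray 1 comes from the multiply and add pipelines running together, as described in the feature list.

```python
# Cycle times in nanoseconds, as quoted in the table above.
CYCLE_NS = {
    "CRAY 1": 12.5,
    "CRAY X-MP": 9.5,
    "CRAY 2": 4.1,
    "CRAY Y-MP": 6.5,
    "CRAY Y-MP C-90": 4.0,
}


def clock_mhz(cycle_ns: float) -> float:
    """Convert a cycle time in nanoseconds to a clock rate in MHz."""
    return 1000.0 / cycle_ns


def peak_mflops(cycle_ns: float, results_per_clock: int) -> float:
    """Peak rate if `results_per_clock` floating point results finish each clock."""
    return clock_mhz(cycle_ns) * results_per_clock


for model, ns in CYCLE_NS.items():
    print(f"{model:16s} {ns:5.1f} ns  {clock_mhz(ns):6.1f} MHz")

# One product plus one sum per 12.5 ns clock gives the quoted Cray 1 peak.
print("CRAY 1 peak Mflops:", peak_mflops(CYCLE_NS["CRAY 1"], 2))
```

This is a peak figure only; sustained rates on real programs were lower.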

From Eugene Miya
Production models only had 4 CPUs. Ours is the only 8 CPU ever made. We also have a single CPU.

The X-MP later became a 8.5 ns cycle.
The Y-MP was 6.3 ns and later 6.0 ns.

From Tera, which acquired the Cray Research assets from SGI in April 2000 (SGI had acquired Cray Research in August 1996)

Historical Notes
The first Cray-1® system was installed at Los Alamos National Laboratory in 1976 for $8.8 million. It boasted a world-record speed of 160 million floating-point operations per second (160 megaflops) and an 8 megabyte (1 million word) main memory. The Cray-1's architecture reflected its designer's penchant for bridging technical hurdles with revolutionary ideas. In order to increase the speed of this system, the Cray-1 had a unique "C" shape which enabled integrated circuits to be closer together. No wire in the system was more than four feet long. To handle the intense heat generated by the computer, Cray developed an innovative refrigeration system using Freon.

In order to concentrate his efforts on design, Cray left the CEO position in 1980 and became an independent contractor. As he worked on the follow-on to the Cray-1, another group within the company developed the first multiprocessor supercomputer, the Cray X-MP, which was introduced in 1982. The Cray-2 system appeared in 1985, providing a tenfold increase in performance over the Cray-1.

In 1988, Cray Research introduced the Cray Y-MP®, the world's first supercomputer to sustain over 1 gigaflop on many applications. Multiple 333 MFLOPS processors powered the system to a record sustained speed of 2.3 gigaflops.

Always a visionary, Seymour Cray had been exploring the use of gallium arsenide in creating a semiconductor faster than silicon. However, the costs and complexities of this material made it difficult for the company to support both the Cray 3 and the Cray C90™ development efforts. In 1989, Cray Research spun off the Cray 3 project into a separate company, Cray Computer Corporation, headed by Seymour Cray and based in Colorado Springs, Colorado. (Tragically, Seymour Cray died of injuries suffered in an auto accident in September, 1996 at the age of 71.)

The 1990s brought a number of transforming events to Cray Research. The company continued its leadership in providing the most powerful supercomputers for production applications. The Cray C90 featured a new central processor with industry-leading sustained performance of 1 gigaflop. Using 16 of these powerful processors and 256 million words of central memory, the system boasted unrivaled total performance. The company also produced its first "minisupercomputer," the Cray XMS system, followed by the Cray Y-MP EL series and the subsequent Cray J90.

In 1993, Cray Research offered its first massively parallel processing (MPP) system, the Cray T3D supercomputer, and quickly captured MPP market leadership from early MPP companies such as Thinking Machines and MasPar. The Cray T3D proved to be exceptionally robust, reliable, sharable and easy-to-administer, compared with competing MPP systems.

Since its debut in 1995, the successor Cray T3E supercomputer has been the world's best selling MPP system. The Cray T3E-1200E system has the distinction of being the only supercomputer to ever sustain one teraflop (1 trillion calculations per second) on a real-world application. In November 1998, a joint scientific team from Oak Ridge National Laboratory, the National Energy Research Scientific Computing Center (NERSC), Pittsburgh Supercomputing Center and the University of Bristol (UK) ran a magnetism application at a sustained speed of 1.02 teraflops.

In another technological landmark, the Cray T90 became the world's first wireless supercomputer when it was unveiled in 1994. Also introduced that year, the Cray J90 series has since become the world's most popular supercomputer, with over 400 systems sold.

Cray Research merged with SGI (Silicon Graphics, Inc.) in February 1996. In August 1999, SGI created a separate Cray Research business unit to focus exclusively on the unique requirements of high-end supercomputing customers. Assets of this business unit were sold to Tera Computer Company in March 2000.


From Yahoo, news wire info
{SGI paid $760 million for Cray Research in 1996}
{In April 2000, Cray (formerly Tera) paid SGI $58 million for the remnants of Cray Research. SGI lost over 92 percent on their "investment" in 3.5 years. The SGI Cray T3E is based on the DEC Alpha chip. The Cray C90 and T90 were the last of the Cray style vector processing from Cray Research/SGI. }

from http://www.cs.uiuc.edu/whatsnew/newsletter/fall98/chen.html
After earning his MS in 1972, Chen came to Illinois to work with Professor Dave Kuck and graduate student Duncan Lawrie, who were championing the new concept of parallelism in the ILLIAC IV project.
After a year at Floating Point Systems, Chen joined Cray Research as its chief designer, where he led the development of the world’s most commercially successful parallel vector supercomputers, the Cray X-MP, and its successor the Cray Y-MP. Chen began by making some architectural changes to the Cray-1, which was introduced in 1976. In the Cray X-MP (Chen said that the "X" stood for "extraordinary"), Chen introduced shared-memory multiprocessing to vector supercomputing. The machine contained two pipelined processors compatible with the Cray-1 and shared memory. The X-MP series was expanded to include 1- and 4-processor machines. The X-MP4 was the first supercomputer installed at the National Center for Supercomputing Applications (NCSA) at Illinois (summer 1985).
The first of the Y-MP series, Cray’s new multiprocessor vector supercomputer introduced in 1988, contained 1 processor, followed by 8, and then 16. All these machines shared essentially the same architecture, and the majority were designed by Chen and his team. Cray Research enjoyed tremendous growth from 1982-86 as its customer base expanded beyond government laboratories to commercial applications. This was the "heroic age" of the supercomputing industry.


http://wotug.ukc.ac.uk/parallel/documents/misc/timeline/timeline.txt

========1972========

Seymour Cray leaves Control Data Corporation, founds Cray Research
Inc.  (GVW: CDC, CRI)

This Artifact
-

Interesting Web Sites

Other information, details
How to boot an FPGA simulation? Jan 2012
from http://www.archive.org/details/Cos1.17DiskImageForCray-1x-mp

COS 1.17 disk image for Cray-1/X-MP

This disk contains a backup of the last remaining copy of the Cray Operating System, the original operating system for the Cray-1 and Cray X-MP supercomputer line. The disk likely contains a binary image of COS 1.17 (the last revision of the OS), as well as other unknown software, and initially belonged to a single-processor Cray X-MP machine. The disk was written in 1989. The machine's I/O Subsystem (IOS) would have likely booted from this disk, and then loaded the OS into main CPU memory. This was recovered from a CDC 9877 80 Megabyte disk pack, formatted with 5 heads/cylinder, 808 tracks, 512B/sector, 32 Sectors/track (~64 Megabytes after formatting).
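
The "~64 Megabytes after formatting" figure can be checked from the geometry quoted above. This is a quick sketch that assumes the "808 tracks" means 808 cylinders, each with 5 heads; that reading is an assumption, not something the archive.org description states.

```python
# Geometry as quoted in the description above.
HEADS_PER_CYLINDER = 5
CYLINDERS = 808          # assumption: the quoted "808 tracks" is tracks per surface
SECTORS_PER_TRACK = 32
BYTES_PER_SECTOR = 512


def formatted_capacity_bytes() -> int:
    """Total formatted bytes for the quoted disk pack geometry."""
    return HEADS_PER_CYLINDER * CYLINDERS * SECTORS_PER_TRACK * BYTES_PER_SECTOR


capacity = formatted_capacity_bytes()
print(capacity, "bytes")
print(capacity / 2**20, "MiB")
```

Under that assumption the image should be a little over 66 million bytes, which is a useful sanity check on the size of any downloaded copy.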

We need help deciphering the file system of the image in order to bring us closer to finally being able to boot our FPGA-based Cray-1 (http://chrisfenton.com/homebrew-cray-1a/). If you make any progress / are interested in helping, contact Chris Fenton (christopher.h.fenton@gmail.com) and help us bring back the Cray-1!

A Case Study: The Cray 1 and Family

The Cray 1 was first delivered in 1976. This was around the same time that 8-bit microprocessors were beginning to gain popularity, and typical memory components were 1K bit SRAM and 4K bit DRAM. Most machines were operating at about a 1 MHz clock rate and had 32-bit words, and large mainframes had 1 MB to 8 MB of RAM.

The Cray 1 had (Baron and Higbie CS manual)

  • 64-bit words
  • 8 MB of RAM
  • 16-way interleaving on low-order bits
  • 50 ns memory cycle
  • 12.5 ns clock cycle (80 MHz)
  • 12 pipelined functional units
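
The interleaving and timing figures in the list above fit together: a 50 ns memory cycle is four 12.5 ns clocks, so with 16 banks selected by the low-order address bits, consecutive words never find their bank busy. The sketch below is an idealized model with made-up function names (one access issued per clock, a bank busy for four clocks after each access). It is not a description of the actual memory controller.

```python
BANKS = 16               # 16-way interleaving on low-order address bits
BANK_BUSY_CLOCKS = 4     # 50 ns memory cycle / 12.5 ns clock


def bank_of(word_address: int) -> int:
    """Bank selected by the low-order bits of a word address."""
    return word_address % BANKS


def clocks_for_stream(base: int, stride: int, count: int) -> int:
    """Idealized clocks until the last of `count` strided accesses completes."""
    free_at = [0] * BANKS   # clock at which each bank becomes free again
    clock = 0
    for i in range(count):
        bank = bank_of(base + i * stride)
        start = max(clock, free_at[bank])   # stall if the bank is still busy
        free_at[bank] = start + BANK_BUSY_CLOCKS
        clock = start + 1                   # at most one new access per clock
    return max(free_at)


print("stride 1 :", clocks_for_stream(0, 1, 64))
print("stride 16:", clocks_for_stream(0, 16, 64))
```

In this model a stride of 16 sends every access to the same bank, so the stream runs at memory speed rather than clock speed. That is the usual reason programmers on interleaved machines avoided array strides that were multiples of the bank count.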

The Cray 1 has 3 basic data types: addresses (24-bit integer), integers (64-bit), floating point (64-bit, 48-bit mantissa).

The 12 functional units are divided into four groups.

Group 1 -- Vector units

Vector (integer) Add: 3 stages
Vector Logical: 2 stages
Vector Shift: 4 stages

Group 2 -- Vector and scalar units

Floating Add: 6 stages
Floating Multiply: 7 stages
Floating Reciprocal Approximation: 14 stages

Group 3 -- Scalar units

Integer Add: 3 stages
Logical: 1 stage
Shift: 2 stages
Scalar population count and leading zero count: 3 stages

Group 4 -- Address units

Add: 2 stages
Multiply: 6 stages
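
The stage counts above give a rough feel for vector timing. In an idealized pipeline the first result appears after as many clocks as there are stages, and one more result follows each clock after that. The sketch below uses that textbook model with the stage counts listed above. It ignores issue and startup overheads, so treat the numbers as a lower bound rather than measured Cray 1 timings.

```python
# Stage counts for the units usable on vectors, as listed above.
PIPELINE_STAGES = {
    "vector integer add": 3,
    "vector logical": 2,
    "vector shift": 4,
    "floating add": 6,
    "floating multiply": 7,
    "floating reciprocal approximation": 14,
}


def idealized_vector_clocks(unit: str, length: int = 64) -> int:
    """Clocks until the last of `length` results leaves an ideal pipeline."""
    if not 1 <= length <= 64:
        raise ValueError("a vector register holds 1 to 64 elements")
    return PIPELINE_STAGES[unit] + length - 1


for unit in PIPELINE_STAGES:
    print(f"{unit:36s} {idealized_vector_clocks(unit):3d} clocks for 64 elements")
```

The model shows why long vectors pay off: the fill time is paid once, and every further element costs a single clock.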

The machine itself is divided into six major subsystems

  • Memory
  • Instruction component
  • Address component
  • Scalar component
  • Vector component
  • I/O component

Instruction Component

Cray 1 instructions are 32 or 16 bits, so from 2 to 4 instructions can be packed into a word. Instructions are thus addressed on 16-bit boundaries while data is addressed on 64-bit boundaries.

The instruction unit has four 16-word instruction buffers, three instruction registers, and one instruction counter. Each 16-bit field in a word is called an instruction parcel.
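
Since a 64-bit word holds four 16-bit parcels, an instruction (parcel) address splits into a word address and a parcel position within that word. The sketch below shows the arithmetic. The function names are hypothetical, and numbering parcel 0 as the most significant 16 bits of the word is an assumption made for illustration.

```python
PARCEL_BITS = 16
PARCELS_PER_WORD = 4     # 64-bit word / 16-bit parcel


def split_parcel_address(parcel_address: int) -> tuple[int, int]:
    """Return (word address, parcel index within the word)."""
    return divmod(parcel_address, PARCELS_PER_WORD)


def extract_parcel(word: int, index: int) -> int:
    """Pull one 16-bit parcel out of a 64-bit word.

    Assumption: parcel 0 is the most significant 16 bits.
    """
    if not 0 <= index < PARCELS_PER_WORD:
        raise ValueError("parcel index must be 0..3")
    shift = PARCEL_BITS * (PARCELS_PER_WORD - 1 - index)
    return (word >> shift) & 0xFFFF


word_address, parcel_index = split_parcel_address(10)
print(word_address, parcel_index)
print(hex(extract_parcel(0x1111222233334444, parcel_index)))
```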

The three instruction registers are

  • Next Instruction Parcel -- holds first parcel of the next instruction, prefetched from buffer
  • Current Instruction Parcel -- holds the high-order portion of the instruction to be issued
  • Lower Instruction Parcel -- holds low-order portion of instruction to be issued

For a 32-bit instruction, the low-order portion is fetched to the NIP and then moved to the LIP. There is no mechanism for discarding instructions in the pipe -- once in the CIP/LIP, they will be issued. At most they will be delayed for some time.

The instruction buffers are tied to the memory via the 16-way interleaving, so it is possible to fill a buffer in 4 clock cycles (recall that the clock is 12.5 ns and memory is 50 ns). Buffers are filled on a demand basis in a round-robin pattern. They thus act as an instruction cache of 256 instructions, organized into four lines of 64 instructions. Each buffer has its own address comparator, so we would call this a fully associative cache (easy to implement when there are only 4 lines). The buffers cannot be written to -- a write bypasses the instruction cache and only goes to main memory.
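
The buffer behavior described above (four buffers, each with its own address comparator, refilled round-robin on a miss) can be modeled in a few lines. This is a toy model with hypothetical names. It assumes each buffer holds an aligned 16-word block, which the text does not state, and it only counts hits and misses.

```python
class InstructionBuffers:
    """Toy model: fully associative, round-robin refill, 16 words per buffer."""

    WORDS_PER_BUFFER = 16

    def __init__(self, count: int = 4) -> None:
        self.base = [None] * count   # first word address held by each buffer
        self.next_fill = 0           # round-robin pointer for the next refill
        self.misses = 0

    def fetch(self, word_address: int) -> bool:
        """Return True on a hit; on a miss, refill the next buffer in turn."""
        # Assumption: buffers are aligned on 16-word boundaries.
        block = word_address - word_address % self.WORDS_PER_BUFFER
        if block in self.base:       # every buffer compares its own address
            return True
        self.base[self.next_fill] = block
        self.next_fill = (self.next_fill + 1) % len(self.base)
        self.misses += 1
        return False


def run_loop(words: int, passes: int) -> int:
    """Count buffer refills for a loop body of `words` words run `passes` times."""
    buffers = InstructionBuffers()
    for _ in range(passes):
        for address in range(words):
            buffers.fetch(address)
    return buffers.misses


print("64-word loop, 2 passes:", run_loop(64, 2), "refills")
print("80-word loop, 2 passes:", run_loop(80, 2), "refills")
```

In this model a loop that fits in the four buffers is fetched from memory once, while a loop only slightly larger keeps evicting the block it needs next. That is a reasonable intuition for why keeping inner loops within the buffers mattered.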

Scalar instruction issue requires that all of the instruction's required resources be free -- otherwise the instruction waits. Vector instruction issue in the Cray involves reserving functional units, including memory, operand registers and result registers, and then releasing an instruction once all of its resources are available. In addition, some data paths are shared between the vector and scalar components, and these must be available.

The control unit is able to detect when a result register for one vector operation is an operand for another vector operation and, if the two vector instructions do not conflict in any other resource requirements, it sets up a vector chaining operation between the two instructions.
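
The chaining test described above amounts to a dependency check between two vector instructions. The sketch below is a deliberate simplification with made-up names: it checks only that the first result register feeds the second instruction, that the two use different functional units, and that the second does not overwrite a register the first is still using. The real issue logic tracked more resources than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorOp:
    unit: str                  # functional unit, e.g. "floating multiply"
    dest: str                  # result vector register, e.g. "V2"
    sources: tuple[str, ...]   # operand vector registers


def can_chain(first: VectorOp, second: VectorOp) -> bool:
    """Simplified test: may `second` consume results of `first` as they emerge?"""
    feeds = first.dest in second.sources
    units_differ = first.unit != second.unit
    no_register_clash = (second.dest != first.dest
                         and second.dest not in first.sources)
    return feeds and units_differ and no_register_clash


# (vector A times vector B) plus vector C giving vector D
multiply = VectorOp("floating multiply", "V2", ("V0", "V1"))
add = VectorOp("floating add", "V4", ("V2", "V3"))
print(can_chain(multiply, add))
```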

Address Component

There are 8 24-bit address registers, 64 24-bit spill registers, an adder, and a multiplier in this component. Its purpose is to perform index arithmetic and send the results to the scalar and vector components so that they can fetch the appropriate operands.

Arithmetic is performed on the address registers directly. The spill registers are used to hold address values that do not fit into the address registers. A set of 8 addresses can be transferred between the address registers and their spill registers in a single cycle. Thus, they bear a certain similarity to the register windows of the SPARC (or vice versa). The spill registers can be thought of as an explicitly managed data cache with 8 lines. Their value is that they reduce the traffic to main memory, freeing that resource for vector operations.

Scalar Component

Similar to the address component, the scalar component has 8 64-bit registers and 64 64-bit spill registers. It has sole access to four functional units: Integer Add, Logical, Shift, and Population Count. The Scalar Component also has access to three functional units that are shared with the Vector Component: Floating Add, Multiply, and Reciprocal Approximation.

Because the scalar component has its own integer units, it can always execute integer operations in parallel with a vector operation. However, for floating point, the vector unit takes priority.

Vector Component

There are 8 64-word vector registers in the vector component. It takes four memory loads to fill a vector register. Normally, this would require 16 instruction cycles. However, careful pipelining in the memory unit reduces the time to just 11 cycles.

A vector mask register contains a bit-map of the elements in a register operand that will participate in an instruction. A vector length register determines whether fewer than 64 operands are contained in a set of vector operands. Manipulating these values is the primary reason for the population and leading zeros counter.

Vector loads and stores specify the first location, the length, and the stride.
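
To make the roles of first location, length, stride, and mask concrete, here is a small Python sketch. The function names are invented for this page, the mask bit order (bit i controls element i) is an assumption, and the masked add is only an illustration of "elements that participate". It is not a model of a specific Cray 1 instruction.

```python
MAX_VECTOR_LENGTH = 64


def vector_load(memory: list, first: int, length: int, stride: int) -> list:
    """Gather `length` words starting at `first`, stepping by `stride`."""
    if not 0 <= length <= MAX_VECTOR_LENGTH:
        raise ValueError("vector length must be 0..64")
    return [memory[first + i * stride] for i in range(length)]


def masked_add(dest: list, a: list, b: list, mask: int, length: int) -> list:
    """Add a and b element-wise, writing only where the mask bit is set.

    Assumption: bit i of `mask` corresponds to element i.
    """
    out = list(dest)
    for i in range(length):
        if (mask >> i) & 1:
            out[i] = a[i] + b[i]
    return out


memory = list(range(100))
column = vector_load(memory, first=3, length=4, stride=10)
print(column)
print(masked_add([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40], 0b0101, 4))
```

A stride other than 1 is what lets a single vector load walk down a column of a row-major matrix, or across a row of a column-major one.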

I/O Component

The I/O component has 24 programmable I/O channel units. I/O has the lowest priority for memory access.


Cray X-MP

  • Extended the Cray-1 architecture to 4-way multiprocessing.
  • Cycle reduced to 8.5 ns (117 MHz)
  • Increased instruction buffers to 32 words
  • Added a multiport memory system.
  • Redesigned the vector unit to support arbitrary chaining.
  • Added Gather/Scatter to support sparse arrays.
  • Increased memory to 16 M words, 32-way interleave
  • Provides a set of shared registers to support fine-grained (loop-level) multiprocessing. There are N+1 sets of these registers for an N-processor system. They include eight address registers, 8 scalar registers, and 32 binary semaphores.
  • The I/O system was improved and a solid state disk cache was added.


Cray Y-MP

  • Extends the X-MP architecture to 8 processors.
  • Cycle reduced to 6 ns (166 MHz)
  • Extends memory to 128 M words


Cray 2

  • One foreground and four background processors.
  • 4.1 ns cycle (244 MHz)
  • Up to 256 M words of memory
  • 64 or 128 way interleave depending on configuration
  • Eliminates the spill registers in favor of a 16K word cache
  • Cache feeds all three computational components with 4-cycle access time
  • Has 8 16-word instruction buffers
  • Foreground processor controls the I/O subsystem, which has up to 4 high speed communication channels (4 Gb/s).


Practical Considerations in Supercomputer Design

To achieve such high speeds, high-power (i.e. hot) drivers are employed, signals are detected with specialized analog circuits, conductors are all shielded and precisely tuned in both impedance and length, and data is encoded with error-correcting codes so that losses can be recovered.

In addition, the circuits are usually designed to operate in balanced mode so that there is no change in power drawn as drivers switch. As one driver switches from low to high, another switches from high to low, so that the power supply sees a DC load and there is no coupling of switching noise back into the logic via the power supply. In addition, using balanced signal lines can increase the signal to noise ratio by 6dB, although these are not often used. In a design such as the Cray-1, roughly 40% of the transistors supposedly do nothing but balance the power loading.

Even so, these machines dissipate large amounts of heat. The IBM 3090 uses special thermal conduction modules in which a multichip substrate is mounted in a carrier with built-in plumbing for a chilled water jacket. CDC used a similar system in its designs, and on one instance a maintenance crew pumped live steam through the building air conditioning system, which crossed over to the processor, with predictable results. This raises the issue that these machines usually need thermal shut-down systems, and possibly even fire suppression gear.

The Cray-1 series uses piped freon, and each board has a copper sheet to conduct heat to the edges of the cage, where freon lines draw it away. The first Cray-1 was in fact delayed six months due to problems in the cooling system: lubricant that is normally mixed with the freon to keep the compressor running would leak through the seals as a mist and eventually coat the boards with oil until they shorted out.

The Cray-2 is unique in that it uses a liquid bath to cool the processor boards. A special nonconductive liquid (Fluorinert) is pumped through the system and the chips are immersed in this.

Special fountains aerate the liquid, and reservoirs are provided for storing the liquid when it is pumped out for service. This is somewhat reminiscent of the oil cooling bath that was sometimes used in magnetic core memory units.

As a final note, Lawrence Livermore National Labs has announced that it will henceforth buy no more vector supercomputers. The handwriting is clearly on the wall for this breed of system, and all of the major manufacturers are moving, finally, to parallel processing.

Cooling
"The last word"
3M says:
   - Cray 2 was cooled by FC-74,  product now obsoleted -
   - Called 800-833-5045, asked for "Thermal Management"
   - Lou Tousignant -3M - 1-651-736-5242 handled Cray cooling,
         - he says that cooling was definitely convective, not boiling
         -  says product boiled at 97°C.
         -  he says further that most of the bubbles you see
           are actually air that dissolved when the Fluorinert
           was cool and exposed to air.  When the Fluorinert
           warms, it can dissolve less air (like water) and
           the air appears as bubbles.
         - any steady bubbling was definitely trouble.
         - there are occasionally pockets of air at the top
           of the machine - no problem as no heat released there.
   Lou also said that if you look down toward the bottom
   you will see a 6 x 9 inch LED panel that tells which
   memory bank is being accessed.

>Question for the day - from Ed Thelen
>
>There is a question about the use of Fluorinert in a CRAY 2
>
>Some say it was used as a non-boiling coolant as in an auto engine
>Some say it was used as a boiling coolant to fix a temperature
>        as in boiling water (100 degrees C at standard conditions).
>
>And to further complicate the question - it turns out that Fluorinert
>is a trade name for a family of chemicals   :-((

from Terry Greyzck
I was an on-site analyst for Cray, primarily software. I was responsible for porting most of the Cray compiler/etc. products to CTSS/LTSS/NLTSS. I'm still with Cray (now Cray Inc.) as manager of compiler optimization and code generation (and loaders, and ...).

The Cray-2 did not boil fluorinert.


Actually, the primary Cray-2 I worked with is in the Supercomputer Museum in Chippewa Falls, Wisconsin. The one in your photograph is the one-and-only 8-processor Cray-2 produced; it's a bit taller than the standard 4-processor model. The Fluorinert was chosen in part because of its heat properties, but primarily because it is non-conductive. The fluid was run directly over the circuit boards, referred to as "direct-immersion cooling", at about a rate of 1 inch per second (pretty fast, really).

The Cray-3 ran at about 10 inches per second, fast enough that they had to worry about erosion. The Cray T90 also uses direct immersion cooling.

And it better not boil. It's just there for heat transfer. Occasionally you would see a bubble or two from a hot spot, which was no big deal.

Fluorinert can also be used as artificial plasma, but it's way too expensive for that purpose. It was also used in the movie "The Abyss" in the mouse-in-the-cup demonstration. It's a versatile liquid.

Off hand, I don't know which type of Fluorinert was used. I do know that if you heat it up enough (around 500 degrees) one of the decomposition products was phosgene - which is why a big vent hood can be found over Cray-2 installations in case of fire, to vent the phosgene outside. Aside from that, it's about the most harmless stuff around.

One final note - the vent hood at NMFECC (now NERSC) led from the Cray-2 to the fire alarm assembly area outside the building. Hmm... poor planning. :)

-- Terry Greyzck

from Eugene Miya
Basically convection, but near boiling point (optimal). First, I'm software, so my knowledge of this is from what I heard from the hardware guys.

The 2 was convection and circulation but evaporation was considered in the earliest phases. The 2 was not the first immersion machine. I think maybe the STRETCH used oil, I have those notes some place on an FAQ somewhere. Boiling would take too long. I heard the temps: the Fluorinert came in at 80F at the bottom near power supplies and exited the top of the 2 around 90F. One critical problem was the formation of bubbles. Too big a bubble could create heating problems, so bubbles could not get too big.

I can't remember 55 gal, but certainly little 4 gallon ones.

We use FC-74 weight Fluorinert. There are other fluids which do similar work and in some cases better and safer. Apparently the electronics of the B-2 use some form of immersion cooling with a different fluid developed by Standard oil. This fluid could not develop corrosive or toxic gases because it was used in flight with a crew.

> > >I wonder if our plastic Fluorinert can/tank is authentic - and will give us
> > >a clue??

Oh it is.
I got one from the NAS at work from Alex Woo, since retired on his wife's VC money. They are popular as watering cans for gardens, so I had to promise Alex to get me one for the Museum. It was used for the C-90.


>Items for reference
>http://www.chemicals-technology.com/contractors/heattransfer/3m/3m3.html

FC-74 is not mentioned on this chart. 


>also
>http://www.ultrasonic.com/tables/liquids_bottom.htm
>

>
>This says that the CRAY III used convective mode Fluorinert
>http://archives.e-insite.net/archives/ednmag/reg/1994/021794/04df5.htm#figa
>
>  Ed Thelen - volunteer at the Computer History Museum, MountainView, CA


If you have comments or suggestions, Send e-mail to Ed Thelen


Updated Feb, 2017