Further Reading

**Chapter 1**


A plea to change the nature of computer components so that they use much less power when lightly utilized.


Two videotapes on the history of computing, produced by Gordon and Gwen Bell, including the following machines and their inventors: Harvard Mark-I, ENIAC, EDSAC, IAS machine, and many others.


A classic paper explaining computer hardware and software before the first stored-program computer was built. We quote extensively from it in Chapter 3. It simultaneously explained computers to the world and was a source of controversy because the first draft did not give credit to Eckert and Mauchly.


Two historians chronicle the dramatic story. The New York Times calls it well written and authoritative.


Describes the first major synthetic benchmark, Whetstone, and how it was created.


Describes some of the underlying principles in using different means to summarize performance results.

A personal view of computing by one of the pioneers who worked with von Neumann.


An overview of the parallel computing challenge written for the layman.


Section 1.5 goes into more detail on power, Section 1.6 contains much more detail on the cost of integrated circuits and explains the reasons for the difference between price and cost, and Section 1.8 gives more details on evaluating performance.


*These two papers describe the software and hardware of the landmark Alto.*


A collection of essays that describe the people, software, computers, and laboratories involved in the first experimental and commercial computers. Most of the authors were personally involved in the projects. An excellent bibliography of early reports concludes this interesting book.


*These five one-hour programs include rare footage and interviews with pioneers of the computer industry.*


*Short biographies of 31 computer pioneers.*


*A historian’s perspective on Atanasoff versus Eckert and Mauchly.*


*A personal view of computing by one of the pioneers.*

**Chapter 2**

Bayko, J. [1996]. “Great microprocessors of the past and present,” available at www.jbayko.sasktelwebsite.net/cpu.html

*A personal view of the history of both representative and unusual microprocessors, from the Intel 4004 to the Patriot Scientific ShBoom!*

This book describes the MIPS architecture in greater detail than Appendix A.


This book concentrates on the VAX, but also includes descriptions of the Intel 8086, IBM 360, and CDC 6600.


The architectural history of Intel microprocessors from the 4004 to the 8086, according to the people who participated in the designs.


**Chapter 3**

If you are interested in learning more about floating point, two publications by David Goldberg [1991, 2002] are good starting points; they abound with pointers to further reading. Several of the stories told on the CD come from Kahan [1972, 1983]. The latest word on the state of the art in computer arithmetic is often found in the *Proceedings* of the latest IEEE-sponsored Symposium on Computer Arithmetic, held every two years; the 16th was held in 2003.


This classic paper includes arguments against floating-point hardware.


A more advanced introduction to integer and floating-point arithmetic, with emphasis on hardware. It covers Sections 3.4–3.6 of this book in just 10 pages, leaving another 45 pages for advanced topics.


Another good introduction to floating-point arithmetic by the same author, this time with emphasis on software.

This survey is a source of stories on the importance of accurate arithmetic.


The title refers to silicon and is another source of stories illustrating the importance of accurate arithmetic.


What the 8087 floating-point architecture could have been.


A collection of memos related to floating point, including “Beastly numbers” (another, less famous Pentium bug), “Notes on the IEEE floating point arithmetic” (including comments on how some features are atrophying), and “The baleful effects of computing benchmarks” (on the unhealthy preoccupation with speed versus correctness, accuracy, ease of use, flexibility, . . .).


A textbook aimed at seniors and first-year graduate students that explains fundamental principles of basic arithmetic, as well as complex operations such as logarithmic and trigonometric functions.


This computer pioneer’s recollections include the derivation of the standard hardware for multiply and divide developed by von Neumann.

**Chapter 4**


A quantitative comparison of RISC and CISC written by scholars who argued for CISCs as well as built them; they conclude that MIPS is between 2 and 4 times faster than a VAX built with similar technology, with a mean of 2.7.


This entire issue is devoted to the topic of exploiting ILP. It contains papers on both the architecture and software and is a wonderful source for further references.


Chapter 3 and Appendix C go into considerably more detail about pipelined processors (almost 200 pages), including superscalar processors and VLIW processors. Appendix G describes Itanium.


A comparison of deeply pipelined (also called superpipelined) and superscalar systems.


A formal text on pipelined control, with emphasis on underlying principles.


A short summary of a classic computer that uses vectors of operations to remove pipeline stalls.


An early survey on branch prediction.


Covers the difficulties in interrupting pipelined computers.


A classic book describing a classic computer, considered the first supercomputer.

**Chapter 5**


A reference paper giving cache miss rates for many cache sizes on the SPEC2000 benchmarks.


A classic paper that describes the first commercial computer to use a cache and its resulting performance.

For more in-depth coverage of a variety of topics including protection, cache performance of out-of-order processors, virtually addressed caches, multilevel caches, compiler optimizations, additional latency tolerance mechanisms, and cache coherency.


This classic paper is the first proposal for virtual memory.


This paper shows the differences among complexity analysis of an algorithm, instruction count performance, and memory hierarchy performance for four sorting algorithms.


A widely used microbenchmark that measures the performance of the memory system behind the caches.


A thorough exploration of multilevel memory hierarchies and their performance.


The history of UNIX from one of its inventors.


A paper describing the most elegant operating system ever invented.


An operating systems textbook with a thorough discussion of virtual memory, processes and process management, and protection issues.


The classic survey paper on caches. This paper defined the terminology for the field and has served as a reference for many computer designers.

*A popular book that explains the role of Xerox PARC in laying the foundation for today's computing, from which Xerox did not substantially benefit.*


*An operating system textbook with a good discussion of virtual memory.*


*The first classic paper on caches.*

**Chapter 6**


*A textbook covering parallel computers.*


*Written in response to the claims of the Illiac IV, this three-page article describes Amdahl's law and gives the classic reply to arguments for abandoning the current form of computing.*


*A text that gives the principles of parallel programming.*


*Classic survey paper on shared-bus cache coherence protocols.*


*How a world-record sort was performed on a cluster, including a critique of the architecture of the workstation and network interface. By April 1, 1997, they had pushed the records to 8.6 GB sorted in 1 minute and 100 MB sorted in 2.2 seconds.*

Asanovic, K., R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick [2006]. “The landscape of parallel computing research: A view from Berkeley,” Tech. Rep. UCB/EECS-2006-183, EECS Department, University of California, Berkeley (December 18).

Nicknamed the “Berkeley View,” this report lays out the landscape of the multicore challenge.


Describes the NAS parallel benchmarks.


Distinguishes shared-address and nonshared-address multiprocessors based on microprocessors.


Describes the PARSEC parallel benchmarks. Also see http://parsec.cs.princeton.edu/.


Presents the “Yahoo! Cloud Serving Benchmark” (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems.


A textbook on parallel computers.


The original document describing Linpack, which became a widely used parallel benchmark.


Classic article showing SISD/SIMD/MISD/MIMD classifications.

More in-depth coverage of a variety of multiprocessor and cluster topics, including programs and measurements.


Gives the history of SPEC, including the use of SPECrate to measure performance on independent jobs, which is being used as a parallel benchmark.

Hord, R. M. [1982]. The Illiac-IV, the First Supercomputer, Computer Science Press, Rockville, MD.

A historical accounting of the Illiac IV project.


Another textbook covering parallel computers.


Examination of a vector architecture for the MIPS instruction set in media and signal processing.


Certainly the earliest reference on multiprocessors; the mathematician made this comment while translating papers on Babbage’s mechanical computer.


An entertaining book that advocates clusters and is critical of NUMA multiprocessors.


Describes the work of researchers at Intel Labs, who have experimented with alternative solutions that improve the server’s ability to process TCP/IP packets efficiently and at very high rates.


A tutorial article on a parallel processor connected via a hypercube. The Cosmic Cube is the ancestor of the Intel supercomputers.

Recollections of the beginnings of parallel processing by the architect of the Illiac IV.


Paper containing the results of the four multicores for LBMHD.


Paper containing the results of the four multicores for SpMV (sparse matrix-vector multiplication).


Dissertation containing the roofline model.


Paper describing the second version of the Stanford parallel benchmarks.

**Appendix A**

Slightly dated and lacking in coverage of modern architectures, but still the standard reference on compilers.

A complete, detailed, and engaging introduction to the MIPS instruction set and assembly language programming on these machines.

Detailed documentation on the MIPS-32 architecture is available on the Web:
MIPS32™ Architecture for Programmers Volume I: Introduction to the MIPS32™ Architecture (http://mips.com/content/Documentation/MIPSDocumentation/)

MIPS32™ Architecture for Programmers Volume II: The MIPS32™ Instruction Set
(http://mips.com/content/Documentation/MIPSDocumentation/ProcessorArchitecture/ArchitectureProgrammingPublicationsforMIPS32/MD00082-2B-MIPS32INT-AFP-02.00.pdf/getDownload)


**Appendix B**

There are a number of good texts on logic design. Here are some you might like to look into.


A thorough book on logic design using Verilog.


A unique and modern approach to digital design using VHDL and SystemVerilog.


A general text on logic design.


A general text on logic design.

**Appendix C**


Microsoft Corporation. [2003]. Microsoft DirectX 9 Programmable Graphics Pipeline, Microsoft Press, Redmond, WA.


Nguyen, H., ed. [2008]. GPU Gems 3, Addison-Wesley, Reading, MA.


**Appendix E**


Silicon Graphics [1996]. MIPS V Instruction Set. (See www.sgi.com/MIPS/arch/ISA5/#MIPSV_index.)


