Chapter 3 - IS MU

Chapter 3 Arithmetic for Computers

§3.1 Introduction

Arithmetic for Computers

- Operations on integers
  - Addition and subtraction
  - Multiplication and division
  - Dealing with overflow
- Floating-point real numbers
  - Representation and operations

Chapter 3 — Arithmetic for Computers — 2

§3.2 Addition and Subtraction

Integer Addition

- Example: 7 + 6
- Overflow if result out of range
  - Adding +ve and –ve operands, no overflow
  - Adding two +ve operands
    - Overflow if result sign is 1
  - Adding two –ve operands
    - Overflow if result sign is 0

Integer Subtraction

- Add negation of second operand
- Example: 7 – 6 = 7 + (–6)

    +7:  0000 0000 … 0000 0111
    –6:  1111 1111 … 1111 1010
    +1:  0000 0000 … 0000 0001

- Overflow if result out of range
  - Subtracting two +ve or two –ve operands, no overflow
  - Subtracting +ve from –ve operand
    - Overflow if result sign is 0
  - Subtracting –ve from +ve operand
    - Overflow if result sign is 1
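The sign rules for addition and subtraction overflow can be sketched in C as follows. This is an illustrative sketch, not part of the original slides: the function names are made up here, and the code assumes a 32-bit 2s-complement machine on which converting an out-of-range unsigned value to int32_t wraps around (true of mainstream compilers, but implementation-defined in C).

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound (modulo 2^32) add, the result addu would produce.
   Assumption: the unsigned-to-signed conversion wraps. */
static int32_t wrap_add(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Wraparound subtract: add the 2s-complement negation of the second operand. */
static int32_t wrap_sub(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (~(uint32_t)b + 1u));
}

/* Addition can overflow only when both operands have the same sign;
   it did overflow if the result's sign differs from theirs. */
static bool add_overflows(int32_t a, int32_t b) {
    int32_t r = wrap_add(a, b);
    return ((a < 0) == (b < 0)) && ((r < 0) != (a < 0));
}

/* Subtraction can overflow only when the operands' signs differ;
   it did overflow if the result's sign differs from the first operand's. */
static bool sub_overflows(int32_t a, int32_t b) {
    int32_t r = wrap_sub(a, b);
    return ((a < 0) != (b < 0)) && ((r < 0) != (a < 0));
}
```

For example, wrap_sub(7, 6) yields 1 with no overflow, while add_overflows(INT32_MAX, 1) reports overflow because two positive operands produce a result with sign bit 1.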

Dealing with Overflow

- Some languages (e.g., C) ignore overflow
  - Use MIPS addu, addiu, subu instructions
- Other languages (e.g., Ada, Fortran) require raising an exception
  - Use MIPS add, addi, sub instructions
  - On overflow, invoke exception handler
    - Save PC in exception program counter (EPC) register
    - Jump to predefined handler address
    - mfc0 (move from coprocessor reg) instruction can retrieve EPC value, to return after corrective action

Arithmetic for Multimedia

- Graphics and media processing operates on vectors of 8-bit and 16-bit data
  - Use 64-bit adder, with partitioned carry chain
    - Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
  - SIMD (single-instruction, multiple-data)
- Saturating operations
  - On overflow, result is largest representable value
    - cf. 2s-complement modulo arithmetic
  - E.g., clipping in audio, saturation in video
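To make the contrast between saturating and modulo arithmetic concrete, here is a small C sketch. It is only an illustration: the function names are hypothetical, and the lane loop imitates an 8×8-bit partitioned adder in software rather than describing any particular hardware.

```c
#include <stdint.h>

/* Unsigned 8-bit saturating add: clamp at 255 instead of wrapping. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* The same add with modulo (wraparound) arithmetic, for contrast. */
static uint8_t wrap_add_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Apply the saturating add to each of the 8 byte lanes of a 64-bit word.
   No carry crosses a lane boundary, as with a partitioned carry chain. */
static uint64_t sat_add_8x8(uint64_t x, uint64_t y) {
    uint64_t out = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t a = (uint8_t)(x >> (8 * lane));
        uint8_t b = (uint8_t)(y >> (8 * lane));
        out |= (uint64_t)sat_add_u8(a, b) << (8 * lane);
    }
    return out;
}
```

With 200 + 100, the saturating version returns 255 while the wraparound version returns 44, which is the kind of jump that shows up as a visible or audible glitch in media data.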



§3.3 Multiplication

Multiplication

- Start with long-multiplication approach

      1000     multiplicand
    × 1001     multiplier
      ----
      1000
     0000
    0000
   1000
   -------
   1001000     product

- Length of product is the sum of operand lengths
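The long-multiplication approach maps directly onto a shift-and-add loop. The following C sketch assumes 32-bit unsigned operands and a 64-bit product; the function name is chosen here for illustration.

```c
#include <stdint.h>

/* Long multiplication of two 32-bit unsigned operands into a 64-bit product.
   Each iteration tests the low multiplier bit, adds the shifted multiplicand
   if that bit is 1, then shifts the multiplicand left and the multiplier right. */
static uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = 0;              /* product starts at 0 */
    uint64_t mcand = multiplicand;     /* widened so left shifts lose no bits */
    for (int i = 0; i < 32; i++) {
        if (multiplier & 1u) {
            product += mcand;
        }
        mcand <<= 1;
        multiplier >>= 1;
    }
    return product;
}
```

Calling shift_add_multiply(8, 9) reproduces the slide's example: 1000 × 1001 = 1001000 in binary, which is 72.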

Multiplication Hardware

[Figure: multiplication hardware; annotation: "Initially 0"]

Optimized Multiplier

- Perform steps in parallel: add/shift
- One cycle per partial-product addition
  - That’s ok, if frequency of multiplications is low

Faster Multiplier

- Uses multiple adders
  - Cost/performance tradeoff
- Can be pipelined
  - Several multiplications performed in parallel

MIPS Multiplication

- Two 32-bit registers for product
  - HI: most-significant 32 bits
  - LO: least-significant 32 bits
- Instructions
  - mult rs, rt / multu rs, rt
    - 64-bit product in HI/LO
  - mfhi rd / mflo rd
    - Move from HI/LO to rd
    - Can test HI value to see if product overflows 32 bits
  - mul rd, rs, rt
    - Least-significant 32 bits of product –> rd
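As a rough model of how the HI/LO pair relates to the full product, the C sketch below splits a signed 64-bit product into two 32-bit halves and tests HI for overflow. The struct and function names are hypothetical; the code only mirrors the behavior described for mult, mfhi and mflo.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } hilo;

/* Signed 32x32 -> 64-bit multiply, split the way mult fills HI and LO. */
static hilo mult_hilo(int32_t rs, int32_t rt) {
    int64_t product = (int64_t)rs * (int64_t)rt;
    hilo r = { (uint32_t)((uint64_t)product >> 32), (uint32_t)product };
    return r;
}

/* A signed product fits in 32 bits exactly when HI is the sign extension of LO. */
static bool product_overflows_32(hilo r) {
    uint32_t sign_ext = (r.lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return r.hi != sign_ext;
}
```

For the unsigned case (multu), the corresponding test would simply be whether HI is nonzero.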

 

§3.4 Division

Division

                     1001     quotient
  divisor  1000 ) 1001010     dividend
                 -1000
                     10
                     101
                     1010
                    -1000
                       10     remainder

n-bit operands yield n-bit quotient and remainder

- Check for 0 divisor
- Long division approach
  - If divisor ≤ dividend bits
    - 1 bit in quotient, subtract
  - Otherwise
    - 0 bit in quotient, bring down next dividend bit
- Restoring division
  - Do the subtract, and if remainder goes < 0, add divisor back
- Signed division
  - Divide using absolute values
  - Adjust sign of quotient and remainder as required

Division Hardware

[Figure: division hardware; annotations: divisor register initially holds the divisor in its left half, remainder register initially holds the dividend]

Division Example

0111 ÷ 0010 = 11, remainder 1 (n + 1 = 4 + 1 steps)

Iteration  Step                                Quotient  Divisor    Remainder
0          Initial values                      0000      0010 0000  0000 0111
1          1: Rem = Rem - Div                  0000      0010 0000  1110 0111
           2b: Rem < 0 → +Div, sll Q, Q0 = 0   0000      0010 0000  0000 0111
           3: Shift Div right                  0000      0001 0000  0000 0111
2          1: Rem = Rem - Div                  0000      0001 0000  1111 0111
           2b: Rem < 0 → +Div, sll Q, Q0 = 0   0000      0001 0000  0000 0111
           3: Shift Div right                  0000      0000 1000  0000 0111
3          1: Rem = Rem - Div                  0000      0000 1000  1111 1111
           2b: Rem < 0 → +Div, sll Q, Q0 = 0   0000      0000 1000  0000 0111
           3: Shift Div right                  0000      0000 0100  0000 0111
4          1: Rem = Rem - Div                  0000      0000 0100  0000 0011
           2a: Rem ≥ 0 → sll Q, Q0 = 1         0001      0000 0100  0000 0011
           3: Shift Div right                  0001      0000 0010  0000 0011
5          1: Rem = Rem - Div                  0001      0000 0010  0000 0001
           2a: Rem ≥ 0 → sll Q, Q0 = 1         0011      0000 0010  0000 0001
           3: Shift Div right                  0011      0000 0001  0000 0001
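The three numbered steps traced in the division example can be written as a short loop. This C sketch is an illustration under stated assumptions: operands are n-bit unsigned values with n between 1 and 16, the divisor is nonzero, and the names are invented for the example.

```c
#include <stdint.h>

typedef struct { uint32_t quotient, remainder; } divresult;

/* Restoring division for n-bit unsigned operands (1 <= n <= 16).
   The caller must ensure divisor != 0. Runs n + 1 iterations. */
static divresult restoring_divide(uint32_t dividend, uint32_t divisor, int n) {
    int64_t  rem = dividend;                 /* 2n-bit remainder register */
    int64_t  div = (int64_t)divisor << n;    /* divisor starts in the left half */
    uint32_t q = 0;
    for (int i = 0; i < n + 1; i++) {
        rem -= div;                          /* step 1: Rem = Rem - Div */
        if (rem < 0) {
            rem += div;                      /* step 2b: restore, shift in 0 */
            q = q << 1;
        } else {
            q = (q << 1) | 1u;               /* step 2a: shift in 1 */
        }
        div >>= 1;                           /* step 3: shift Div right */
    }
    divresult r = { q, (uint32_t)rem };
    return r;
}
```

With dividend 7, divisor 2 and n = 4, the loop passes through the same register values as the table and ends with quotient 3 and remainder 1.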

Optimized Divider

- One cycle per partial-remainder subtraction
- Looks a lot like a multiplier!
  - Same hardware can be used for both

Faster Division

- Can’t use parallel hardware as in multiplier
  - Subtraction is conditional on sign of remainder
- Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  - Still require multiple steps

MIPS Division

- Use HI/LO registers for result
  - HI: 32-bit remainder
  - LO: 32-bit quotient
- Instructions
  - div rs, rt / divu rs, rt
  - No overflow or divide-by-0 checking
    - Software must perform checks if required
  - Use mfhi, mflo to access result
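Since the hardware performs no checks, software has to guard the two problem cases itself. A minimal C sketch of such a guard is shown below; the function name and the choice to report failure through a boolean return value are assumptions made for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Signed division with the checks the hardware leaves to software.
   Returns false, leaving the outputs untouched, for a zero divisor or for
   INT_MIN / -1, whose true quotient does not fit in an int. */
static bool checked_div(int dividend, int divisor, int *quotient, int *remainder) {
    if (divisor == 0) {
        return false;
    }
    if (dividend == INT_MIN && divisor == -1) {
        return false;
    }
    *quotient = dividend / divisor;    /* C truncates toward zero */
    *remainder = dividend % divisor;   /* remainder takes the dividend's sign */
    return true;
}
```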



§3.5 Floating Point

Floating Point

- Representation for non-integral numbers
  - Including very small and very large numbers
- Like scientific notation
  - $-2.34 \times 10^{56}$ (normalized)
  - $+0.002 \times 10^{-4}$ (not normalized)
  - $+987.02 \times 10^{9}$ (not normalized)
- In binary
  - $\pm 1.xxxxxxx_2 \times 2^{yyyy}$
- Types float and double in C

Floating Point Standard

- Defined by IEEE Std 754-1985
- Developed in response to divergence of representations
  - Portability issues for scientific code
- Now almost universally adopted
- Two representations
  - Single precision (32-bit)
  - Double precision (64-bit)

IEEE Floating-Point Format

  Field     Single   Double
  S         1 bit    1 bit
  Exponent  8 bits   11 bits
  Fraction  23 bits  52 bits

$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$

- S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
- Normalize significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the “1.” restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023

Single-Precision Range

- Exponents 00000000 and 11111111 reserved
- Smallest value
  - Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
  - Fraction: 000…00 ⇒ significand = 1.0
  - $\pm 1.0 \times 2^{-126} \approx \pm 1.2 \times 10^{-38}$
- Largest value
  - Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - $\pm 2.0 \times 2^{+127} \approx \pm 3.4 \times 10^{+38}$

Double-Precision Range

- Exponents 0000…00 and 1111…11 reserved
- Smallest value
  - Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
  - Fraction: 000…00 ⇒ significand = 1.0
  - $\pm 1.0 \times 2^{-1022} \approx \pm 2.2 \times 10^{-308}$
- Largest value
  - Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - $\pm 2.0 \times 2^{+1023} \approx \pm 1.8 \times 10^{+308}$

Floating-Point Precision

- Relative precision
  - All fraction bits are significant
  - Single: approx $2^{-23}$
    - Equivalent to $23 \times \log_{10} 2 \approx 23 \times 0.3 \approx 6$ decimal digits of precision
  - Double: approx $2^{-52}$
    - Equivalent to $52 \times \log_{10} 2 \approx 52 \times 0.3 \approx 16$ decimal digits of precision
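The range and precision figures for the two formats can be cross-checked against the constants in C's <float.h>. The sketch below assumes that float and double are the IEEE 754 single and double formats (true on most current platforms, but not guaranteed by the C standard); pow2 is a small helper written for this example so the check needs no math library.

```c
#include <float.h>
#include <stdbool.h>

/* Helper for this sketch: 2^e by repeated exact doubling or halving. */
static double pow2(int e) {
    double v = 1.0;
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v /= 2.0;
    return v;
}

/* Single precision: smallest normal 1.0 * 2^-126, largest (2 - 2^-23) * 2^127,
   relative precision 2^-23. */
static bool single_limits_match(void) {
    return FLT_MIN == (float)pow2(-126)
        && FLT_MAX == (float)((2.0 - pow2(-23)) * pow2(127))
        && FLT_EPSILON == (float)pow2(-23);
}

/* Double precision: smallest normal 1.0 * 2^-1022, largest (2 - 2^-52) * 2^1023,
   relative precision 2^-52. */
static bool double_limits_match(void) {
    return DBL_MIN == pow2(-1022)
        && DBL_MAX == (2.0 - pow2(-52)) * pow2(1023)
        && DBL_EPSILON == pow2(-52);
}
```

The largest significand is written as 2 minus one unit in the last fraction place, which is the exact value behind the "≈ 2.0" used in the range estimates.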

Floating-Point Example

- Represent –0.75
  - $-0.75 = (-1)^{1} \times 1.1_2 \times 2^{-1}$
  - S = 1
  - Fraction = $1000\ldots00_2$
  - Exponent = –1 + Bias
    - Single: –1 + 127 = 126 = $01111110_2$
    - Double: –1 + 1023 = 1022 = $01111111110_2$
- Single: 1011111101000…00
- Double: 1011111111101000…00

Floating-Point Example

- What number is represented by the single-precision float 11000000101000…00?
  - S = 1
  - Fraction = $01000\ldots00_2$
  - Exponent = $10000001_2$ = 129
- $x = (-1)^{1} \times (1 + .01_2) \times 2^{(129 - 127)} = (-1) \times 1.25 \times 2^{2} = -5.0$
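Both worked examples can be reproduced by splitting a 32-bit pattern into its three fields and evaluating the format equation. The C sketch below is illustrative: the type and function names are invented, it handles only normal numbers (exponent field 1 to 254), and bits_of assumes that float is stored as an IEEE 754 single.

```c
#include <stdint.h>
#include <string.h>

typedef struct { unsigned sign, exponent; uint32_t fraction; } fields32;

/* Split a single-precision bit pattern into S (1 bit), Exponent (8), Fraction (23). */
static fields32 split_single(uint32_t bits) {
    fields32 f = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return f;
}

/* Evaluate (-1)^S * (1 + Fraction) * 2^(Exponent - 127) for a normal number.
   Reserved exponent values (0 and 255) are not handled in this sketch. */
static double value_of_normal(fields32 f) {
    double v = 1.0 + (double)f.fraction / 8388608.0;   /* divide by 2^23 */
    int e = (int)f.exponent - 127;
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v /= 2.0;
    return f.sign ? -v : v;
}

/* Copy a float's storage into a 32-bit integer (assumes IEEE 754 single). */
static uint32_t bits_of(float x) {
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}
```

Here bits_of(-0.75f) gives 0xBF400000, the pattern 1011111101000…00, and the pattern 0xC0A00000 (11000000101000…00) evaluates to -5.0.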

Denormal Numbers

- Exponent = 000...0 ⇒ hidden bit is 0
  - $x = (-1)^{S} \times (0 + \text{Fraction}) \times 2^{1 - \text{Bias}}$
- Smaller than normal numbers
  - Allow for gradual underflow, with diminishing precision
- Denormal with fraction = 000...0
  - $x = (-1)^{S} \times (0 + 0) \times 2^{1 - \text{Bias}} = \pm 0.0$
  - Two representations of 0.0!

Infinities and NaNs

- Exponent = 111...1, Fraction = 000...0
  - ±Infinity
  - Can be used in subsequent calculations, avoiding need for overflow check
- Exponent = 111...1, Fraction ≠ 000...0
  - Not-a-Number (NaN)
  - Indicates illegal or undefined result
    - e.g., 0.0 / 0.0
  - Can be used in subsequent calculations
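The behavior of the special values can be observed from C. This sketch assumes the platform follows IEEE 754 arithmetic for double (division by zero is otherwise undefined in C) and that no fast-math style optimization is enabled; the function name is made up for the example.

```c
#include <float.h>
#include <stdbool.h>

/* Returns true if infinities, NaN and the two zeros behave as IEEE 754 describes.
   volatile keeps the compiler from folding the divisions at compile time. */
static bool special_values_behave(void) {
    volatile double zero = 0.0;
    double pos_inf = 1.0 / zero;     /* +Infinity */
    double neg_inf = -1.0 / zero;    /* -Infinity */
    double nan_val = zero / zero;    /* NaN from an undefined operation */
    double neg_zero = -zero;         /* the second representation of 0.0 */
    return pos_inf > DBL_MAX && neg_inf < -DBL_MAX
        && pos_inf + 1.0 == pos_inf                 /* infinity carries through */
        && nan_val != nan_val                       /* NaN is unequal to itself */
        && !(nan_val + 1.0 == nan_val + 1.0)        /* NaN propagates */
        && neg_zero == zero                         /* +0.0 and -0.0 compare equal */
        && 1.0 / neg_zero == neg_inf;               /* but their signs differ */
}
```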

Floating-Point Addition

- Consider a 4-digit decimal example
  - $9.999 \times 10^{1} + 1.610 \times 10^{-1}$
- 1. Align decimal points
  - Shift number with smaller exponent
  - $9.999 \times 10^{1} + 0.016 \times 10^{1}$
- 2. Add significands
  - $9.999 \times 10^{1} + 0.016 \times 10^{1} = 10.015 \times 10^{1}$
- 3. Normalize result & check for over/underflow
  - $1.0015 \times 10^{2}$
- 4. Round and renormalize if necessary
  - $1.002 \times 10^{2}$

Floating-Point Addition

- Now consider a 4-digit binary example
  - $1.000_2 \times 2^{-1} + -1.110_2 \times 2^{-2}$ (0.5 + –0.4375)
- 1. Align binary points
  - Shift number with smaller exponent
  - $1.000_2 \times 2^{-1} + -0.111_2 \times 2^{-1}$
- 2. Add significands
  - $1.000_2 \times 2^{-1} + -0.111_2 \times 2^{-1} = 0.001_2 \times 2^{-1}$
- 3. Normalize result & check for over/underflow
  - $1.000_2 \times 2^{-4}$, with no over/underflow
- 4. Round and renormalize if necessary
  - $1.000_2 \times 2^{-4}$ (no change) = 0.0625

FP Adder Hardware

- Much more complex than integer adder
- Doing it in one clock cycle would take too long
  - Much longer than integer operations
  - Slower clock would penalize all instructions
- FP adder usually takes several cycles
  - Can be pipelined

FP Adder Hardware

[Figure: FP adder hardware, with stages labeled Step 1 through Step 4]

Floating-Point Multiplication

- Consider a 4-digit decimal example
  - $1.110 \times 10^{10} \times 9.200 \times 10^{-5}$
- 1. Add exponents
  - For biased exponents, subtract bias from sum
  - New exponent = 10 + –5 = 5
- 2. Multiply significands
  - $1.110 \times 9.200 = 10.212 \Rightarrow 10.212 \times 10^{5}$
- 3. Normalize result & check for over/underflow
  - $1.0212 \times 10^{6}$
- 4. Round and renormalize if necessary
  - $1.021 \times 10^{6}$
- 5. Determine sign of result from signs of operands
  - $+1.021 \times 10^{6}$

Floating-Point Multiplication

- Now consider a 4-digit binary example
  - $1.000_2 \times 2^{-1} \times -1.110_2 \times 2^{-2}$ (0.5 × –0.4375)
- 1. Add exponents
  - Unbiased: –1 + –2 = –3
  - Biased: (–1 + 127) + (–2 + 127) – 127 = –3 + 254 – 127 = –3 + 127
- 2. Multiply significands
  - $1.000_2 \times 1.110_2 = 1.110_2 \Rightarrow 1.110_2 \times 2^{-3}$
- 3. Normalize result & check for over/underflow
  - $1.110_2 \times 2^{-3}$ (no change) with no over/underflow
- 4. Round and renormalize if necessary
  - $1.110_2 \times 2^{-3}$ (no change)
- 5. Determine sign: +ve × –ve ⇒ –ve
  - $-1.110_2 \times 2^{-3} = -0.21875$
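The five multiplication steps can be traced in code on a toy format with a 4-bit significand, matching the binary example. Everything in this C sketch is an assumption made for illustration: the toyfp type, the unbiased exponent field, round-to-nearest-even as the rounding rule, and the omission of the overflow/underflow check from step 3.

```c
/* Toy binary format with a 4-bit significand:
   value = (-1)^sign * (sig / 8) * 2^exp, where sig is 8..15 (1.xxx in binary). */
typedef struct { int sign; unsigned sig; int exp; } toyfp;

static toyfp toy_mul(toyfp a, toyfp b) {
    toyfp r;
    r.exp = a.exp + b.exp;                       /* 1. add (unbiased) exponents */
    unsigned prod = a.sig * b.sig;               /* 2. multiply significands;
                                                       prod / 64 lies in [1, 4) */
    int shift = 3;                               /* 3. normalize to 1.xxx */
    if (prod >= 128u) {                          /*    product was >= 2.0 */
        shift = 4;
        r.exp += 1;
    }
    unsigned kept = prod >> shift;               /* 4. round to nearest even */
    unsigned rest = prod & ((1u << shift) - 1u);
    unsigned half = 1u << (shift - 1);
    if (rest > half || (rest == half && (kept & 1u))) {
        kept += 1u;
    }
    if (kept == 16u) {                           /*    rounding carried out: renormalize */
        kept = 8u;
        r.exp += 1;
    }
    r.sig = kept;
    r.sign = a.sign ^ b.sign;                    /* 5. sign from operand signs */
    return r;
}
```

Multiplying {0, 8, -1} (1.000 × 2^-1) by {1, 14, -2} (-1.110 × 2^-2) gives {1, 14, -3}, that is -1.110 × 2^-3 = -0.21875.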

FP Arithmetic Hardware

- FP multiplier is of similar complexity to FP adder
  - But uses a multiplier for significands instead of an adder
- FP arithmetic hardware usually does
  - Addition, subtraction, multiplication, division, reciprocal, square-root
  - FP ↔ integer conversion
- Operations usually take several cycles
  - Can be pipelined

FP Instructions in MIPS

- FP hardware is coprocessor 1
  - Adjunct processor that extends the ISA
- Separate FP registers
  - 32 single-precision: $f0, $f1, … $f31
  - Paired for double-precision: $f0/$f1, $f2/$f3, …
    - Release 2 of MIPS ISA supports 32 × 64-bit FP reg’s
- FP instructions operate only on FP registers
  - Programs generally don’t do integer ops on FP data, or vice versa
  - More registers with minimal code-size impact
- FP load and store instructions
  - lwc1, ldc1, swc1, sdc1
    - e.g., ldc1 $f8, 32($sp)

FP Instructions in MIPS

- Single-precision arithmetic
  - add.s, sub.s, mul.s, div.s
    - e.g., add.s $f0, $f1, $f6
- Double-precision arithmetic
  - add.d, sub.d, mul.d, div.d
    - e.g., mul.d $f4, $f4, $f6
- Single- and double-precision comparison
  - c.xx.s, c.xx.d (xx is eq, lt, le, …)
  - Sets or clears FP condition-code bit
    - e.g., c.lt.s $f3, $f4
- Branch on FP condition code true or false
  - bc1t, bc1f
    - e.g., bc1t TargetLabel

FP Example: °F to °C

- C code:

    float f2c (float fahr) {
      return ((5.0/9.0)*(fahr - 32.0));
    }

  - fahr in $f12, result in $f0, literals in global memory space

- Compiled MIPS code:

    f2c: lwc1  $f16, const5($gp)
         lwc1  $f18, const9($gp)
         div.s $f16, $f16, $f18
         lwc1  $f18, const32($gp)
         sub.s $f18, $f12, $f18
         mul.s $f0, $f16, $f18
         jr    $ra

FP Example: Array Multiplication

- X = X + Y × Z
  - All 32 × 32 matrices, 64-bit double-precision elements
- C code:

    void mm (double x[][32], double y[][32], double z[][32]) {
      int i, j, k;
      for (i = 0; i != 32; i = i + 1)
        for (j = 0; j != 32; j = j + 1)
          for (k = 0; k != 32; k = k + 1)
            x[i][j] = x[i][j] + y[i][k] * z[k][j];
    }

  - Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2

FP Example: Array Multiplication

- MIPS code:

        li    $t1, 32           # $t1 = 32 (row size/loop end)
        li    $s0, 0            # i = 0; initialize 1st for loop
    L1: li    $s1, 0            # j = 0; restart 2nd for loop
    L2: li    $s2, 0            # k = 0; restart 3rd for loop
        sll   $t2, $s0, 5       # $t2 = i * 32 (size of row of x)
        addu  $t2, $t2, $s1     # $t2 = i * size(row) + j
        sll   $t2, $t2, 3       # $t2 = byte offset of [i][j]
        addu  $t2, $a0, $t2     # $t2 = byte address of x[i][j]
        l.d   $f4, 0($t2)       # $f4 = 8 bytes of x[i][j]
    L3: sll   $t0, $s2, 5       # $t0 = k * 32 (size of row of z)
        addu  $t0, $t0, $s1     # $t0 = k * size(row) + j
        sll   $t0, $t0, 3       # $t0 = byte offset of [k][j]
        addu  $t0, $a2, $t0     # $t0 = byte address of z[k][j]
        l.d   $f16, 0($t0)      # $f16 = 8 bytes of z[k][j]
        sll   $t0, $s0, 5       # $t0 = i * 32 (size of row of y)
        addu  $t0, $t0, $s2     # $t0 = i * size(row) + k
        sll   $t0, $t0, 3       # $t0 = byte offset of [i][k]
        addu  $t0, $a1, $t0     # $t0 = byte address of y[i][k]
        l.d   $f18, 0($t0)      # $f18 = 8 bytes of y[i][k]
        mul.d $f16, $f18, $f16  # $f16 = y[i][k] * z[k][j]
        add.d $f4, $f4, $f16    # $f4 = x[i][j] + y[i][k] * z[k][j]
        addiu $s2, $s2, 1       # k = k + 1
        bne   $s2, $t1, L3      # if (k != 32) go to L3
        s.d   $f4, 0($t2)       # x[i][j] = $f4
        addiu $s1, $s1, 1       # j = j + 1
        bne   $s1, $t1, L2      # if (j != 32) go to L2
        addiu $s0, $s0, 1       # i = i + 1
        bne   $s0, $t1, L1      # if (i != 32) go to L1

Accurate Arithmetic

- IEEE Std 754 specifies additional rounding control
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes
  - Allows programmer to fine-tune numerical behavior of a computation
- Not all FP units implement all options
  - Most programming languages and FP libraries just use defaults
- Trade-off between hardware complexity, performance, and market requirements

Interpretation of Data (The BIG Picture)

- Bits have no inherent meaning
  - Interpretation depends on the instructions applied
- Computer representations of numbers
  - Finite range and precision
  - Need to account for this in programs



§3.6 Parallelism and Computer Arithmetic: Associativity

Associativity

- Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail

                      (x+y)+z      x+(y+z)
    x                 -1.50E+38    -1.50E+38
    y                  1.50E+38     1.50E+38
    z                  1.0          1.0
    intermediate sum   0.00E+00     1.50E+38
    result             1.00E+00     0.00E+00

- Need to validate parallel programs under varying degrees of parallelism
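The associativity failure is easy to reproduce with the values from the table. The two helper functions in this C sketch are named here for illustration; the code assumes the compiler evaluates the expressions as written, which holds unless options such as fast-math reassociation are enabled.

```c
/* Sum the same three floats with the two groupings from the table. */
static float sum_left(float x, float y, float z) {
    return (x + y) + z;    /* x + y cancels exactly, then z survives */
}

static float sum_right(float x, float y, float z) {
    return x + (y + z);    /* z is absorbed when added to the huge y */
}
```

With x = -1.5e38f, y = 1.5e38f and z = 1.0f, sum_left returns 1.0 and sum_right returns 0.0, because 1.0 is far below the precision available at magnitude 1.5 × 10^38.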



§3.7 Real Stuff: Floating Point in the x86

x86 FP Architecture

- Originally based on 8087 FP coprocessor
  - 8 × 80-bit extended-precision registers
  - Used as a push-down stack
  - Registers indexed from TOS: ST(0), ST(1), …
- FP values are 32-bit or 64-bit in memory
  - Converted on load/store of memory operand
  - Integer operands can also be converted on load/store
- Very difficult to generate and optimize code
  - Result: poor FP performance

x86 FP Instructions

  Data transfer     Arithmetic          Compare        Transcendental
  FILD mem/ST(i)    FIADDP mem/ST(i)    FICOMP         FPATAN
  FISTP mem/ST(i)   FISUBRP mem/ST(i)   FIUCOMP        F2XM1
  FLDPI             FIMULP mem/ST(i)    FSTSW AX/mem   FCOS
  FLD1              FIDIVRP mem/ST(i)                  FPTAN
  FLDZ              FSQRT                              FPREM
                    FABS                               FSIN
                    FRNDINT                            FYL2X

- Optional variations
  - I: integer operand
  - P: pop operand from stack
  - R: reverse operand order
  - But not all combinations allowed

Streaming SIMD Extension 2 (SSE2)

- Adds 8 × 128-bit registers
  - Extended to 16 registers in AMD64/EM64T
- Can be used for multiple FP operands
  - 2 × 64-bit double precision
  - 4 × 32-bit single precision
  - Instructions operate on them simultaneously
    - Single-Instruction Multiple-Data





§3.8 Fallacies and Pitfalls

Right Shift and Division

- Left shift by i places multiplies an integer by $2^{i}$
- Right shift divides by $2^{i}$?
  - Only for unsigned integers
- For signed integers
  - Arithmetic right shift: replicate the sign bit
  - e.g., –5 / 4
    - $11111011_2 >> 2 = 11111110_2 = -2$
    - Rounds toward –∞
  - cf. $11111011_2 >>> 2 = 00111110_2 = +62$
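The difference between the two right shifts, and between shifting and dividing, can be checked on the 8-bit pattern for –5. In this C sketch the shifts are built from unsigned operations because >> on a negative signed value is implementation-defined in C; the function names are invented for the example.

```c
#include <stdint.h>

/* Logical right shift of an 8-bit pattern: zeros enter from the left. */
static uint8_t lsr8(uint8_t bits, unsigned n) {
    return (uint8_t)(bits >> n);
}

/* Arithmetic right shift of an 8-bit pattern, n in 0..7:
   copies of the sign bit enter from the left. */
static uint8_t asr8(uint8_t bits, unsigned n) {
    uint8_t fill = (bits & 0x80u) ? (uint8_t)(0xFFu << (8u - n)) : 0u;
    return (uint8_t)((bits >> n) | fill);
}
```

Here asr8(0xFB, 2) gives 0xFE, the pattern for –2, and lsr8(0xFB, 2) gives 0x3E, which is +62. C's own –5 / 4 evaluates to –1 because integer division truncates toward zero, so neither shift matches it.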

Who Cares About FP Accuracy?

- Important for scientific code
  - But for everyday consumer use?
    - “My bank balance is out by 0.0002¢!”
- The Intel Pentium FDIV bug
  - The market expects accuracy
  - See Colwell, The Pentium Chronicles



§3.9 Concluding Remarks

Concluding Remarks

- ISAs support arithmetic
  - Signed and unsigned integers
  - Floating-point approximation to reals
- Bounded range and precision
  - Operations can overflow and underflow
- MIPS ISA
  - Core instructions: 54 most frequently used
    - 100% of SPECINT, 97% of SPECFP
  - Other instructions: less frequent

Exercises

- Answer the following exercises, and send your answers as a PDF attachment to the email address listed below
  - [email protected]
- Leave body of the email blank
- Deadline is April 8th

Exercise 1

- Calculate the product of the octal unsigned 6-bit integers A = 50 and B = 23 using the hardware described below (adjust the register sizes). You should show the contents of each register on each step.

Exercise 2

- Calculate the product of the hexadecimal unsigned 8-bit integers A = 66 and B = 04 using the hardware described below (adjust the register sizes). You should show the contents of each register on each step.

Exercise 3

- Calculate A = 50 divided by B = 23 using the hardware described below. You should show the contents of each register on each step. Assume A and B are octal unsigned 6-bit integers (adjust the register sizes in the hardware).

Exercise 4

- Calculate A = 50 divided by B = 23 using the hardware described below. You should show the contents of each register on each step. Assume A and B are octal unsigned 6-bit integers (adjust the register sizes in the hardware).

Exercise 5

- What decimal number does the following bit pattern represent if it is a floating-point number? Use the IEEE 754 standard.
  - 0xAFBF0000

Exercise 6

- Write down the binary representation of the following decimal number:

    - 938.8125

  - a) assuming the IEEE 754 single precision format.
  - b) assuming the IEEE 754 double precision format.

Exercise 7

- NVIDIA has a “half” format, which is similar to IEEE 754 except that it is only 16 bits wide. The leftmost bit is still the sign bit, the exponent is 5 bits wide (exponent bias = $01111_2$ = 15), and the mantissa is 10 bits long. A hidden 1 is assumed.
- a) Calculate the sum of the following decimal numbers A and B by hand, assuming A and B are stored in the 16-bit NVIDIA format. Assume one guard bit, one round bit and one sticky bit, and round to the nearest even. Show all the steps.
  - A = $2.3109375 \times 10^{1}$
  - B = $6.391601562 \times 10^{-1}$
- b) Calculate the product of the following decimal numbers A and B by hand, assuming A and B are stored in the 16-bit NVIDIA format. Assume one guard bit, one round bit and one sticky bit, and round to the nearest even. Show all the steps; however, do the multiplication in human-readable format instead of using any techniques. Write your answer as a 16-bit pattern. How accurate is your result?
  - A = $6.18 \times 10^{2}$
  - B = $5.796875 \times 10^{1}$