A SERVICE OF

logo

Appendix C Instruction Latencies 313
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
FXCH D9h 11-001-xxx DirectPath FADD/FMUL/
FSTORE
22
FXRSTOR [mem512byte] 0Fh AEh mm-001-xxx VectorPath - 68 (108)
FXSAVE [mem512byte] 0Fh AEh mm-000-xxx VectorPath - 31 (79)
FXTRACT D9h 11-110-100 VectorPath - 9
FYL2X D9h 11-110-001 VectorPath - ~
FYL2XP1 D9h 11-111-001 VectorPath - 113
Table 15. x87 Floating-Point Instructions (Continued)
Syntax
Encoding
Decode
type
FPU
pipe(s)
Latency Note
First
byte
Second
byte
ModRM byte
Notes:
1. The last three bits of the ModRM byte select the stack entry ST(i).
2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP
with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of
three per cycle and can use any of the three execution resources.
3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).
4. There is additional latency associated with this instruction. ā€œeā€ represents the difference between the exponents
of the divisor and the dividend. If ā€œsā€ is the number of normalization shifts performed on the result, then
n = (s+1)/2 where (0 <= n <= 32).
5. The latency provided for this operation is the best-case latency.
6. The three latency numbers represent the latency values for precision control settings of single precision, double
precision, and extended precision, respectively.