35.2.4.1. FLOATING POINT VECTOR SUMMARY



+-----------+
|  IA-64    |		Floating Point
+-----------+

Vector Data Format:

	[ SINGLE x 2	] = 64 bit


FPABS			- FP Parallel Absolute Value
FPACK			- FP Pack
FPAMAX.sf		- FP Parallel Absolute Maximum
FPAMIN.sf		- FP Parallel Absolute Minimum
FPCMP.frel.sf		- FP Parallel Compare
FPCVT.fx/fxu(.trunc).sf	- Convert Parallel FP to Integer
FPMA.sf			- FP Parallel Multiply Add
FPMAX.sf		- FP Parallel Maximum
FPMERGE.ns/s/se		- FP Parallel Merge
FPMIN.sf		- FP Parallel Minimum
FPMPY.sf		- FP Parallel Multiply
FPMS.sf			- FP Parallel Multiply Subtract
FPNEG			- FP Parallel Negate
FPNEGABS		- FP Parallel Negate Absolute Value
FPNMA.sf		- FP Parallel Negative Multiply Add
FPNMPY.sf		- FP Parallel Negative Multiply
FPRCPA.sf		- FP Parallel Reciprocal Approximation
FPRSQRTA.sf		- FP Parallel Reciprocal Square Root Approximation


+----------+
| SPARC	   |
+----------+	

SPARC V9 Extended Instructions. (UltraSPARC)

	UltraSPARC = V9 + VIS I

Vector Data Format:


	Pixel:		[  BYTE x 4 ] = 32 bit	(8-bit Unsigned Integer)
	Fixed Point:	[  WORD x 4 ] = 64 bit	(16-bit Signed Fixed Point)
	Fixed Point:	[ DWORD x 2 ] = 64 bit	(32-bit Signed Fixed Point)


FPADD16				- Partitioned Add			VIS I
FPADD16S                                                                VIS I
FPADD32                                                                 VIS I
FPADD32S                                                                VIS I
FPSUB16				- Partitioned Subtract                  VIS I
FPSUB16S                                                                VIS I
FPSUB32                                                                 VIS I
FPSUB32S		                                                VIS I
FPACK16				- Pixel formatting                      VIS I
FPACK32									VIS I
FPACKFIX                                                                VIS I
FEXPAND                                                                 VIS I
FPMERGE	                                                                VIS I
FMUL8x16			- Partitioned Multiply             	VIS I
FMUL8x16AU                                                              VIS I
FMUL8x16AL			                                        VIS I
FMUL8SUx16                                                              VIS I
FMUL8ULx16                                                              VIS I
FMULD8SUx16                                                             VIS I
FMULD8ULx16                                                             VIS I
FCMPGT16			- Pixel compare                         VIS I
FCMPGT32                                                                VIS I
FCMPLE16                                                                VIS I
FCMPLE32                                                                VIS I
FCMPNE16                                                                VIS I
FCMPNE32                                                                VIS I
FCMPEQ16                                                                VIS I
FCMPEQ32                                                                VIS I
EDGE8				- Edge handling                         VIS I
EDGE8L                                                                  VIS I
EDGE8N                                                                  VIS II
EDGE8LN                                                                 VIS II
EDGE16                                                                  VIS I
EDGE16L                                                                 VIS I
EDGE16N                                                                 VIS II
EDGE16LN                                                                VIS II
EDGE32                                                                  VIS I
EDGE32L                                                                 VIS I
EDGE32N                                                                 VIS II
EDGE32LN								VIS II
PDIST				- Pixel component Distance		VIS I
LDDFA				- Load Array/etc   			VIS I
LDDA				- Load Quadword atomic			VIS I
STDFA				- Partial store/etc                     VIS I   
ARRAY8 				- 3D Array addressing                   VIS I
ARRAY16                                                                 VIS I
ARRAY32                                                                 VIS I
ALIGNADDRESS			- Alignment				VIS I
ALIGNADDRESS_LITTLE                                                     VIS I
FALIGNDATA                                                              VIS I
RDASR				- Read Graphic status reg   		US
WRASR				- Write Graphic status reg  		US
SIAM				- Set Interval Arithmetic mode		VIS II
FZERO		FZEROS		- Logical                               VIS I
FONE		FONES                                                   VIS I
FSRC1		FSRC1S                                                  VIS I
FSRC2		FSRC2S                                                  VIS I
FNOT1		FNOT1S                                                  VIS I
FNOT2		FNOT2S                                                  VIS I
FOR		FORS                                                    VIS I
FNOR		FNORS                                                   VIS I
FAND		FANDS                                                   VIS I
FNAND		FNANDS                                                  VIS I
FXOR		FXORS                                                   VIS I
FXNOR		FXNORS                                                  VIS I
FORNOT1		FORNOT1S                                                VIS I
FORNOT2		FORNOT2S                                                VIS I
FANDNOT1	FANDNOT1S                                               VIS I
FANDNOT2	FANDNOT2S                                               VIS I
BMASK				- Set GSR.MASK for Shuffle instr.	VIS II
BSHUFFLE			- Shuffle 				VIS II
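
Two of the entries above can be sketched in portable C. This is only a
scalar model of the partitioned behaviour, not VIS code: the function
names pdist_model and fpadd16_model are made up for illustration, and the
operands are treated as plain 64-bit integers rather than FP registers.

```c
#include <stdint.h>

/* Model of PDIST: sum of absolute differences over eight unsigned
   bytes, added to a running accumulator (hypothetical helper name). */
uint64_t pdist_model(uint64_t a, uint64_t b, uint64_t acc)
{
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        acc += (x > y) ? (uint64_t)(x - y) : (uint64_t)(y - x);
    }
    return acc;
}

/* Model of FPADD16: four independent 16-bit adds; a carry out of one
   lane never reaches the next lane (hypothetical helper name). */
uint64_t fpadd16_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        uint16_t s = (uint16_t)(x + y);   /* wraps modulo 2^16 */
        r |= (uint64_t)s << (16 * i);
    }
    return r;
}
```

In the FPADD16 model the top lane 0xFFFF + 0x0001 wraps to 0x0000 without
disturbing the lane next to it, which is the point of a partitioned add.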



+----------+
| MIPS	   |
+----------+

MIPS V 

.PS	[ SINGLE x 2  ] = 64 bit	PS - paired single

LUXC1			- Load PS Unaligned
LDXC1			- Load PS Aligned
LWXC1			- Load PS
	
SUXC1			- Store PS Unaligned
SDXC1			- Store PS Aligned
SWXC1			- Store PS

ALNV.PS			- Align Variable: handles vectors that are not 8-byte aligned

			Example:
				luxc1	f0,bc
				ldxc1	f1,de

				f0	B	C	
				f1	D	E

				alnv.ps	f2,f0,f1,T0
		
				f2	C	D
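
The ALNV.PS example can be modelled in C as follows. This is a sketch
under stated assumptions: a big-endian CPU, the upper single held in bits
63..32 of the register, and only the two alignment cases where the low
three bits of the address register are 0 or 4. The name alnv_ps_model is
hypothetical.

```c
#include <stdint.h>

/* Model of "alnv.ps fd,fs,ft,rs" on a big-endian CPU (assumption).
   fs and ft each hold a paired single: upper in bits 63..32,
   lower in bits 31..0. */
uint64_t alnv_ps_model(uint64_t fs, uint64_t ft, uint64_t rs)
{
    switch (rs & 7) {
    case 0:
        return fs;                        /* already 8-byte aligned */
    case 4:
        return (fs << 32) | (ft >> 32);   /* lower of fs, upper of ft */
    default:
        return 0;  /* other offsets are not modelled here */
    }
}
```

With fs = (B, C) and ft = (D, E) and an address whose low bits are 4, the
model returns (C, D), matching the register picture in the example.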

ADD.PS/	SUB.PS		- Add/Subtract
MUL.PS			- Multiply
ABS.PS			- Absolute Value
MOV.PS			- Move
NEG.PS			- Negate
CVT.S.PU		- Convert from PS Upper to S
CVT.S.PL		- Convert from PS Lower to S
CVT.PS.S		- Convert to PS from 2 S
PLL.PS			- PS from two PS (Low,Low)
PLU.PS			- PS from two PS (Low,Up)
PUL.PS			- PS from two PS (Up,Low)
PUU.PS			- PS from two PS (Up,Up)
C.XX			- FP Vector Compare/Branch

MADD.PS			- Multiply/Add	
MSUB.PS			- Multiply/Sub
NMADD.PS		- Negate Multiply/Add
NMSUB.PS		- Negate Multiply/Sub


MIPS64 is a superset of the MIPS V Instruction Set Architecture.
It includes a new data type, paired-single, which provides 2-way
SIMD capability: two 32-bit single-precision floating-point values
packed in one 64-bit register.
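
The paired-single layout can be illustrated with a small C model. It is
a sketch only: the helper names (ps_pack, ps_upper, ps_lower, ps_add) are
hypothetical, and the lane-wise add stands in for what ADD.PS does to
both halves at once.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Pack two singles into one 64-bit value: upper in bits 63..32. */
uint64_t ps_pack(float upper, float lower)
{
    return ((uint64_t)f2u(upper) << 32) | f2u(lower);
}

float ps_upper(uint64_t v) { return u2f((uint32_t)(v >> 32)); }
float ps_lower(uint64_t v) { return u2f((uint32_t)v); }

/* Lane-wise add: upper with upper, lower with lower. */
uint64_t ps_add(uint64_t a, uint64_t b)
{
    return ps_pack(ps_upper(a) + ps_upper(b),
                   ps_lower(a) + ps_lower(b));
}
```

For example, ps_add(ps_pack(1.0f, 2.0f), ps_pack(3.0f, 4.0f)) yields a
pair whose upper half is 4.0 and whose lower half is 6.0.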

Vector Data Format:

.PS	[ SINGLE x 2  ] = 64bit

MIPS-3D ASE (Application Specific Extensions):

ADDR.PS		- F.P. Reduction add
MULR.PS		- F.P. Reduction multiply
RECIP1.S	- Reciprocal with reduced precision result
RECIP1.D
RECIP1.PS
RECIP2.S	- Reciprocal 2nd step
RECIP2.D
RECIP2.PS
RSQRT1.S	- Reciprocal square root with reduced precision result
RSQRT1.D
RSQRT1.PS
RSQRT2.S	- Reciprocal square root 2nd step
RSQRT2.D
RSQRT2.PS
CVT.PS.PW	- Convert two 32-bit integers to F.P. paired-single
CVT.PW.PS	- Convert F.P. paired-single to two 32-bit integers (paired word)
CABS.cond.S	- F.P. Absolute Values Compare
CABS.cond.D
CABS.cond.PS	
BC1ANY2F cc	- Branch on any of two FP condition codes false
BC1ANY2T cc	- Branch on any of two FP condition codes true
BC1ANY4F cc	- Branch on any of four FP condition codes false
BC1ANY4T cc	- Branch on any of four FP condition codes true
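
The list above only says that RECIP1 gives a reduced-precision result
and that RECIP2 is a "2nd step". A common way to use such a pair is a
Newton-Raphson refinement, and the sketch below assumes that is the
intent. It is plain C arithmetic, not a description of the exact
instruction semantics, and the name recip_refine is hypothetical.

```c
/* One Newton-Raphson step for a reciprocal.
   d  : the value whose reciprocal is wanted
   x0 : a rough estimate of 1/d (the role of the first step)
   Returns an improved estimate x1 = x0 + x0 * (1 - d * x0). */
float recip_refine(float d, float x0)
{
    float e = 1.0f - d * x0;   /* how far d*x0 is from 1 */
    return x0 + x0 * e;        /* correct the estimate by that error */
}
```

Starting from x0 = 0.1875 as an estimate of 1/4, one step gives 0.234375,
which is closer to 0.25; an exact estimate is left unchanged.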


+----------+
|  x86	   |
+----------+

KNI (Katmai New Instructions)  SIMD  [Pentium 3]
   ----------------------------------

	XMM Registers (128-bit)


Vector Data Format:

	[ SINGLE x 4 ]	= 128 bit


ADDPS		- Packed Single F.P Add
ANDNPS		- Bit-wise Logical And-Not for Single-FP
ANDPS		- Bit-wise Logical And For Single-FP
CMPPS		- Packed Single-FP Compare
CVTPI2PS	- Packed Signed INT32 to Packed Single-FP Conversion
CVTPS2PI	- Packed Single-FP to Packed INT32 Conversion
CVTTPS2PI	- Packed Single-FP to Packed INT32 Conversion (Truncate)
DIVPS		- Packed Single FP Divide
MAXPS		- Packed Single FP Maximum
MINPS		- Packed Single	FP Minimum
MOVAPS		- Move Aligned Four Packed Single-FP
MOVUPS		- Move Unaligned Four Packed Single-FP
MULPS		- Packed Single-FP Multiply
ORPS		- Bit-wise Logical OR for Single-FP Data
RCPPS		- Packed Single-FP Reciprocal
RSQRTPS		- Packed Single-FP Square Root Reciprocal
SHUFPS		- Shuffle Single-FP 
SQRTPS		- Packed Single-FP Square Root
SUBPS		- Packed Single-FP Subtract
XORPS		- Bit-wise Logical XOR for Single-FP Data
(And some other operations)
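
SHUFPS is the least obvious entry in this list, so here is a scalar C
model of its element selection. It is a sketch, not intrinsic code: the
name shufps_model is hypothetical, and element 0 is the lowest 32 bits
of the XMM register.

```c
/* Model of "shufps dst, src, imm8".
   The two low result elements are picked from dst, the two high
   result elements from src, each by a 2-bit field of imm8. */
void shufps_model(float dst[4], const float src[4], unsigned imm8)
{
    float r[4];
    r[0] = dst[imm8 & 3];
    r[1] = dst[(imm8 >> 2) & 3];
    r[2] = src[(imm8 >> 4) & 3];
    r[3] = src[(imm8 >> 6) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];   /* write back only after all reads are done */
}
```

With dst = {0,1,2,3}, src = {4,5,6,7} and imm8 = 0x1B the result in dst
is {3,2,5,4}.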



Willamette SIMD2 Extensions [Pentium 4]:
   -------------------------


	[ DOUBLE x 2 ] = 128 bit
	
ADDPD		- Add packed Double-FP
ANDNPD		- AND-NOT packed Double-FP
ANDPD		- AND packed Double-FP
CMPxxPD		- Packed Double-FP Compare
DIVPD		- Divide packed Double-FP
MAXPD		- Packed Double-FP Maximum
MINPD		- Packed Double-FP Minimum
MULPD		- Multiply packed Double-FP
ORPD		- OR packed Double-FP
SHUFPD		- Shuffle
SQRTPD		- Packed Double-FP Square Root
SUBPD		- Sub packed Double-FP
XORPD		- XOR packed Double-FP

Also, conversions:
CVTPD2PI	CVTPI2PD
CVTPD2DQ
CVTPD2PS	CVTPS2PD
CVTTPD2PI
CVTTPD2DQ

And moving:
MOVAPD		- Aligned
MOVHPD		- High
MOVLPD		- Low
MOVMSKPD	- Extract Sign Mask
MOVUPD		- Unaligned

Pack/unpack operations are not described here.



Prescott New Instructions (PNI):
  ---------------------------

ADDSUBPD	- Double FP Add/Sub
ADDSUBPS	- Single FP Add/Sub
HADDPD		- Packed double-FP  Horizontal Add
HADDPS		- Packed single-FP  Horizontal Add
HSUBPD		- Packed double-FP  Horizontal Sub
HSUBPS		- Packed single-FP  Horizontal Sub
LDDQU		- Load Unaligned 128-bit integer
MOVDDUP		- Move one double-FP and duplicate
MOVSHDUP	- Move high (odd-indexed) single-FP elements and duplicate
MOVSLDUP	- Move low (even-indexed) single-FP elements and duplicate
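
The horizontal and add/sub forms differ from the ordinary packed
operations in which elements they combine. The scalar C model below
sketches HADDPS and ADDSUBPS; the function names are hypothetical and
element 0 is the lowest 32 bits of the register.

```c
/* Model of "haddps dst, src": adjacent pairs are summed, first the
   two pairs of dst, then the two pairs of src. */
void haddps_model(float dst[4], const float src[4])
{
    float r[4];
    r[0] = dst[0] + dst[1];
    r[1] = dst[2] + dst[3];
    r[2] = src[0] + src[1];
    r[3] = src[2] + src[3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}

/* Model of "addsubps dst, src": even elements are subtracted,
   odd elements are added. */
void addsubps_model(float dst[4], const float src[4])
{
    dst[0] = dst[0] - src[0];
    dst[1] = dst[1] + src[1];
    dst[2] = dst[2] - src[2];
    dst[3] = dst[3] + src[3];
}
```

With dst = {1,2,3,4} and src = {10,20,30,40}, the HADDPS model gives
{3,7,30,70} and the ADDSUBPS model gives {-9,22,-27,44}.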



+-----------+
| Power PC  |
+-----------+

PowerPC G4 AltiVec ISA extension:

Vector Data Format:

	V-registers (128 bit)

	[ 4  x SINGLE FP	 ] = 128bit



LVEWX	- Load Vector Element Word Indexed
LVX	- Load Vector Indexed
LVXL	- Load Vector Indexed LRU
STVEWX	 - Store Vector Element Word Indexed
STVX	 - Store Vector Indexed
STVXL	 - Store Vector Indexed LRU
VADDFP	 - Vector Add FP
VCFSX	- Vector Convert from Signed Fixed-point word
VCFUX
VCMPBFP/VCMPBFP.   - Vector Compare Bounds FP
VCMPEQFP/VCMPEQFP. - Vector Compare Equal-to-FP
VCMPGEFP/VCMPGEFP. - Vector Compare Greater Than or Equal to FP
VCMPGTFP/VCMPGTFP. - Vector Compare Greater Than FP
VCTSXS		Vector Convert to Signed Fixed-Point Word Saturate
VCTUXS		Vector Convert to Unsigned Fixed-Point Word Saturate
VEXPTEFP	Vector 2 Raised to the Exponent Estimate Floating Point
VLOGEFP		Vector Log2 Estimate Floating Point
VMADDFP		Vector Multiply Add Floating Point
VMAXFP		Vector Maximum Floating Point
VMINFP		Vector Minimum FP
VNMSUBFP	Vector Negative Multiply-Subtract FP
VREFP		Vector Reciprocal Estimate FP
VRFIM		Vector Round to FP Integer toward Minus Infinity
VRFIN		Vector Round to FP Integer Nearest
VRFIP		Vector Round to FP Integer toward Plus Infinity
VRFIZ		Vector Round to FP Integer toward Zero
VRSQRTEFP	Vector Reciprocal Square Root Estimate Floating Point
VSUBFP		Vector Subtract FP
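
The two multiply forms in this list can be sketched lane by lane in C.
This is a scalar model only: the function names are hypothetical, the
operand order (a times c, then b) follows the vA, vC, vB convention as an
assumption, and the model rounds after the multiply and again after the
add, which may differ from the hardware in the last bit.

```c
/* Model of VMADDFP: d = a * c + b in each of the four lanes. */
void vmaddfp_model(float d[4], const float a[4],
                   const float c[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        d[i] = a[i] * c[i] + b[i];
}

/* Model of VNMSUBFP: d = -(a * c - b) in each of the four lanes. */
void vnmsubfp_model(float d[4], const float a[4],
                    const float c[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        d[i] = -(a[i] * c[i] - b[i]);
}
```

With a = {1,2,3,4}, c = {2,2,2,2} and b = {1,1,1,1}, the VMADDFP model
gives {3,5,7,9} and the VNMSUBFP model gives {-1,-3,-5,-7}.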