AMD-K6™ Processor Multimedia Technology

Introduction

Next-generation PC performance requirements are being driven by emerging multimedia and communications software. 3D graphics, video, audio, and telephony capabilities are evolving across education, entertainment, and internet applications. As multimedia applications continue to proliferate in the marketplace, PC systems suppliers are being challenged to deliver multimedia-enabled PC solutions covering all mainstream price/performance points.
In response to the growing need for improved PC multimedia capabilities, the AMD-K6™ MMX™ enhanced processor is the first member of the AMD family of processors to incorporate a robust multimedia technology that is fully software compatible with the MMX™ technology as defined by Intel. This multimedia technology enables scalable multimedia capabilities across a broad range of PC system price/performance points.

The AMD-K6 processor features a decode-decoupled superscalar microarchitecture and state-of-the-art design techniques to deliver true sixth-generation performance while maintaining full x86 binary software compatibility. An x86 binary-compatible processor implements the industry-standard x86 instruction set by decoding and executing the x86 instruction set as its native mode of operation. Only this native mode enables delivery of maximum performance when running PC software.

AMD-K6™ MMX™ Enhanced Processor Multimedia Technology, 20726C/0, June 1997, Preliminary Information

The AMD-K6 processor delivers leading-edge performance to mainstream PC systems running industry-standard x86 software. The AMD-K6 processor implements advanced design techniques such as instruction predecoding, dual x86 opcode decoding, single-cycle internal RISC operations, parallel execution units, out-of-order execution, data forwarding, register renaming, and dynamic branch prediction. In other words, the AMD-K6 processor is capable of issuing, executing, and retiring multiple x86 instructions per cycle, resulting in superior scalable performance.

Multimedia Technology Architecture

The multimedia technology in the AMD-K6 MMX enhanced processor is designed to accelerate media and communication applications. Specialized applications that use music synthesis, speech synthesis, speech recognition, audio and video compression and decompression, full motion video, 2D and 3D graphics, and video conferencing can take advantage of the AMD-K6 processor multimedia technology. The multimedia technology implements new instructions, new data types, and powerful parallel processing (Single Instruction Multiple Data, or SIMD) techniques that can significantly increase the performance of these applications.

Key Functionality

At the lowest levels, multimedia applications (audio, video, 3D graphics, telephony, and so on) contain many similar functions. When these functions are performed on a processor that does not have MMX capability, the processor is heavily burdened by their computational requirements. Processors that execute MMX instructions increase the performance of multimedia applications. This performance increase is a direct result of the increased multimedia bandwidth of the processor.
Multimedia applications must process large amounts of data. Parallel data computing is exemplified by applications that manipulate screen pixel information. Instead of acting on one pixel at a time, multimedia technology enables the system to act on multiple pixels simultaneously. This Single Instruction Multiple Data (SIMD) model is a key feature of MMX technology.
The AMD-K6 processor multimedia technology architecture includes four new MMX data types, 57 new MMX instructions, eight new 64-bit MMX registers, and a SIMD processing pipeline. The multimedia technology is compatible with existing x86 applications. The 57 new MMX instructions include arithmetic functions, packing and unpacking functions, logical operations, and moves. These are the basic functions that are most commonly used in repetitive computational multimedia programs.
Multimedia applications often use smaller operands—8-bit data is commonly used for pixel information and 16-bit data is used for audio samples. The new MMX registers allow data to be packed into 64-bit operands. For example, 8-bit data (1 byte) can be packed in sets of eight in a single 64-bit register, and all eight bytes can be operated on simultaneously by a single MMX instruction.
For 256-color video modes, this translates to computing eight pixels per instruction. When an entire screen is being redrawn, these pixel manipulation routines often use highly repetitive loops. Parallel processing of eight pieces of data can reduce the processing time of a code loop by up to a factor of eight.
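The idea of operating on eight packed bytes at once can be illustrated with a short C sketch. This is only a portable model of a wraparound packed-byte addition, written with ordinary 64-bit integer arithmetic; it is not MMX code, and the function names packed_add_bytes and scalar_add_bytes are hypothetical names chosen for this illustration.

```c
#include <stdint.h>

/* Portable model of a wraparound packed-byte add: eight independent
   8-bit additions carried out on one 64-bit operand. This emulates the
   semantics in plain C; it is not the MMX instruction itself. */
uint64_t packed_add_bytes(uint64_t a, uint64_t b)
{
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL; /* low 7 bits of each byte */
    const uint64_t high = 0x8080808080808080ULL; /* top bit of each byte */

    /* Add the low 7 bits of every byte; no carry can cross a byte boundary. */
    uint64_t sum = (a & low7) + (b & low7);

    /* Fold the top bits in with XOR, which adds them without a carry-out. */
    return sum ^ ((a ^ b) & high);
}

/* Scalar reference: the same work done one pixel (byte) at a time. */
uint64_t scalar_add_bytes(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint8_t s = (uint8_t)(x + y);             /* wraps modulo 256 */
        result |= (uint64_t)s << (8 * i);
    }
    return result;
}
```

For example, packed_add_bytes(0x0102030405060708, 0x1010101010101010) returns 0x1112131415161718 in a single pass, while the scalar reference needs eight loop iterations to produce the same value.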
Multimedia applications frequently multiply and accumulate data. The multimedia technology provides instructions that add, multiply, and even combine these operations. For example, the PMADDWD instruction can multiply and then add words of data in a single instruction that uses far fewer processor cycles than the equivalent x86 operations.
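The multiply-and-add behavior of PMADDWD can be modeled in portable C as a rough sketch. The model assumes the usual definition of the instruction: four signed 16-bit words in each operand are multiplied pairwise, and adjacent 32-bit products are summed into two 32-bit results. The names word_at and multiply_add_words are hypothetical, and the code emulates the arithmetic only; it does not execute an MMX instruction.

```c
#include <stdint.h>

/* Read word number i (0 = least significant) of a 64-bit operand
   and interpret it as a signed 16-bit value. */
static int32_t word_at(uint64_t v, int i)
{
    uint32_t u = (uint32_t)((v >> (16 * i)) & 0xFFFFu);
    return (u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
}

/* Model of a packed multiply-add on words: four signed 16x16
   multiplies, then adjacent products are added to give two
   32-bit results packed into one 64-bit value. */
uint64_t multiply_add_words(uint64_t a, uint64_t b)
{
    /* Use 64-bit intermediates so the sums cannot overflow in C. */
    int64_t lo = (int64_t)word_at(a, 0) * word_at(b, 0)
               + (int64_t)word_at(a, 1) * word_at(b, 1);
    int64_t hi = (int64_t)word_at(a, 2) * word_at(b, 2)
               + (int64_t)word_at(a, 3) * word_at(b, 3);

    /* Keep the low 32 bits of each sum and pack them together. */
    return ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
}
```

With the words (1, 2, 3, 4) in one operand and (5, 6, 7, 8) in the other, listed from least to most significant, the low result is 1*5 + 2*6 = 17 and the high result is 3*7 + 4*8 = 53. A scalar implementation would need four multiplies and two adds to do the same work.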

Executing MMX™ Instructions

A programmer must approach the use of MMX instructions differently, based on whether the code being developed is at the system level or at the application level. Before using the MMX instructions, the programmer must use the CPUID instruction to determine if the processor supports multimedia technology. See the AMD Processor Recognition Application Note, order# 20734, for more information.
Function 1 (EAX=1) of the AMD-K6 processor CPUID instruction returns the processor feature bits in the EDX register. Software can then test bit 23 of the feature bits to determine if the processor supports the multimedia technology. If bit 23 is set to 1, MMX instructions are supported. All AMD-K6 processors have bit 23 set. Once it is determined that multimedia technology is supported, subsequent code can use the MMX instructions. Alternatively, the AMD 8000_0001h extended CPUID function can be used to test whether the processor supports multimedia technology.
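The bit 23 test described above can be sketched in C as follows. The helper edx_reports_mmx only examines an EDX value that has already been obtained. The wrapper processor_supports_mmx assumes a GCC-compatible compiler on an x86 target, where the cpuid.h header provides __get_cpuid; on any other target it simply reports no support. Both function names and the macro name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_FEATURE_MMX_BIT 23u   /* bit position given in the text */

/* Return true when the EDX feature flags returned by CPUID
   function 1 have bit 23 set, meaning MMX is supported. */
bool edx_reports_mmx(uint32_t edx_features)
{
    return ((edx_features >> CPUID_FEATURE_MMX_BIT) & 1u) != 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* Assumption: GCC or Clang on x86, where <cpuid.h> is available. */
bool processor_supports_mmx(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested function is unsupported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_reports_mmx(edx);
}
#else
/* Not an x86 build: CPUID cannot be executed, so report no MMX. */
bool processor_supports_mmx(void)
{
    return false;
}
#endif
```

An application would typically call processor_supports_mmx once at startup and select an MMX or non-MMX code path based on the result.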
After a module of MMX code has executed, the programmer must empty the MMX state by executing the EMMS instruction. Because the MMX registers share the floating-point registers, an instruction is needed to prevent MMX code from interfering with floating-point code. The EMMS instruction clears the multimedia state and resets all the floating-point tag bits. Emptying the MMX state sets the floating-point tag bits to empty (all ones), which marks the MMX/FP registers as invalid and available.
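The effect on the tag bits can be pictured with a toy C model. It assumes the conventional floating-point tag word layout of two tag bits per register in a 16-bit word, with 00b meaning valid and 11b meaning empty. The structure and function names are hypothetical, and the model tracks the tag bits only; it does not touch real processor state.

```c
#include <stdint.h>

#define TAG_VALID 0x0u   /* 00b: register holds valid data */
#define TAG_EMPTY 0x3u   /* 11b: register is empty and available */

/* Toy model of the tag word: eight registers, two tag bits each. */
typedef struct {
    uint16_t tag_word;
} fp_tag_model;

/* Executing any MMX instruction marks all eight registers valid. */
void model_mmx_instruction(fp_tag_model *m)
{
    m->tag_word = 0x0000u;
}

/* Emptying the MMX state sets every tag to empty (all ones). */
void model_emms(fp_tag_model *m)
{
    m->tag_word = 0xFFFFu;
}

/* Return the 2-bit tag of register reg (0 to 7). */
unsigned model_tag(const fp_tag_model *m, int reg)
{
    return (unsigned)(m->tag_word >> (2 * reg)) & 0x3u;
}
```

In this model, a floating-point routine that runs after model_emms sees every register tagged as empty, which is the state it expects on entry.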

Register Set

The AMD-K6 processor implements eight new 64-bit MMX registers. These registers are mapped on the floating-point registers. As shown in Figure 1 on page 5, the new MMX instructions refer to these registers as mmreg0 to mmreg7. Mapping the new MMX registers on the floating-point stack enables backwards compatibility for the register saving that must occur as a result of task switching.
Aliasing the MMX registers onto the floating-point stack registers provides a safe way to introduce this new technology. Instead of needing to modify operating systems, new MMX applications can be supported through device drivers, MMX libraries, or DLL files. See the Programming Considerations section of this document for more information. Current operating systems have support for floating-point operations. Using the floating-point registers for MMX code is an ingenious way of implementing automatic support for MMX instructions.
Every time the processor executes an MMX instruction, all the floating-point register tag bits are set to zero (00b=valid). Setting the tag bits after every MMX instruction prevents the processor from having to perform extra tasks. These extra tasks are normally executed on floating-point registers when the Tag field is something other than 00b.
If a task switch occurs during an MMX or floating-point instruction, the Control Register (CR0) Task Switch (TS) bit is set to 1. The processor then generates an interrupt 7 (int 7, Device Not Available) when it encounters the next floating-point or MMX instruction, allowing the operating system to save the state of the MMX/FP registers.
If there is a task switch when MMX applications are running with older applications that do not include MMX instructions, the MMX/FP register state is still saved automatically through the int 7 handler.

Data Types

The AMD-K6 processor multimedia technology uses a packed data format. The data is packed in a single 64-bit MMX register or memory operand as eight bytes, four words, two doublewords, or one quadword. Each byte, word, doubleword, or quadword is an integer data type.
The form of an instruction determines the data type. For example, the MOV instruction comes in two different forms: MOVD moves 32 bits of data and MOVQ moves 64 bits of data. The four new data types are defined as follows:
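Packed byte          Eight 8-bit bytes packed into 64 bits
                     Signed integer range (–2^7 to 2^7–1)
                     Unsigned integer range (0 to 2^8–1)
Packed word          Four 16-bit words packed into 64 bits
                     Signed integer range (–2^15 to 2^15–1)
                     Unsigned integer range (0 to 2^16–1)
Packed doubleword    Two 32-bit doublewords packed into 64 bits
                     Signed integer range (–2^31 to 2^31–1)
                     Unsigned integer range (0 to 2^32–1)
Quadword             One 64-bit quadword
                     Signed integer range (–2^63 to 2^63–1)
                     Unsigned integer range (0 to 2^64–1)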
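One way to picture the packed formats is to view the same 64-bit value as bytes, words, or doublewords. The C sketch below does this with shifts, numbering elements from the least significant end, which matches x86 little-endian ordering. The function names are hypothetical and the code is an illustration of the layout only.

```c
#include <stdint.h>

/* Element i is counted from the least significant end of the
   64-bit operand, matching x86 little-endian ordering. */

/* One of the eight 8-bit elements (i = 0 to 7). */
uint8_t packed_byte(uint64_t q, int i)
{
    return (uint8_t)(q >> (8 * i));
}

/* One of the four 16-bit elements (i = 0 to 3). */
uint16_t packed_word(uint64_t q, int i)
{
    return (uint16_t)(q >> (16 * i));
}

/* One of the two 32-bit elements (i = 0 or 1). */
uint32_t packed_doubleword(uint64_t q, int i)
{
    return (uint32_t)(q >> (32 * i));
}
```

For the quadword 0x1122334455667788, byte 0 is 0x88, word 0 is 0x7788, and doubleword 1 is 0x11223344. The bits never change; only the instruction form decides how they are grouped.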