General-Purpose Register

Cortex-M3 Basics

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (2nd Edition), 2010

3.1 Registers

As we've seen, the Cortex™-M3 processor has registers R0 through R15 and a number of special registers. R0 through R12 are general purpose, but some of the 16-bit Thumb® instructions can only access R0 through R7 (low registers), whereas 32-bit Thumb-2 instructions can access all these registers. Special registers have predefined functions and can only be accessed by special register access instructions.

3.1.1 General Purpose Registers R0 through R7

The R0 through R7 general purpose registers are also called low registers. They can be accessed by all 16-bit Thumb instructions and all 32-bit Thumb-2 instructions. They are all 32 bits; the reset value is unpredictable.

3.1.2 General Purpose Registers R8 through R12

The R8 through R12 registers are also called high registers. They are accessible by all Thumb-2 instructions but not by all 16-bit Thumb instructions. These registers are all 32 bits; the reset value is unpredictable (see Figure 3.1).

Figure 3.1. Registers in the Cortex-M3.

3.1.3 Stack Pointer R13

R13 is the stack pointer (SP). In the Cortex-M3 processor, there are two SPs. This duality allows two separate stack memories to be set up. When using the register name R13, you can only access the current SP; the other one is inaccessible unless you use the special instructions move to special register from general-purpose register (MSR) and move special register to general-purpose register (MRS). The two SPs are as follows:

Main Stack Pointer (MSP) or SP_main in ARM documentation: This is the default SP; it is used by the operating system (OS) kernel, exception handlers, and all application codes that require privileged access.

Process Stack Pointer (PSP) or SP_process in ARM documentation: This is used by the base-level application code (when not running an exception handler).

Stack PUSH and POP

A stack is a memory usage model. It is simply part of the system memory, and a pointer register (inside the processor) is used to make it work as a first-in/last-out buffer. The common use of a stack is to save register contents before some data processing and then restore those contents from the stack after the processing task is done.

FIGURE 3.2. Basic Concept of Stack Memory.

When doing PUSH and POP operations, the pointer register, commonly called the stack pointer, is adjusted automatically to prevent subsequent stack operations from corrupting previously stacked data. More details on stack operations are provided in a later part of this chapter.

It is not necessary to use both SPs. Simple applications can rely purely on the MSP. The SPs are used for accessing stack memory processes such as PUSH and POP.

In the Cortex-M3, the instructions for accessing stack memory are PUSH and POP. The assembly language syntax is as follows (text after each semicolon [;] is a comment):

PUSH   {R0}   ; R13 = R13 - 4, then Memory[R13] = R0
POP    {R0}   ; R0 = Memory[R13], then R13 = R13 + 4

The Cortex-M3 uses a full-descending stack arrangement. (More detail on this subject can be found in the "Stack Memory Operations" section of this chapter.) Therefore, the SP decrements when new data is stored in the stack. PUSH and POP are usually used to save register contents to stack memory at the start of a subroutine and then restore the registers from the stack at the end of the subroutine. You can PUSH or POP multiple registers in one instruction:

subroutine_1
  PUSH   {R0-R7, R12, R14} ; Save registers
  ...                      ; Do your processing
  POP    {R0-R7, R12, R14} ; Restore registers
  BX     R14               ; Return to calling function
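The PUSH and POP behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Cortex-M3 code: the class name FullDescendingStack, the dictionary standing in for memory, and the starting address 0x20001000 are all assumptions made for the example.

```python
class FullDescendingStack:
    """Toy model of a full-descending stack (hypothetical helper, not an ARM API)."""

    def __init__(self, initial_sp=0x20001000):
        self.sp = initial_sp      # models R13
        self.memory = {}          # address -> 32-bit word

    def push(self, *values):
        # PUSH {regs}: the first (lowest-numbered) register ends up at the
        # lowest address, so store in reverse order while SP walks downward.
        for value in reversed(values):
            self.sp -= 4                                # R13 = R13 - 4
            self.memory[self.sp] = value & 0xFFFFFFFF   # Memory[R13] = value

    def pop(self, count):
        # POP {regs}: read from the lowest address upward.
        values = []
        for _ in range(count):
            values.append(self.memory[self.sp])         # reg = Memory[R13]
            self.sp += 4                                # R13 = R13 + 4
        return values


stack = FullDescendingStack()
stack.push(0x11, 0x22, 0x33)      # like PUSH {R0-R2}
restored = stack.pop(3)           # like POP {R0-R2}
```

After the matching POP, the stack pointer is back at its starting value and the values come back in register order, which is why a subroutine can bracket its body with one PUSH and one POP of the same register list.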

Instead of using R13, you can use SP (for stack pointer) in your program codes. It means the same thing. Inside program code, both the MSP and the PSP can be called R13/SP. However, you can access a particular one using special register access instructions (MRS/MSR).

The MSP, also called SP_main in ARM documentation, is the default SP after power-up; it is used by kernel code and exception handlers. The PSP, or SP_process in ARM documentation, is typically used by thread processes in a system with an embedded OS running.

Because register PUSH and POP operations are always word aligned (their addresses must be 0x0, 0x4, 0x8, ...), the SP/R13 bit 0 and bit 1 are hardwired to 0 and always read as zero (RAZ).
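The effect of the two hardwired bits can be expressed as a mask. The sketch below is an assumption-laden illustration (the function name write_sp is made up for this example); it only shows that whatever value is written, the two lowest bits read back as zero.

```python
def write_sp(value):
    """Model a write to R13: bits [1:0] are forced to zero (word aligned)."""
    return value & 0xFFFFFFFC


aligned = write_sp(0x20000FFF)    # low two bits are dropped
```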

3.1.4 Link Register R14

R14 is the link register (LR). Inside an assembly program, you can write it as either R14 or LR. LR is used to store the return program counter (PC) when a subroutine or function is called, for example, when you're using the branch and link (BL) instruction:

main                ; Main program
  ...
  BL function1      ; Call function1 using Branch with Link instruction.
                    ; PC = function1 and
                    ; LR = the next instruction in main
  ...
function1
  ...               ; Program code for function 1
  BX LR             ; Return

Despite the fact that bit 0 of the PC is always 0 (because instructions are word aligned or half word aligned), the LR bit 0 is readable and writable. This is because in the Thumb instruction set, bit 0 is often used to indicate ARM/Thumb states. To allow the Thumb-2 program for the Cortex-M3 to work with other ARM processors that support the Thumb-2 technology, this least significant bit (LSB) is writable and readable.

3.1.5 Program Counter R15

R15 is the PC. You can access it in assembler code by either R15 or PC. Because of the pipelined nature of the Cortex-M3 processor, when you read this register, you will find that the value is different than the location of the executing instruction, normally by 4. For example:

0x1000 :   MOV   R0, PC   ; R0 = 0x1004

In other instructions like literal load (reading of a memory location related to the current PC value), the effective value of PC might not be the instruction address plus 4 due to alignment in address calculation. But the PC value is still at least 2 bytes ahead of the instruction address during execution.

Writing to the PC will cause a branch (but LRs do not get updated). Because an instruction address must be half word aligned, the LSB (bit 0) of the PC read value is always 0. However, in branching, either by writing to PC or using branch instructions, the LSB of the target address should be set to 1 because it is used to indicate the Thumb state operations. If it is 0, it can imply trying to switch to the ARM state and will result in a fault exception in the Cortex-M3.
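The two rules in this section (a PC read returns the instruction address plus 4, and a branch target needs its LSB set) can be summarized in a small Python model. This is a simplified sketch: the function names are invented for the example, literal-load alignment is not modeled, and a plain RuntimeError stands in for the fault exception.

```python
def read_pc(instruction_address):
    """Value seen when an instruction such as MOV R0, PC reads the PC."""
    return (instruction_address + 4) & 0xFFFFFFFF


def branch_via_register(target):
    """Model a branch through a register value (simplified)."""
    if target & 1 == 0:
        # LSB = 0 would request ARM state, which the Cortex-M3 does not support.
        raise RuntimeError("fault: attempt to switch to ARM state")
    return target & ~1            # the PC itself reads back with bit 0 = 0


pc_value = read_pc(0x1000)        # matches the MOV R0, PC example
new_pc = branch_via_register(0x2001)
```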

URL:

https://www.sciencedirect.com/science/article/pii/B9781856179638000065

INTRODUCTION TO THE ARM INSTRUCTION SET

ANDREW N. SLOSS, ... CHRIS WRIGHT, in ARM System Developer's Guide, 2004

3.5 PROGRAM STATUS REGISTER INSTRUCTIONS

The ARM instruction set provides two instructions to directly control a program status register (psr). The MRS instruction transfers the contents of either the cpsr or spsr into a register; in the reverse direction, the MSR instruction transfers the contents of a register into the cpsr or spsr. Together these instructions are used to read and write the cpsr and spsr.

In the syntax you can see a label called fields. This can be any combination of control (c), extension (x), status (s), and flags (f). These fields relate to particular byte regions in a psr, as shown in Figure 3.9.

Figure 3.9. psr byte fields.

MRS   copy program status register to a general-purpose register     Rd = psr
MSR   move a general-purpose register to a program status register   psr[field] = Rm
MSR   move an immediate value to a program status register           psr[field] = immediate

The c field controls the interrupt masks, Thumb state, and processor mode. Example 3.26 shows how to enable IRQ interrupts by clearing the I mask. This operation involves using both the MRS and MSR instructions to read from and then write to the cpsr.

EXAMPLE 3.26

The MRS first copies the cpsr into register r1. The BIC instruction clears bit 7 of r1. Register r1 is then copied back into the cpsr, which enables IRQ interrupts. You can see from this example that this code preserves all the other settings in the cpsr and only modifies the I bit in the control field.

This example is in SVC mode. In user mode you can read all cpsr bits, but you can only update the condition flag field f.
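The read-modify-write sequence described for Example 3.26 can be traced with a small Python model. This is a sketch under stated assumptions: the helper names msr and enable_irq, the byte masks for the c, x, s, and f fields (one byte each, from least to most significant), and the starting cpsr value are chosen for illustration and are not taken from the book's listing.

```python
I_BIT = 1 << 7                    # IRQ mask bit in the cpsr control field

# Byte regions selected by each field letter (c, x, s, f).
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00, "s": 0x00FF0000, "f": 0xFF000000}


def msr(psr, fields, value):
    """Write only the selected byte fields of a psr, leaving the rest unchanged."""
    mask = 0
    for letter in fields:
        mask |= FIELD_MASKS[letter]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)


def enable_irq(cpsr):
    r1 = cpsr                     # MRS: copy cpsr into r1
    r1 &= ~I_BIT                  # BIC: clear bit 7 of r1
    return msr(cpsr, "c", r1)     # MSR: write r1 back to the control field


before = 0x600000D3               # assumed value: Z and C set, I and F set, SVC mode
after = enable_irq(before)
```

Only bit 7 differs between before and after, which mirrors the point made in the text: the flags and the other control bits are preserved.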

3.5.1 COPROCESSOR INSTRUCTIONS

Coprocessor instructions are used to extend the instruction set. A coprocessor can either provide additional computation capability or be used to control the memory subsystem including caches and memory management. The coprocessor instructions include data processing, register transfer, and memory transfer instructions. We will provide only a short overview since these instructions are coprocessor specific. Note that these instructions are only used by cores with a coprocessor.

CDP       coprocessor data processing: perform an operation in a coprocessor
MRC MCR   coprocessor register transfer: move data to/from coprocessor registers
LDC STC   coprocessor memory transfer: load and store blocks of memory to/from a coprocessor

In the syntax of the coprocessor instructions, the cp field represents the coprocessor number between p0 and p15. The opcode fields describe the operation to take place on the coprocessor. The Cn, Cm, and Cd fields describe registers within the coprocessor. The coprocessor operations and registers depend on the specific coprocessor you are using. Coprocessor 15 (CP15) is reserved for system control purposes, such as memory management, write buffer control, cache control, and identification registers.

EXAMPLE 3.27

This example shows a CP15 register being copied into a general-purpose register.

Here CP15 register-0 contains the processor identification number. This register is copied into the general-purpose register r10.

3.5.2 COPROCESSOR 15 INSTRUCTION SYNTAX

CP15 configures the processor core and has a set of dedicated registers to store configuration information, as shown in Example 3.27. A value written into a register sets a configuration attribute, for example, switching on the cache.

CP15 is called the system control coprocessor. Both MRC and MCR instructions are used to read and write to CP15, where register Rd is the core destination register, Cn is the primary register, Cm is the secondary register, and opcode2 is a secondary register modifier. You may occasionally hear secondary registers called "extended registers."

As an example, here is the instruction to move the contents of CP15 control register c1 into register r1 of the processor core:

We use a shorthand notation for CP15 reference that makes referring to configuration registers easier to follow. The reference notation uses the following format:

The first term, CP15, defines it as coprocessor 15. The second term, after the separating colon, is the primary register. The primary register X can have a value between 0 and 15. The third term is the secondary or extended register. The secondary register Y can have a value between 0 and 15. The last term, opcode2, is an instruction modifier and can have a value between 0 and 7. Some operations may also use a nonzero value w of opcode1. We write these as CP15:w:cX:cY:Z.
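To make the shorthand concrete, the following Python sketch turns a CP15 reference into the text of an MRC read. It is an illustration only: the function name cp15_read is invented, the operand order follows the standard MRC assembler form (coprocessor, opcode1, Rd, Cn, Cm, opcode2), and treating a four-term reference as opcode1 = 0 is an assumption based on the description above.

```python
def cp15_read(notation, rd):
    """Turn 'CP15:cX:cY:Z' or 'CP15:w:cX:cY:Z' into an MRC instruction string."""
    parts = notation.split(":")
    if parts[0] != "CP15":
        raise ValueError("not a CP15 reference")
    if len(parts) == 4:           # opcode1 omitted, assumed to be 0
        opcode1 = 0
        cn, cm, opcode2 = parts[1:]
    elif len(parts) == 5:
        opcode1 = int(parts[1])
        cn, cm, opcode2 = parts[2:]
    else:
        raise ValueError("unexpected number of terms")
    # Primary and secondary registers range over c0..c15, opcode2 over 0..7.
    if not (0 <= int(cn[1:]) <= 15 and 0 <= int(cm[1:]) <= 15):
        raise ValueError("register number out of range")
    if not 0 <= int(opcode2) <= 7:
        raise ValueError("opcode2 out of range")
    return f"MRC p15, {opcode1}, r{rd}, {cn}, {cm}, {opcode2}"


instruction = cp15_read("CP15:c1:c0:0", rd=1)   # control register c1 into r1
```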

URL:

https://www.sciencedirect.com/science/article/pii/B9781558608740500046

Overview of the Cortex-M3

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (2nd Edition), 2010

2.2 Registers

The Cortex-M3 processor has registers R0 through R15 (see Figure 2.2). R13 (the stack pointer) is banked, with only one copy of the R13 visible at a time.

Figure 2.2. Registers in the Cortex-M3.

2.2.1 R0–R12: General-Purpose Registers

R0–R12 are 32-bit general-purpose registers for data operations. Some 16-bit Thumb® instructions can only access a subset of these registers (low registers, R0–R7).

2.2.2 R13: Stack Pointers

The Cortex-M3 contains two stack pointers (R13). They are banked so that only one is visible at a time. The two stack pointers are as follows:

Main Stack Pointer (MSP): The default stack pointer, used by the operating system (OS) kernel and exception handlers

Process Stack Pointer (PSP): Used by user application code

The lowest 2 bits of the stack pointers are always 0, which means they are always word aligned.

2.2.3 R14: The Link Register

When a subroutine is called, the return address is stored in the link register.

2.2.4 R15: The Program Counter

The program counter is the current program address. This register can be written to control the program flow.

2.2.5 Special Registers

The Cortex-M3 processor also has a number of special registers (see Figure 2.3). They are as follows:

Program Status registers (PSRs)

Interrupt Mask registers (PRIMASK, FAULTMASK, and BASEPRI)

Control register (CONTROL)

FIGURE 2.3. Special Registers in the Cortex-M3.

These registers have special functions and can be accessed only by special instructions. They cannot be used for normal data processing (see Table 2.1).

Table 2.1. Special Registers and Their Functions

Register    Function
xPSR        Provide arithmetic and logic processing flags (zero flag and carry flag), execution status, and current executing interrupt number
PRIMASK     Disable all interrupts except the nonmaskable interrupt (NMI) and hard fault
FAULTMASK   Disable all interrupts except the NMI
BASEPRI     Disable all interrupts of specific priority level or lower priority level
CONTROL     Define privileged status and stack pointer selection

For more information on these registers, see Chapter 3.
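The three interrupt mask registers in Table 2.1 can be read as a simple decision procedure. The Python sketch below is a deliberately reduced model, not the processor's actual arbitration logic: it assumes the usual Cortex-M3 convention that lower numbers mean higher priority, with the NMI at -2 and hard fault at -1, and it ignores the priority of whatever code is currently running. The function name is_masked is made up for this example.

```python
NMI_PRIORITY = -2
HARD_FAULT_PRIORITY = -1


def is_masked(priority, primask=0, faultmask=0, basepri=0):
    """Return True if an exception of the given priority is blocked.

    Lower numbers mean higher priority; configurable interrupts use 0 and above.
    """
    if faultmask:                       # only the NMI gets through
        return priority > NMI_PRIORITY
    if primask:                         # NMI and hard fault get through
        return priority > HARD_FAULT_PRIORITY
    if basepri:                         # a value of 0 disables BASEPRI masking
        return priority >= basepri      # same or lower priority is blocked
    return False
```

For example, with basepri set to 0x60 an interrupt at priority 0x60 is held off while one at 0x40 is still accepted.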

URL:

https://www.sciencedirect.com/science/article/pii/B9781856179638000053

Early Intel® Architecture

In Power and Performance, 2015

1.1.2 Registers

Aside from the four segment registers introduced in the previous section, the 8086 has eight general purpose registers, and two status registers.

The general purpose registers are divided into two categories. Four registers, AX, BX, CX, and DX, are classified as data registers. These data registers are accessible as either the full 16-bit register, represented with the X suffix, the low byte of the full 16-bit register, designated with an L suffix, or the high byte of the 16-bit register, delineated with an H suffix. For instance, AX would access the full 16-bit register, whereas AL and AH would access the register's low and high bytes, respectively.
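The relationship between AX, AH, and AL is plain bit slicing, as the following Python sketch shows. The helper names split_ax and write_al are invented for this illustration.

```python
def split_ax(ax):
    """Return (AH, AL) for a 16-bit AX value."""
    ax &= 0xFFFF
    return (ax >> 8) & 0xFF, ax & 0xFF


def write_al(ax, value):
    """Writing AL replaces the low byte and leaves AH untouched."""
    return (ax & 0xFF00) | (value & 0xFF)


ah, al = split_ax(0x1234)
ax = write_al(0x1234, 0xFF)
```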

The second classification of registers is the pointer/index registers. This includes the following four registers: SP, BP, SI, and DI. The SP register, the stack pointer, is reserved for usage as a pointer to the top of the stack. The SI and DI registers are typically used implicitly as the source and destination pointers, respectively. Unlike the data registers, the pointer/index registers are only accessible as full 16-bit registers.

As this categorization may indicate, the general purpose registers come with some guidance for their intended usage. This guidance is reflected in the instruction forms with implicit operands. Instructions with implicit operands, that is, operands which are assumed to be a certain register and therefore don't require that operand to be encoded, allow for shorter encodings for common usages. For convenience, instructions with implicit forms typically also have explicit forms, which require more bytes to encode. The recommended uses for the registers are as follows:

AX Accumulator

BX Data (relative to DS)

CX Loop counter

DX Data

SI Source pointer (relative to DS)

DI Destination pointer (relative to ES)

SP Stack pointer (relative to SS)

BP Base pointer of stack frame (relative to SS)

Aside from allowing for shorter instruction encodings, this guidance is also an aid to the programmer who, once familiar with the various register meanings, will be able to deduce the meaning of assembly, assuming it conforms to the guidelines, much faster. This parallels, to some degree, how variable names help the programmer reason about their contents. It's important to note that these are just suggestions, not rules.

Additionally, there are two status registers, the instruction pointer and the flags register.

The instruction pointer, IP, is also often referred to as the program counter. This register contains the memory address of the next instruction to be executed. Until 64-bit mode was introduced, the instruction pointer was not directly accessible to the programmer, that is, it wasn't possible to access it like the other general purpose registers. Despite this, the instruction pointer was indirectly accessible. Whereas the instruction pointer couldn't be modified through a MOV instruction, it could be modified by any instruction that alters the program flow, such as the CALL or JMP instructions.

Reading the contents of the instruction pointer was also possible by taking advantage of how x86 handles function calls. Transfer from one function to another occurs through the CALL and RET instructions. The CALL instruction preserves the current value of the instruction pointer, pushing it onto the stack in order to support nested function calls, and then loads the instruction pointer with the new address, provided as an operand to the instruction. This value on the stack is referred to as the return address. Whenever the function has finished executing, the RET instruction pops the return address off of the stack and restores it into the instruction pointer, thus transferring control back to the function that initiated the function call. Leveraging this, the programmer can create a special thunk function that would simply copy the return address off of the stack, load it into one of the registers, and then return. For instance, when compiling Position-Independent-Code (PIC), which is discussed in Chapter 12, the compiler will automatically add functions that use this technique to obtain the instruction pointer. These functions are commonly called __x86.get_pc_thunk.bx(), __x86.get_pc_thunk.cx(), __x86.get_pc_thunk.dx(), and so on, depending on which register the instruction pointer is loaded into.

The second status register, the EFLAGS register, is comprised of 1-bit status and control flags. These bits are set by various instructions, typically arithmetic or logic instructions, to signal certain conditions. These status flags can then be checked in order to make decisions. For a list of the flags modified by each instruction, see the Intel SDM. The 8086 defined the following status and control bits in EFLAGS:

Zero Flag (ZF) Set if the result of the instruction is zero.

Sign Flag (SF) Set if the result of the instruction is negative.

Overflow Flag (OF) Set if the result of the instruction overflowed.

Parity Flag (PF) Set if the result has an even number of bits set.

Carry Flag (CF) Used for storing the carry bit in instructions that perform arithmetic with carry (for implementing extended precision).

Adjust Flag (AF) Similar to the Carry Flag. In the parlance of the 8086 documentation, this was referred to as the Auxiliary Carry Flag.

Direction Flag (DF) For instructions that either autoincrement or autodecrement a pointer, this flag chooses which to perform. If set, autodecrement, otherwise autoincrement.

Interrupt Enable Flag (IF) Determines whether maskable interrupts are enabled.

Trap Flag (TF) If set, the CPU operates in single-step debugging mode.
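As a worked illustration of the status flags listed above, the Python sketch below computes them for an 8-bit addition. It is a model written for this example (the function name add8_flags is invented), and it assumes the standard x86 definitions: parity is taken over the low byte of the result, and the adjust flag reflects a carry out of bit 3.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) in 8086 style."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "ZF": result == 0,
        "SF": bool(result & 0x80),
        "CF": total > 0xFF,                           # carry out of bit 7
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),   # signed overflow
        "PF": bin(result).count("1") % 2 == 0,        # even number of 1 bits
        "AF": ((a & 0x0F) + (b & 0x0F)) > 0x0F,       # carry out of bit 3
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
```

Adding 1 to 0x7F is a useful test case because the signed result overflows (OF) without producing an unsigned carry (CF), showing that the two flags answer different questions.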

URL:

https://www.sciencedirect.com/science/article/pii/B978012800726600001X

Intel® Pentium® Processors

In Power and Performance, 2015

2.2.3 Out-of-Order Execution

As discussed in Section 2.1.1, prior to the 80486, the processor handled one instruction at a time. As a result, the processor's resources remained idle while the currently executing instruction was not utilizing them. With the introduction of pipelining, the pipeline was partitioned to allow multiple instructions to coexist simultaneously. Therefore, when the currently executing instruction had finished with some of the processor's resources, the next instruction could begin utilizing them before the first instruction had completely finished executing. The introduction of μops expanded significantly on this concept, splitting instruction execution into smaller steps.

Each type of μop has a corresponding type of execution unit. The Pentium Pro has five execution units: two for handling integer μops, two for handling floating point μops, and one for handling memory μops. Therefore, up to five μops can execute in parallel. An instruction, divided into one or more μops, is not done executing until all of its corresponding μops have finished. Obviously, μops from the same instruction have dependencies upon one another so they can't all execute simultaneously. Therefore, μops from multiple instructions are dispatched to the execution units.

Taking advantage of the fine granularity of μops, out-of-order execution significantly improves utilization of the execution units. Up until the Pentium Pro, Intel processors executed in-order, meaning that instructions were executed in the same sequence as they were organized in memory. With out-of-order execution, μops are scheduled based on the available resources, as opposed to their ordering. As instructions are fetched and decoded, the resulting μops are stored in the Reorder Buffer. As execution units and other resources become available, the Reservation Station dispatches the corresponding μop to one of the execution units. Once the μop has finished executing, the result is stored back into the Reorder Buffer. Once all of the μops associated with an instruction have completed execution, the μops retire, that is, they are removed from the Reorder Buffer and any results or side-effects are made visible to the rest of the system. While instructions can execute in any order, instructions always retire in-order, ensuring that the programmer does not need to worry about handling out-of-order execution.

To illustrate the problem with in-order execution and the benefit of out-of-order execution, consider the following hypothetical situation. Assume that a processor has two execution units capable of handling integer μops and one capable of handling floating point μops. With in-order scheduling, the most efficient usage of this processor would be to intermix integer and floating point instructions following the two-to-one ratio. This would involve carefully scheduling instructions based on their instruction latencies, along with the latencies for fetching any memory resources, to ensure that when an execution unit becomes available, the next μop in the queue would be executable with that unit.

For example, consider four instructions scheduled on this example processor, three integer instructions followed by a floating point instruction. Assume that each instruction corresponds to one μop, that these instructions have no interdependencies, and that all three execution units are currently available. The first two integer instructions would be dispatched to the two available integer execution units, but the floating point instruction would not be dispatched, even though the floating point execution unit was available. This is because the third integer instruction, waiting for one of the two integer execution units to become available, must be issued first. This underutilizes the processor's resources. With out-of-order execution, the first two integer instructions and the floating point instruction would be dispatched together.

In other words, out-of-order execution improves the utilization of the processor's resources. Additionally, because μops are scheduled based on available resources, some instruction latencies, such as an expensive load from memory, may be partially or completely masked if other work can be scheduled instead.
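The hypothetical processor above (two integer units, one floating point unit) is small enough to simulate for a single dispatch cycle. The Python sketch below is a toy model written for this example, not a description of any real scheduler; the function names and the "int"/"fp" labels are assumptions, and dependencies and latencies are ignored, as in the text.

```python
def dispatch_in_order(queue, free_units):
    """Issue from the head of the queue; stop at the first micro-op with no free unit."""
    free = dict(free_units)
    issued = []
    for kind in queue:
        if free.get(kind, 0) == 0:
            break                      # later micro-ops must wait behind this one
        free[kind] -= 1
        issued.append(kind)
    return issued


def dispatch_out_of_order(queue, free_units):
    """Issue any micro-op whose execution unit is free, regardless of position."""
    free = dict(free_units)
    issued = []
    for kind in queue:
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            issued.append(kind)
    return issued


queue = ["int", "int", "int", "fp"]    # three integer micro-ops, then one floating point
units = {"int": 2, "fp": 1}
in_order = dispatch_in_order(queue, units)
out_of_order = dispatch_out_of_order(queue, units)
```

In this model the in-order scheduler issues two micro-ops in the cycle and leaves the floating point unit idle, while the out-of-order scheduler issues three.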

Register Renaming

From the instruction set perspective, Intel processors have eight general purpose registers in 32-bit mode, and sixteen general purpose registers in 64-bit mode; however, from the internal hardware perspective, Intel processors have many more registers. For example, the Pentium Pro has forty registers, organized in a structure referred to as a Physical Register File.

While this many extra registers might seem like a performance boon, especially if the reader is familiar with the performance gain received from the eight extra registers in 64-bit mode, these registers serve a different purpose. Rather than providing the process with more registers, these extra registers serve to handle data dependencies in the out-of-order execution engine.

When a value is stored into a register, a new register file entry is assigned to contain that value. Once another value is stored into that register, a different register file entry is assigned to contain this new value. Internal to the processor core, each data dependency on the first value will reference the first entry, and each data dependency on the second value will reference the second entry. Therefore, the out-of-order engine is able to execute instructions in an order that would otherwise be impossible due to false data dependencies.
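The renaming scheme just described can be sketched as a mapping from architectural register names to physical register file entries. The Python class below is a hypothetical structure for illustration, not Intel's implementation: it never frees entries, and the instruction sequence in the comments is made up.

```python
class RenameTable:
    """Toy register renamer (hypothetical, for illustration only)."""

    def __init__(self, physical_count=40):
        self.free = list(range(physical_count))   # unused physical entries
        self.mapping = {}                         # architectural name -> entry

    def write(self, reg):
        # Every write gets a fresh entry, so earlier readers keep their value.
        entry = self.free.pop(0)
        self.mapping[reg] = entry
        return entry

    def read(self, reg):
        # A read refers to whichever entry holds the latest value at this point.
        return self.mapping[reg]


table = RenameTable()
first = table.write("eax")     # mov eax, [a]
user1 = table.read("eax")      # add ebx, eax  (depends on the first value)
second = table.write("eax")    # mov eax, [b]  (reuses the name, not the entry)
user2 = table.read("eax")      # add ecx, eax  (depends on the second value)
```

Because the two writes to eax land in different entries, the second load no longer has to wait for the first add to read its operand, which is the false dependency the text refers to.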

URL:

https://www.sciencedirect.com/science/article/pii/B9780128007266000021

Load/store and branch instructions

Larry D. Pyeatt, William Ughetta, in ARM 64-Bit Assembly Language, 2020

3.2 AArch64 user registers

As shown in Fig. 3.2, the AArch64 ISA provides 31 general-purpose registers, which are called [Image 2] through [Image 3]. These registers can each store 64 bits of data. To use all 64 bits, they are referred to as x0 through x30 (capitalization is optional). To use only the lower (least significant) 32 bits, they are referred to as w0 through w30. Since each register has a 64-bit name and a 32-bit name, we use [Image 7] through [Image 8] to specify a register without specifying the number of bits. For example, when we refer to [Image 9], we are actually referring to either [Image 10] or [Image 11].
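The relationship between the 64-bit and 32-bit names of one register can be modeled with simple masking. The Python sketch below is illustrative only: the helper names are invented, register 9 is an arbitrary choice, and the zero-extension on a 32-bit write is AArch64 architectural behavior that is not stated in the passage above.

```python
MASK32 = 0xFFFFFFFF


def read_w(x_value):
    """Reading the 32-bit name returns the least significant 32 bits."""
    return x_value & MASK32


def write_w(value):
    """Writing the 32-bit name zero-extends: the upper 32 bits become zero."""
    return value & MASK32


x9 = 0x1122334455667788
w9 = read_w(x9)                   # lower half of x9
x9_after = write_w(0xDEADBEEF)    # x9 after a write through w9
```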

Figure 3.2. AArch64 general purpose registers ([Image 1]) and special registers.

3.2.1 General purpose registers

The general-purpose registers are each used according to specific conventions. These rules are defined in the application binary interface (ABI). The AArch64 ABI is called AAPCS64. The difference between callee saved and caller saved registers will also be explained in Section 5.4.4.

Registers x0–x7 are used for passing arguments when calling a procedure or function. Registers x9–x15 are scratch registers and can be used at any time because no assumptions are made about what they contain. They are called scratch registers because they are useful for holding temporary results of calculations. Registers x19–x28 can also be used as scratch registers, but their contents must be saved before they are used, and restored to their original contents before the procedure exits.

Some of the registers have alternate names. For example, [Image 15] is also known as [Image 16]. Most of these alternate names are only of interest to people writing compilers and operating systems. However, two of these registers are of interest to all AArch64 programmers.

3.2.2 Frame pointer

The frame pointer, x29, is used by high-level language compilers to track the current stack frame. This register can be helpful when the program is running under a debugger, and can sometimes help the compiler to generate more efficient code for returning from a subroutine. The GNU C compiler can be instructed to use x29 as a general-purpose register by using the -fomit-frame-pointer command line option. The use of x29 as the frame pointer is a programming convention. Some instructions (e.g. branches) implicitly modify the program counter, the link register, and even the stack pointer, so they are considered to be hardware special registers. As far as the hardware is concerned, the frame pointer is exactly the same as the other general-purpose registers, but AArch64 programmers use it for the frame pointer because of the ABI.

3.2.3 PSTATE register

The PSTATE register contains bits that indicate the status of the current process, including information about the results of previous operations. Fig. 3.3 shows all of its bits. The dashed lines indicate unused space that may be reserved for future AArch64 architectural extensions. The PSTATE register is actually a collection of independent fields, most of which are only used by the operating system. User programs make use of the first four bits, N, Z, C, and V. These are referred to as the condition flags field. Most instructions can change these flags, and later instructions can use the flags to control their operation. Their meaning is as follows:

Negative:

This bit is set to one if the signed result of an operation is negative, and set to zero if the result is positive or zero.

Zero:

This bit is set to one if the result of an operation is zero, and set to zero if the result is non-zero.

Carry:

This bit is set to one if an add operation results in a carry out of the most significant bit, or if a subtract operation does not require a borrow. For shift operations, this flag is set to the last bit shifted out by the shifter.

oVerflow:

For addition and subtraction, this flag is set if a signed overflow occurred.
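The four condition flags can be computed explicitly for 64-bit addition and subtraction. The Python sketch below is a model written for this example (the names adds and subs simply echo the flag-setting instruction mnemonics). It follows the Arm architecture's definition of subtraction as a + NOT(b) + 1, which means that in this model C ends up set when the subtraction needs no borrow and clear when it does.

```python
MASK64 = (1 << 64) - 1


def adds(a, b, carry_in=0):
    """64-bit add that returns (result, N, Z, C, V)."""
    a &= MASK64
    b &= MASK64
    total = a + b + carry_in
    result = total & MASK64
    n = result >> 63                               # sign bit of the result
    z = int(result == 0)
    c = int(total > MASK64)                        # carry out of bit 63
    v = ((~(a ^ b) & (a ^ result)) >> 63) & 1      # signed overflow
    return result, n, z, c, v


def subs(a, b):
    """Subtract as a + NOT(b) + 1, reusing the adder."""
    return adds(a, ~b & MASK64, 1)


equal = subs(5, 5)
smaller = subs(3, 5)
overflow = adds(0x7FFFFFFFFFFFFFFF, 1)
```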

Figure 3.3. Fields in the PSTATE register.

3.2.4 Link register

The procedure link register, x30, is used to hold the return address for subroutines. Certain instructions cause the program counter to be copied to the link register, then the program counter is loaded with a new address. These branch-and-link instructions are briefly covered in Section 3.5 and in more detail in Section 5.4. The link register could theoretically be used as a scratch register, but its contents are modified by hardware when a subroutine is called, in order to save the correct return address. Using x30 as a general-purpose register is dangerous and is strongly discouraged.

3.2.5 Stack pointer

The program stack was introduced in Section 1.4. The stack pointer, sp, is used to hold the address where the stack ends. This is commonly referred to as the top of the stack, although on most systems the stack grows downwards and the stack pointer really refers to the lowest address in the stack. The address where the stack ends may change when registers are pushed onto the stack, or when temporary local variables (automatic variables) are allocated or deleted. The use of the stack for storing automatic variables is described in Chapter 5. The stack pointer can only be modified or read by a small set of instructions.

three.2.6 Zero register

The zero annals,

Image 20
, can be referred to as a 64-scrap register,
Image 21
, or a 32-bit register,
Image 22
. It always has the value cypher. Well-nigh instructions tin use the zero register equally an operand, fifty-fifty every bit a destination register. If this is the instance, the education volition non change the destination register. However, it can nonetheless have side furnishings, including updating the
Image 18
flags based on the ALU performance and incrementing a register in pre-indexed or post-indexed addressing. The zero register cannot e'er be used as an operand. Information technology shares the same binary encoding with the stack pointer register,
Image 19
, which is the value
Image 23
. Some instructions can access the zero register, while others can access the stack pointer.
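The read-as-zero, discard-on-write behavior, together with a flag side effect, can be sketched as follows. The class `RegisterFile`, the helper `subs`, and the register index 31 are assumptions of this sketch, not details taken from the text above.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class RegisterFile:
    """Toy register file in which one index behaves as a zero register (hypothetical)."""

    ZERO = 31  # index chosen for this sketch only

    def __init__(self):
        self.regs = [0] * 31  # indices 0..30 are ordinary registers

    def read(self, n):
        return 0 if n == self.ZERO else self.regs[n]

    def write(self, n, value):
        if n != self.ZERO:          # writes to the zero register are discarded
            self.regs[n] = value & MASK64


def subs(rf, dst, a, b):
    """Subtract and return a zero flag, even when the result itself is thrown away."""
    result = (rf.read(a) - rf.read(b)) & MASK64
    rf.write(dst, result)
    return result == 0


rf = RegisterFile()
rf.write(1, 7)
rf.write(2, 7)
zero_flag = subs(rf, RegisterFile.ZERO, 1, 2)  # acts as a compare: only the flag survives
rf.write(RegisterFile.ZERO, 123)               # no effect
```

Using the zero register as a destination turns a subtraction into a pure comparison: the result is dropped, but the flag still reports that the two operands were equal.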

3.2.7 Program counter

The program counter,

Image 24
, always contains the address of the next instruction that will be executed. The processor increments this register by four, automatically, after each instruction is fetched from memory. By moving an address into this register, the programmer can cause the processor to fetch the next instruction from the new address. This gives the programmer the ability to jump to any address and begin executing code there. Only a small number of instructions can access the
Image 24
directly. For example, instructions that create a PC-relative address, such as
Image 25
, and instructions which load a register, such as
Image 26
, are able to access the program counter directly.
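The fetch-and-increment behavior can be sketched with a toy interpreter. The function `run`, the tuple-based instruction encoding, and the sample program are hypothetical and exist only to show the program counter stepping by four and being overwritten by a jump.

```python
def run(program, start, max_steps=10):
    """Execute a toy program; program maps address -> ("nop",), ("jump", target) or ("halt",)."""
    pc = start
    trace = []                 # addresses of the instructions actually fetched
    for _ in range(max_steps):
        instr = program[pc]
        trace.append(pc)
        pc += 4                # after the fetch, pc holds the address of the next instruction
        if instr[0] == "jump":
            pc = instr[1]      # moving an address into pc redirects the next fetch
        elif instr[0] == "halt":
            break
    return trace


program = {0: ("nop",), 4: ("jump", 16), 8: ("nop",), 16: ("halt",)}
trace = run(program, start=0)
```

The instruction at address 8 is never fetched because the jump at address 4 replaces the incremented program counter with 16.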

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780128192214000109

Knights Landing architecture

Jim Jeffers , ... Avinash Sodani , in Intel Xeon Phi Processor High Performance Programming (Second Edition), 2016

Integer execution unit

The IEU executes integer μops, which are defined as those that operate on general-purpose registers R0–R15 (i.e., RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8…R15). There are two IEUs in the core. Each IEU contains a 12-entry RS that issues one μop per cycle. The integer RSes are fully out-of-order in their scheduling. Most operations have 1-cycle latency and are supported by both IEUs, but a few operations have 3- or 5-cycle latency (e.g., multiplies) and are only supported by one of the IEUs.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780128091944000041

Computer Data Processing Hardware Architecture

Paul J. Fortier , Howard E. Michel , in Computer Systems Performance Evaluation and Prediction, 2003

2.3.1 Instruction types

Based on the number of registers available and the configuration of these registers, several types of instruction are possible. For example, if many registers are available, as would be the case in a stack computer, no address computations are needed and the instruction, therefore, can be much shorter both in format and in execution time required. On the other hand, if there are no general registers and all computations are performed by memory movements of data, then instructions will be longer and require more time due to operand fetching and storage. The following are representative of instruction types:

0-address instructions: This type of instruction is found in machines where many general-purpose registers are available. This is the case in stack machines and in some reduced instruction set machines. Instructions of this type perform their function totally using registers. If we have three general registers, A, B, and C, a typical format would have the form:

(2.1) R[A] ← R[B] operator R[C]

which indicates that the contents of registers B and C have the operator (such as add, subtract, multiply, etc.) performed on them, with the result stored in general register A. Similarly, we could describe instructions that use just one or two registers as follows:

(2.2) R[B] ← R[B] operator R[C]

or

(2.3) operator R[C]

which represent two-register and one-register instructions, respectively. In the two-register case, one of the operand registers is also used as the result register. In the single-register case, the operand register is also the result register. The increment instruction is an example of a one-register instruction. This type of instruction is found in all machines.

1-address instructions: In this type of instruction a single memory address is found in the instruction. If another operand is used, it is typically an accumulator or the top of a stack in a stack computer. The typical format of these instructions has the form:

(2.4) operator M[address]

where the contents of the named memory address have the named operator performed on them in conjunction with an implied special register. An example of such an instruction could be as follows:

(2.5) Move M[100]

or

(2.6) Add M[100]

which moves the contents of memory location 100 into the ALU's accumulator or adds the contents of memory address 100 to the accumulator and stores the result in the accumulator. If the result must be stored in memory, we would need a store instruction:

(2.7) Store M[100]

1-and-1/2-address instructions: Once we have an architecture that has some general-purpose registers, we can provide more advanced operations combining memory contents and the general registers. The typical instruction performs an operation on a memory location's contents with that of a general register. For example, we could add the contents of a memory location to the contents of a general register, A, as shown:

(2.8) Add R[A], M[100]

This instruction typically stores the result in the first named location or register in the instruction. In this example it is register A.

2-address instructions: Two-address instructions use two memory locations to perform an instruction, for example, a block move of N words from one location in memory to another, or a block add. The move may appear as follows:

(2.9) Move N, M[100], M[1000]

2-and-1/2-address instructions: This format uses two memory locations and a general register in the instruction. Typical of this type of instruction is an operation involving two memory locations storing the result in a register, or an operation with a general register and a memory location storing the result in another memory location, as shown:

(2.10) R[A] ← M[100] operator M[1000]
M[1000] ← M[100] operator R[A]

3-address instructions: Another less common form of instruction format is the 3-address instruction. These instructions involve three memory locations: two used for operands and one as the results location. A typical format is shown:

(2.11) M[200] ← M[100] operator M[300]
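The instruction formats listed above can be compared side by side in a small Python sketch. The class `ToyMachine`, its method names, and the sample memory contents are hypothetical; the sketch models only where operands come from and where results go, not any real instruction set.

```python
import operator


class ToyMachine:
    """Toy model of several instruction formats (hypothetical, not a real ISA)."""

    def __init__(self, memory):
        self.R = {"A": 0, "B": 0, "C": 0}  # general registers
        self.M = dict(memory)              # memory: address -> value
        self.acc = 0                       # implied accumulator for 1-address forms

    # Register-only forms
    def reg3(self, dst, src1, op, src2):   # R[dst] <- R[src1] op R[src2]
        self.R[dst] = op(self.R[src1], self.R[src2])

    def reg2(self, dst, op, src):          # R[dst] <- R[dst] op R[src]
        self.R[dst] = op(self.R[dst], self.R[src])

    def reg1(self, op, reg):               # operator R[reg], e.g. increment
        self.R[reg] = op(self.R[reg])

    # 1-address forms: the second operand is the accumulator
    def move(self, addr):
        self.acc = self.M[addr]

    def add(self, addr):
        self.acc += self.M[addr]

    def store(self, addr):
        self.M[addr] = self.acc

    # 1-and-1/2-address form: one register, one memory location
    def add_reg_mem(self, reg, addr):
        self.R[reg] += self.M[addr]

    # 2-address form: block move of n words
    def block_move(self, n, src, dst):
        for i in range(n):
            self.M[dst + i] = self.M[src + i]

    # 3-address form: M[dst] <- M[src1] op M[src2]
    def mem3(self, dst, src1, op, src2):
        self.M[dst] = op(self.M[src1], self.M[src2])


m = ToyMachine({100: 5, 101: 9, 300: 4})
m.R.update({"B": 6, "C": 7})
m.reg3("A", "B", operator.mul, "C")   # R[A] = 6 * 7
m.reg1(lambda v: v + 1, "C")          # increment R[C]
m.move(100)                           # acc = M[100]
m.add(100)                            # acc = acc + M[100]
m.store(200)                          # M[200] = acc
m.add_reg_mem("B", 100)               # R[B] = 6 + 5
m.block_move(2, 100, 1000)            # copy M[100..101] to M[1000..1001]
m.mem3(400, 100, operator.add, 300)   # M[400] = 5 + 4
```

The method signatures make the trade-off visible: the register-only forms name no memory addresses at all, while the 3-address form must carry three full addresses in every instruction.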

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9781555582609500023

Advanced Encryption Standard

Tom St Denis , Simon Johnson , in Cryptography for Developers, 2007

x86 Performance

The AMD Opteron achieves a nice boost due to the addition of the eight new general-purpose registers. If we examine the GCC output for x86_64 and x86_32 platforms, we can see a nice difference between the two (Table 4.2).

Table 4.2. First Quarter of an AES Round

Both snippets accomplish (at least) the first MixColumns step of the first round in the loop. Note that the compiler has scheduled part of the second MixColumns during the first to achieve higher parallelism. Even though in Table 4.2 the x86_64 code looks longer, it executes faster, partially because it processes more of the second MixColumns in roughly the same time and makes good use of the extra registers.

From the x86_32 side, we can clearly see various spills to the stack (in bold). Each of those costs us three cycles (at a minimum) on the AMD processors (two cycles on most Intel processors). The 64-bit code was compiled to have zero stack spills during the main loop of rounds. The 32-bit code has about 15 stack spills during each round, which incurs a penalty of at least 45 cycles per round or 405 cycles over the course of the nine full rounds.

Of course, we do not see the full penalty of 405 cycles, as more than one opcode is being executed at the same time. The penalty is also masked by parallel loads that are also on the critical path (such as loads from the Te tables or round key). Those delays occur anyway, so the fact that we are also loading from (or storing to) the stack at the same time does not add to the cycle count.

In either case, we can improve upon the code that GCC (4.1.1 in this case) emits. In the 64-bit code, we see a pairing of "shrq $24, %rdx" and "andl $255,%edx". The andl operation is not required since only the lower 32 bits of %rdx are guaranteed to have anything in them. This potentially saves up to 36 cycles over the course of nine rounds (depending on how the andl operation pairs up with other opcodes).

With the 32-bit code, the double loads from (%esp) (lines 2 and 3) incur a needless three-cycle penalty. In the case of the AMD Athlon (and Opterons), the load store unit will short the load operation (in certain circumstances), but the load will always take at least three cycles. Changing the second load to "movl %edx,%ebx" means that we stall waiting for %edx, but the penalty is just one cycle, not three. That change alone will free up at most 9*2*4 = 72 cycles from the nine rounds.
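The cycle estimates quoted in this section can be reproduced with a few lines of arithmetic. The constant names are hypothetical, and the factor of four (one saving per quarter of a round, matching the "First Quarter of an AES Round" table title) is an assumed reading of the 9*2*4 product, not something the text states outright.

```python
ROUNDS = 9                # full rounds considered in the text
SPILLS_PER_ROUND = 15     # approximate stack spills per round in the 32-bit code
CYCLES_PER_SPILL = 3      # minimum cost of one spill on the AMD processors
QUARTERS_PER_ROUND = 4    # assumption: the quoted savings apply once per quarter round

# Upper bound on the spill penalty of the 32-bit code
penalty_per_round = SPILLS_PER_ROUND * CYCLES_PER_SPILL
total_penalty = penalty_per_round * ROUNDS

# Removing the redundant andl: at most one cycle per quarter round
andl_saving = ROUNDS * QUARTERS_PER_ROUND * 1

# Replacing the second load (three cycles) with a register move (one cycle)
load_saving = ROUNDS * (3 - 1) * QUARTERS_PER_ROUND
```

These are upper bounds only; as the text notes, overlapping execution hides part of each penalty, so measured gains would be smaller.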

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9781597491044500078

Embedded Processor Architecture

Peter Barry , Patrick Crowley , in Modern Embedded Computing, 2012

Register Operands

Source and destination operands can be any of the following registers, depending on the instruction being executed:

32-bit general purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP)

16-bit general purpose registers (AX, BX, CX, DX, SI, SP, BP)

8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, DL)

Segment registers

EFLAGS register

MMX

Control (CR0 through CR4)

System Table registers (such as the Interrupt Descriptor Table register)

Debug registers

Machine-specific registers

On RISC embedded processors, there are generally fewer limitations on the registers that can be used by instructions. IA-32 often restricts the registers that can be used as operands for certain instructions.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123914903000059