Lecture 10 – Ch 7 General Purpose Embedded Cores

Ryan Robucci


Table of Contents

References

Composition of Notes

Slides marked with (**) are core lecture material.
Other slides may be covered in the discussion session.

The μP is the most successful programmable hardware (**)

Typical Microprocessor Toolchain (**)

†Schaumont

Cross Compiler (**)

Elements of Assembly

†Schaumont

Object Code Dump

†Schaumont
* A microprocessor works with **object code**: binary opcodes generated from assembly programs.

RISC Pipeline (**)

†Schaumont

Stalling and Hazards

Control Hazard

†Schaumont

Delayed Branch Instruction

†Schaumont

Data Hazards and Data Forwarding

Structural Hazard

†Schaumont

Predictable Timing is a Key Consideration for Embedded Systems (**)

Program Organization - Data Types

Examining Data Storage

Variables in the Memory Hierarchy (**)

†Schaumont

Transparency of Memory in implementing basic C Program

Storage Class Specifiers and Type Qualifiers

Function Calls

/usr/local/arm/bin/arm-linux-gcc -O2 -c accumulate.c -o accumulate.o
/usr/local/arm/bin/arm-linux-objdump -d accumulate.o


int accumulate (int a[10]) {
  int i;
  int c = 0;
  for (i=0; i<10;i++)
    c += a[i];
  return c;
}

int a[10];
int one = 1;
int main() {
  return one + accumulate(a);
}


†Schaumont




00000000 <accumulate>:
 0: e3a01000    mov   r1, #0
 4: e1a02001    mov   r2, r1
 8: e7903102    ldr   r3, [r0, r2, lsl #2]   ; get value from array at [r0] with offset r2*4
 c: e2822001    add   r2, r2, #1             ; i = i + 1
10: e3520009    cmp   r2, #9                 ; test for final iteration
14: e0811003    add   r1, r1, r3             ; accumulate
18: c1a00001    movgt r0, r1                 ; conditionally save return sum value into r0
1c: c1a0f00e    movgt pc, lr                 ; conditional return (pc = lr)
20: ea000000    b     8 <accumulate+0x8>     ; else repeat

00000024 <main>:
24: e52de004    str   lr, [sp, -#4]!         ; save link register into current stack frame
28: e59f0014    ldr   r0, [pc, #20]          ; 44 <main+0x20>: pass array location using r0
2c: ebfffffe    bl    0 <accumulate>         ; branch into function: lr = pc, pc = func. address
30: e59f2010    ldr   r2, [pc, #16]          ; 48 <main+0x24>
34: e5923000    ldr   r3, [r2]
38: e0833000    add   r3, r3, r0             ; one + (sum stored in r0)
3c: e1a00003    mov   r0, r3
40: e49df004    ldr   pc, [sp], #4

text pg 210: (slightly modified)

“Close inspection of the instructions reveals many practical aspects of the runtime layout of this program, and in particular of the implementation of function calls.” The instruction that branches into the function accumulate is at address 0x2c: a bl instruction (branch with link). It copies the program counter into a separate link register lr, and loads the address of the branch target into the program counter. A return-from-subroutine can now be implemented by copying the link register back into the program counter. This is shown at address 0x1c in accumulate.

Care must be taken when making nested subroutine calls so that lr is not overwritten. In the main function, this is solved at the entry, at address 0x24: an instruction copies the current contents of lr into a local area within the stack, and at the end of the main function (address 0x40) the program counter is read directly from the same location.

The argument and return value of the accumulate function are passed through register r0 rather than main memory. This is much faster, and appropriate when only a few data elements need to be copied. The input argument of accumulate is the base address of the array a. Indeed, the instruction at address 8 uses r0 as a base address and adds the loop counter multiplied by 4. This expression thus results in the effective address of element a[i] as shown on line 5 of the C program (Listing 7.4). The return value of accumulate is passed in register r0 as well. At address 0x18 of the assembly program, the accumulator value is copied from r1 to r0.

Stack Frame

Full-Fledged Stack Frame

pg. 212:
The instructions on lines 2 and 3 are used to create the stack frame. On line 3, the frame pointer (fp), stack pointer (sp), link register or return address (lr) and current program counter (pc) are pushed onto the stack. The single instruction stmfd is able to perform multiple transfers (m), and it grows the stack downward (fd, "full descending"). These four elements take up 16 bytes of stack memory.
On line 3, the frame pointer is made to point to the first word of the stack frame. All variables stored in the stack frame will now be referenced relative to the frame pointer fp. Since the first four words in the stack frame are already occupied, the first free word is at address fp - 16, the next free word is at address fp - 20, and so on. These addresses can be found in Listing 7.6.
The following local variables of the function accumulate are stored within the stack frame: the base address of a, the variable i, and the variable c. Finally, on line 32, a return instruction is shown. With a single instruction, the frame pointer fp, the stack pointer sp, and the program counter pc are restored to the values they had just before the accumulate function was called.

†Schaumont

Program Layout

ELF Layout

Run Time Memory Layout

Stack and Heap Collision

Examining Size

Examining Sections

Examining Symbols

Examining assembly code

Application Binary Interface (ABI)

https://en.wikipedia.org/wiki/Application_binary_interface

Embedded-application binary interface

https://en.wikipedia.org/wiki/Application_binary_interface

Processor Simulation (**)

Instruction Set Simulator vs. Cycle-Accurate Simulator (**)

Need for simulation (**)

Example control method for selective capture and reporting

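#include <stdio.h>

/* Greatest common divisor by repeated subtraction (positive inputs). */
int gcd (int a, int b) {
  while (a != b) {
    if (a > b)
      a = a - b;
    else
      b = b - a;
  }
  return a;
}

/* Traps to the simulator with software interrupt 514, used here to
   switch instruction tracing on (a = 1) or off (a = 0). */
void instructiontrace(unsigned a) {
  asm("swi 514");
}

int main() {
  int a;
  instructiontrace(1);   /* start capture */
  a = gcd(6, 8);
  instructiontrace(0);   /* stop capture */
  printf("GCD = %d\n", a);
  return 0;
}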


/usr/local/arm/bin/arm-linux-gcc -static -S gcd.c -o gcd.S
†Schaumont

Example Information Report from Simulator

  • Example listing using SimIt-ARM
    • Cycle: cycle count at the fetch
    • Addr: location in program memory
    • Opcode: the instruction opcode
    • P: Pipeline mis-speculation. 1 indicates that the instruction was removed from the pipeline
    • I: Instruction-cache miss.
    • D: Data-cache miss
    • Time: total time instruction is in the pipeline
    • Mnemonic: Assembly code
†Schaumont
  • Can understand the significance and causes of cache misses, stalls (type of hazard), canceled instructions, etc.
  • Can redesign code, compilation options, processor, cache sizes, etc.
†Schaumont

Fig 7.12 Mapping of address 0x8524 in a 32-set, 16-line, 32-bytes-per-line set-associative cache

Use of HDL for processor simulation (**)

†Schaumont

Summary

(**)