

Marcelo Zuffo





#### Locality of Reference

In computer science, **locality of reference**, also known as the **principle of locality**, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference locality: temporal and spatial.

Temporal locality refers to the reuse of specific data or resources within a relatively short time span. Spatial locality (also termed *data locality*) refers to the use of data elements stored in relatively close locations. Sequential locality, a special case of spatial locality, occurs when data elements are arranged and accessed linearly, as when traversing the elements of a one-dimensional array.
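As a minimal sketch of spatial and sequential locality, the two functions below sum the same matrix in different orders. The names `ROWS`, `COLS`, `sum_row_major`, and `sum_col_major` are illustrative assumptions, not part of the original slides. Because C stores each row contiguously, the row-major loop touches adjacent addresses, while the column-major loop jumps `COLS * sizeof(int)` bytes between accesses. The accumulator `sum` is reused on every iteration, which is an instance of temporal locality.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 64 /* illustrative size, chosen arbitrarily */
#define COLS 64

/* Row-major traversal: consecutive accesses hit adjacent addresses
   (sequential locality), because C lays out each row contiguously. */
long sum_row_major(int a[ROWS][COLS]) {
    assert(a != NULL);
    long sum = 0; /* reused every iteration: temporal locality */
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: consecutive accesses are COLS * sizeof(int)
   bytes apart, so spatial locality is poor. */
long sum_col_major(int a[ROWS][COLS]) {
    assert(a != NULL);
    long sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}
```

Both functions return the same result; on hardware with a cache, the row-major version would typically incur fewer misses, although the actual difference depends on the cache geometry.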

#### Memory Hierarchy



### Memory Hierarchy



Network, the connection system, sensors, and actuators

#### Complex Embedded Systems



Network, the connection system

### Memory Hierarchy in Embedded Systems

Unlike a traditional computer, where the goal is computational performance, in an embedded system understanding the memory hierarchy is fundamental to achieving DETERMINISM: in critical embedded systems we need total control over the system's behavior.

#### Role of Memory in Embedded Systems

- Traditional roles: storage and communication for programs
- Communication with sensors and actuators
- Often much more constrained than in general-purpose computing
  - Size, power, reliability, etc.
- Can be important for programmers to understand these constraints

#### Memory Architecture: Issues

These issues loom larger in embedded systems than in general-purpose computing.

- Types of memory
  - volatile vs. non-volatile, SRAM vs. DRAM
- Memory maps
  - Harvard architecture
  - Memory-mapped I/O
- Memory organization
  - statically allocated
  - stacks
  - heaps (allocation, fragmentation, garbage collection)
- The memory model of C
- Memory hierarchies
  - scratchpads, caches, virtual memory
- Memory protection
  - segmented spaces

#### Non-Volatile Memory

Preserves contents when power is off.

- EPROM: erasable programmable read only memory
  - Invented by Dov Frohman of Intel in 1971
  - Erase by exposing the chip to strong UV light



- EEPROM: electrically erasable programmable read-only memory
  - Invented by George Perlegos at Intel in 1978
- Flash memory
  - Invented by Dr. Fujio Masuoka at Toshiba around 1980
  - Erased a "block" at a time
  - Limited number of program/erase cycles (~ 100,000)
  - Controllers can get quite complex
- Disk drives
  - Not as well suited for embedded systems



#### **Volatile Memory**

Loses contents when power is off.

- SRAM: static random-access memory
  - Fast, deterministic access time
  - But more power hungry and less dense than DRAM
  - Used for caches, scratchpads, and small embedded memories
- DRAM: dynamic random-access memory
  - Slower than SRAM
  - Access time depends on the sequence of addresses
  - Denser than SRAM (higher capacity)
  - Requires periodic refresh (typically every 64 ms)
  - Typically used for main memory
- Boot loader
  - On power up, transfers data from non-volatile to volatile memory.

#### Example:

Die of an STM32F103VGT6 ARM Cortex-M3 microcontroller with 1 megabyte of flash memory by STMicroelectronics.



Image from Wikimedia Commons



#### Another Example: AVR



The AVR is an 8-bit single chip microcontroller first developed by Atmel in 1997. The AVR was one of the first microcontroller families to use on-chip flash memory for program storage. It has a modified Harvard architecture.<sup>1</sup>

AVR was conceived by two students at the Norwegian Institute of Technology (NTH), Alf-Egil Bogen and Vegard Wollan, who approached Atmel in Silicon Valley to produce it.

<sup>1</sup> A Harvard architecture uses separate memory spaces for program and data. It originated with the Harvard Mark I relay-based computer (used during World War II), which stored the program on punched tape (24 bits wide) and the data in electro-mechanical counters.

#### A Use of AVR: Arduino

Arduino is a family of open-source hardware boards built around either 8-bit AVR processors or 32-bit ARM processors.

Example: Atmel AVR ATmega328 (28-pin DIP) on an Arduino Duemilanove board



Open-source hardware and the maker movement



Massimo Banzi, founder of the Arduino project in Ivrea, Italy, and Limor Fried, owner and founder of Adafruit, showing one of the first Arduino Uno boards from the Adafruit production lines.

[http://www.open-electronics.org]



Atmel ATMega 168 Microcontroller





## ATMega 168: An 8-bit microcontroller with 16-bit addresses





AVR microcontroller architecture used in iRobot command module.

Why is it called an 8-bit microcontroller?

### ATMega168 Memory Architecture: An 8-bit microcontroller with 16-bit addresses





#### Questions to test your understanding

1. What is the difference between an 8-bit microcontroller and a 32-bit microcontroller?

2. Why use volatile memory? Why not always use non-volatile memory?

# Memory Organization for Programs from a logical (abstract) perspective

- Statically-allocated memory
  - Compiler chooses the address at which to store a variable.
- Stack
  - Dynamically allocated memory with a Last-in, First-out (LIFO) strategy
- Heap
  - Dynamically allocated memory

#### Statically-Allocated Memory in C

```
char x; /* 8-bit, statically allocated global variable x */
int main(void) {
  x = 0x20;
  ...
}
```

Compiler chooses what address to use for x, and the variable is accessible across procedures. The variable's lifetime is the total duration of the program execution.

#### Statically-Allocated Memory with Limited Scope

```
void foo(void) {
   static char x; /* 8-bit static variable x */
   x = 0x20;
   ...
}
```

Compiler chooses what address to use for x, but the variable is meant to be accessible only in foo(). The variable's lifetime is the total duration of the program execution (values persist across calls to foo()).
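The persistence of a `static` local across calls can be seen in a small sketch. The function name `count_calls` is a hypothetical example, not taken from the slides.

```c
#include <assert.h>

/* Hypothetical example: returns how many times it has been called.
   `calls` is statically allocated, initialized once before the program
   runs, and visible only inside this function. */
unsigned int count_calls(void) {
    static unsigned int calls = 0;
    calls++;
    assert(calls > 0); /* would fail only if the counter wrapped around */
    return calls;
}
```

Three successive calls return 1, 2, and 3, because the value of `calls` survives between invocations.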

## Variables on the Stack ("automatic variables")

When the procedure foo is called, x is assigned an address on the stack (by decrementing the stack pointer). When the procedure returns, the memory is freed (by incrementing the stack pointer). The variable persists only for the duration of the call to foo().
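A minimal sketch of the situation described above, assuming `foo` declares `x` as an ordinary local variable. The second function, `count_calls_auto`, is a hypothetical contrast case showing that an automatic variable does not keep its value between calls.

```c
#include <assert.h>

/* x is an automatic variable: its storage is reserved on the stack when
   foo() is entered and released when foo() returns. */
char foo(void) {
    char x;
    x = 0x20;
    return x; /* the value is copied out; the storage for x is then gone */
}

/* Hypothetical contrast: an automatic counter starts over on every call. */
unsigned int count_calls_auto(void) {
    unsigned int calls = 0;
    calls++;
    assert(calls == 1); /* always holds: nothing persists between calls */
    return calls;
}
```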

#### Question 1



### What is meant by the following C code:

```
char x;
void foo(void) {
    x = 0x20;
    ...
}
```

#### **Answer 1**



### What is meant by the following C code:

```
char x;
void foo(void) {
    x = 0x20;
    ...
}
```

An 8-bit quantity (hex 0x20) is stored at an address in statically allocated memory in internal RAM determined by the compiler.

#### Question 2



### What is meant by the following C code:

```
char *x;
void foo(void) {
    x = 0x20;
    ...
}
```

#### Answer 2



### What is meant by the following C code:

```
char *x;
void foo(void) {
    x = 0x20;
    ...
}
```

A 16-bit quantity (hex 0x0020), or pointer, is stored at an address in statically allocated memory in internal RAM determined by the compiler.

#### Question 3



### What is meant by the following C code:

```
char *x, y;
void foo(void) {
    x = 0x20;
    y = *x;
    ...
}
```

#### Answer 3



### What is meant by the following C code:

```
char *x, y;
void foo(void) {
    x = 0x20;
    y = *x;
    ...
}
```

The 8-bit quantity in the I/O register at location 0x20 is loaded into y, which is at a location in internal SRAM determined by the compiler.

#### Question 4



```
char foo() {
  char *x, y;
  x = 0x20;
  y = *x;
  return y;
}
char z;
int main(void) {
  z = foo();
  ...
}
```

Where are x, y, z in memory?

#### **Answer 4**



```
char foo() {
  char *x, y;
  x = 0x20;
  y = *x;
  return y;
}
char z;
int main(void) {
  z = foo();
  ...
}
```

x occupies 2 bytes on the stack, y occupies 1 byte on the stack, and z occupies 1 byte in static memory.

#### Question 5



### What is meant by the following C code:

```
void foo(void) {
  char *x, y;
  x = &y;
  *x = 0x20;
  ...
}
```

#### Answer 5



### What is meant by the following C code:

```
void foo(void) {
  char *x, y;
  x = &y;
  *x = 0x20;
  ...
}
```

16 bits for x and 8 bits for y are allocated on the stack, then x is loaded with the address of y, and then y is loaded with the 8-bit quantity 0x20.
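The behavior described in this answer can be checked on any host with a small sketch. Here `foo` is changed to return `y` so the result is observable, and `pointer_size` is a hypothetical helper added for illustration. The 16-bit pointer width is specific to the 8-bit AVR target; on a desktop machine `sizeof(char *)` is typically 4 or 8.

```c
#include <assert.h>
#include <stddef.h>

char foo(void) {
    char *x, y;  /* both automatic: a pointer and a char on the stack */
    x = &y;      /* x now holds the address of y */
    *x = 0x20;   /* storing through the pointer writes y */
    assert(y == 0x20);
    return y;
}

/* Pointer width depends on the target: 2 bytes on an 8-bit AVR,
   typically 4 or 8 bytes on a desktop machine. */
size_t pointer_size(void) {
    return sizeof(char *);
}
```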

#### Question 6

[Figure: ATmega168 block diagram (program flash, 32 x 8 general-purpose registers, ALU, data SRAM, EEPROM, I/O modules) and data memory map: 32 registers at 0x0000 - 0x001F, 64 I/O registers at 0x0020 - 0x005F, 160 extended I/O registers at 0x0060 - 0x00FF, and internal SRAM (512/1024/1024 x 8) from 0x0100 to 0x02FF/0x04FF/0x04FF.]

### What goes into z in the following program:

```
char foo() {
  char y;
  uint16_t x;
  x = 0x20;
  y = *(char *)x;  /* an integer must be cast to a pointer before it is dereferenced */
  return y;
}
char z;
int main(void) {
  z = foo();
  ...
}
```

#### Answer 6



### What goes into z in the following program:

```
char foo() {
  char y;
  uint16_t x;
  x = 0x20;
  y = *(char *)x;  /* an integer must be cast to a pointer before it is dereferenced */
  return y;
}
char z;
int main(void) {
  z = foo();
  ...
}
```

z is loaded with the 8-bit quantity in the I/O register at location 0x20.
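Reading address 0x20 directly only makes sense on the target hardware, so the sketch below simulates the data address space with an array in order to run on a host machine. The array `data_mem`, the helpers `read_data`, `write_data`, and `region_of`, and the type `region_t` are all assumptions made for illustration. The region boundaries follow the data memory map given with Question 6, using the largest SRAM variant (ending at 0x04FF).

```c
#include <assert.h>
#include <stdint.h>

#define DATA_MEM_SIZE 0x0500u /* addresses 0x0000..0x04FF */

typedef enum { REGION_REGISTERS, REGION_IO, REGION_EXT_IO, REGION_SRAM } region_t;

/* Simulated data address space (stands in for the real hardware). */
static uint8_t data_mem[DATA_MEM_SIZE];

/* Classify an address according to the data memory map. */
region_t region_of(uint16_t addr) {
    assert(addr < DATA_MEM_SIZE);
    if (addr <= 0x001F) return REGION_REGISTERS; /* 32 registers */
    if (addr <= 0x005F) return REGION_IO;        /* 64 I/O registers */
    if (addr <= 0x00FF) return REGION_EXT_IO;    /* 160 extended I/O registers */
    return REGION_SRAM;                          /* internal SRAM */
}

/* Read one byte; on the target this would be *(volatile uint8_t *)addr. */
uint8_t read_data(uint16_t addr) {
    assert(addr < DATA_MEM_SIZE);
    volatile uint8_t *p = &data_mem[addr];
    return *p;
}

/* Write one byte into the simulated address space. */
void write_data(uint16_t addr, uint8_t value) {
    assert(addr < DATA_MEM_SIZE);
    data_mem[addr] = value;
}

/* Same logic as the quiz: x holds an address as a plain 16-bit integer. */
char foo(void) {
    uint16_t x = 0x20; /* first I/O register */
    char y = (char)read_data(x);
    return y;
}
```

The `volatile` qualifier matters on real hardware: it tells the compiler that the register may change on its own, so the read must not be optimized away or cached.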

#### The C Language and the Memory Hierarchy

The C language, despite having been invented more than 40 years ago, is an EXTREMELY powerful programming language for programming the memory hierarchy in all its aspects. To use it this way, the embedded-system programmer must UNDERSTAND the organization of the processor and its address space.

Unfortunately, this kind of programming DEPENDS HEAVILY ON THE INTERNAL ARCHITECTURE OF THE EMBEDDED SYSTEM; that is, each architecture needs a different C program.

### Quiz: Find the flaw in this program

(begin by thinking about where each variable is allocated)

```
int x = 2;
int* foo(int y) {
  int z;
  z = y * x;
  return &z;
}
int main(void) {
  int* result = foo(10);
  ...
}
```

### Solution: Find the flaw in this program

```
int x = 2;                 /* statically allocated: compiler assigns a memory location */
int* foo(int y) {          /* arguments on the stack */
  int z;                   /* automatic variables on the stack */
  z = y * x;
  return &z;
}
int main(void) {
  int* result = foo(10);   /* program counter, argument 10, and z go on the stack
                              (and possibly more, depending on the compiler) */
  ...
}
```

The procedure foo() returns a pointer to a variable on the stack. What if another procedure call (or interrupt) occurs before the returned pointer is de-referenced?
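Two common ways to avoid the dangling pointer are sketched below. The function names `foo_by_value` and `foo_into` are hypothetical; neither appears in the original slides. The first returns the value itself, and the second writes into storage owned by the caller, so no pointer to a dead stack frame ever escapes.

```c
#include <assert.h>
#include <stddef.h>

int x = 2; /* statically allocated, as in the quiz */

/* Option 1: return the value, not the address of a local. */
int foo_by_value(int y) {
    int z = y * x;
    return z;
}

/* Option 2: the caller provides the storage, so the result outlives
   this call. */
void foo_into(int y, int *out) {
    assert(out != NULL);
    *out = y * x;
}
```

With the second form, the caller decides where the result lives (static memory or its own stack frame), which keeps the lifetime explicit.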

## Watch out for Recursion!! Quiz: What is the Final Value of z?



```
void foo(uint16_t x) {
  char y;
  y = *(char *)x;
  if (x > 0x100) {
    foo(x - 1);
  }
}
char z;
void main(...) {
  z = 0x10;
  foo(0x04FF);
  ...
}
```
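One way to reason about this quiz is to count how many activations of `foo` are live at the deepest point and estimate the stack space they need. The sketch below does only that arithmetic. The helper names are hypothetical, and the per-call frame size is an assumption, since the real value depends on the compiler and calling convention.

```c
#include <assert.h>
#include <stdint.h>

/* Number of nested activations of foo() when first called with `start`:
   foo recurses while x > limit, decrementing x by 1 each time. */
uint32_t recursion_depth(uint16_t start, uint16_t limit) {
    assert(start >= limit);
    return (uint32_t)(start - limit) + 1u;
}

/* Stack bytes needed for `depth` nested calls, given an ASSUMED
   per-call frame size in bytes. */
uint32_t stack_bytes_needed(uint32_t depth, uint32_t frame_bytes) {
    return depth * frame_bytes;
}
```

For `foo(0x04FF)` with the limit 0x100, the depth is 1024 calls. Even with an assumed frame of only 3 bytes (a 2-byte return address plus `y`), that is 3072 bytes, which exceeds the 1024 bytes of internal SRAM between 0x0100 and 0x04FF. This suggests the stack would grow into statically allocated data, so `z` may be overwritten.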

#### **Data Memory**

## Dynamically-Allocated Memory: The Heap



An operating system typically offers a way to dynamically allocate memory on a "heap".

Memory management (malloc() and free()) can lead to **many problems** with embedded systems:

- Memory leaks (allocated memory is never freed)
- Memory fragmentation (allocatable pieces get smaller)

Automatic techniques ("garbage collection") often require stopping everything and reorganizing the allocated memory. This is deadly for real-time programs.

## Problems of Malloc() and Free() in Embedded Systems

- Memory violation
- Memory leak
- Memory exhaustion
- Memory management

ALL RELATED TO NON-DETERMINISM

This is why we avoid dynamic memory in critical embedded systems.
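One commonly used alternative to `malloc()` and `free()` is a fixed-size block pool carved out of statically allocated memory. This technique is not from the original slides; the sketch below is a minimal illustration, and the names `pool_alloc`, `pool_free`, `POOL_BLOCKS`, and `BLOCK_SIZE` are assumptions. Because every block has the same size, the pool cannot fragment, and allocation takes at most `POOL_BLOCKS` steps, so its worst-case time is bounded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS 8  /* illustrative sizes, chosen arbitrarily */
#define BLOCK_SIZE  32

static uint8_t pool_storage[POOL_BLOCKS * BLOCK_SIZE]; /* statically allocated */
static uint8_t pool_used[POOL_BLOCKS];                 /* 0 = free, 1 = in use */

/* Returns a free block, or NULL when the pool is exhausted.
   The loop is bounded by POOL_BLOCKS. */
void *pool_alloc(void) {
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            return &pool_storage[i * BLOCK_SIZE];
        }
    }
    return NULL;
}

/* Returns a block to the pool. The pointer must come from pool_alloc(). */
void pool_free(void *p) {
    assert(p != NULL);
    size_t offset = (size_t)((uint8_t *)p - pool_storage);
    assert(offset % BLOCK_SIZE == 0);
    size_t i = offset / BLOCK_SIZE;
    assert(i < POOL_BLOCKS && pool_used[i]);
    pool_used[i] = 0;
}
```

Exhaustion is still possible, but it shows up as an explicit `NULL` at a known limit rather than as a failure that depends on allocation history.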

## Memory Hierarchies

- Memory hierarchy
  - Cache:
    - A subset of memory addresses is mapped to SRAM
    - Accessing an address not in SRAM results in cache miss
    - A miss is handled by copying contents of DRAM to SRAM
  - Scratchpad:
    - SRAM and DRAM occupy disjoint regions of memory space
    - Software manages what is stored where
- Segmentation
  - Logical addresses are mapped to a subset of physical addresses
  - Permissions regulate which tasks can access which memory

## Memory Hierarchy



Here, the cache or scratchpad, main memory, and disk or flash share the same address space.

## Memory Hierarchy



Here, each distinct piece of memory hardware has its own segment of the address space.

This requires more careful software design, but gives more direct control over timing.

## **Direct-Mapped** Cache

[Figure: direct-mapped cache with sets 0 through S, each holding a valid bit, a tag, and a block. An m-bit address is divided into t tag bits, s set-index bits, and b block-offset bits.]

A "set" consists of one "line". If the tag of the address matches the tag of the line, then we have a "cache hit." Otherwise, the fetch goes to main memory, updating the line.

EECS 149/249A, UC Berkeley: 45

Each line stores t tag bits and a block of $B = 2^b$ bytes.
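The address split and the hit test can be sketched in a few lines. The parameter values below (b = 4, s = 3) and all identifiers are assumptions chosen for illustration, not values from the slides. The model tracks only valid bits and tags, not the block data itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define B_BITS 4u                 /* b: block-offset bits, B = 2^b = 16 bytes */
#define S_BITS 3u                 /* s: set-index bits, 2^s = 8 sets */
#define NUM_SETS (1u << S_BITS)

typedef struct {
    bool     valid; /* 1 valid bit */
    uint32_t tag;   /* t tag bits */
} cache_line_t;

static cache_line_t cache[NUM_SETS]; /* one line per set: direct-mapped */

/* Low b bits: position of the byte inside its block. */
uint32_t block_offset(uint32_t addr) { return addr & ((1u << B_BITS) - 1u); }

/* Next s bits: which set (and therefore which line) to look in. */
uint32_t set_index(uint32_t addr) { return (addr >> B_BITS) & (NUM_SETS - 1u); }

/* Remaining high bits: the tag compared against the stored tag. */
uint32_t tag_of(uint32_t addr) { return addr >> (B_BITS + S_BITS); }

/* Returns true on a hit. On a miss, the line is refilled, which models
   the fetch from main memory. */
bool cache_access(uint32_t addr) {
    uint32_t set = set_index(addr);
    assert(set < NUM_SETS);
    cache_line_t *line = &cache[set];
    if (line->valid && line->tag == tag_of(addr)) {
        return true;
    }
    line->valid = true;
    line->tag = tag_of(addr);
    return false;
}
```

Two addresses that share a set index but differ in tag evict each other in this model, which is the conflict-miss behavior that makes cache timing hard to predict.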




# Your Lab Hardware (2014 - 2016)

myRIO 1950/1900 (National Instruments)



#### Xilinx Zynq Z-7010

- ARM Cortex-A9 MPCore dual core processor
  - Real-time Linux
- Xilinx Artix-7 FPGA
  - Preconfigured with a 32-bit MicroBlaze microprocessor running without an operating system ("bare metal").



## Xilinx Zynq

Dual-core ARM processor + FPGA + rich I/O on a single chip.



#### MicroBlaze I/O Architecture





Figure 2-1: MicroBlaze Core Block Diagram

#### MicroBlaze I/O Architecture

Source: Xilinx

- M\_AXI\_DP: Peripheral data interface, AXI4-Lite or AXI4 interface
- DPLB: Data interface, Processor Local Bus
- DLMB: Data interface, Local Memory Bus (BRAM only)
- M\_AXI\_IP: Peripheral instruction interface, AXI4-Lite interface
- IPLB: Instruction interface, Processor Local Bus
- ILMB: Instruction interface, Local Memory Bus (BRAM only)
- M0\_AXIS..M15\_AXIS: AXI4-Stream interface master direct connection interfaces
- S0\_AXIS..S15\_AXIS: AXI4-Stream interface slave direct connection interfaces
- MFSL 0..15: FSL master interfaces
- DWFSL 0..15: FSL master direct connection interfaces
- SFSL 0..15: FSL slave interfaces
- DRFSL 0..15: FSL slave direct connection interfaces
- DXCL: Data-side Xilinx CacheLink interface (FSL master/slave pair)
- M\_AXI\_DC: Data-side cache AXI4 interface
- IXCL: Instruction-side Xilinx CacheLink interface (FSL master/slave pair)
- M\_AXI\_IC: Instruction-side cache AXI4 interface
- Core: Miscellaneous signals for clock, reset, debug, and trace




### Berkeley MicroBlaze Personality Memory Map





### Conclusion

Understanding memory architectures is essential to programming embedded systems.

C is the language of choice for programming such systems.