

# Training Intel® Processor Tracing

Release 09.2024




| Section                                                | Page |
|--------------------------------------------------------|------|
| TRACE32 Training                                       |      |
| Training Intel® x86/x64                                |      |
| Training Intel® Processor Tracing                      | 1    |
| Protocol Description                                   | 6    |
| Basic Trace Packets                                    | 6    |
| OS-Aware Tracing                                       | 7    |
| Time Information                                       | 8    |
| Tool Timestamp (POWER TRACE II / POWER TRACE III only) | 8    |
| CycleAccurate Tracing                                  | 10   |
| Synchronization Time                                   | 10   |
| Trace Configuration                                    | 11   |
| Off-chip Trace                                         | 11   |
| SDRAM Trace                                            | 18   |
| Trace Errors                                           | 22   |
| ERRORS                                                 | 22   |
| TARGET FIFO OVERFLOW                                   | 25   |
| TRACE32 Abstractions                                   | 27   |
| SystemTrace                                            | 27   |
| (Core) Trace                                           | 29   |
| Displaying the Trace Contents                          | 31   |
| Influencing Factors on the Trace Information           | 31   |
| Settings in the TRACE32 Trace Configuration Window     | 32   |
| States of the Trace                                    | 43   |
| The AutoInit Command                                   | 44   |
| Basic Display Commands                                 | 45   |
| Default Listing                                        | 45   |
| Basic Formatting                                       | 47   |
| Correlating the Trace Listing with the Source Listing  | 48   |
| Browsing through the Trace Buffer                      | 49   |
| Find a Specific Record                                 | 50   |
| Display Items                                          | 51   |
| Default Display Items                                  | 51   |
| Further Display Items                                  | 54   |
| Belated Trace Analysis                                 | 60   |
| Save the Trace Information to an ASCII File            | 61   |
| Postprocessing with TRACE32 Instruction Set Simulator  | 62   |
| Export STP Byte Stream                                 | 67   |
| Trace Control by Filters                               | 68   |
| TraceEnable                                            | 69   |
| TraceOFF                                               | 71   |
| OS-Aware Tracing                                       | 73   |
| Process Switch Packets                                 | 73   |
| Program Flow and Process Switches                      | 75   |
| Process Runtime Analysis                               | 76   |
| Time Chart                                             | 77   |
| Statistic                                              | 78   |
| Find Process Switches in the Trace                     | 79   |
| OS-aware Filtering                                     | 81   |
| Filtering by Privilege Level                           | 81   |
| Filtering by Process                                   | 82   |
| Filter on Function executed by Process                 | 84   |
| Belated Analysis                                       | 86   |
| Example for Linux                                      | 86   |
| Trace-based Debugging (CTS)                            | 88   |
| Setup                                                  | 88   |
| Get Started                                            | 89   |
| Forward and Backward Debugging                         | 92   |
| Forward Debugging                                      | 92   |
| Backward Debugging                                     | 92   |
| CTS Technique                                          | 93   |
| Function Run-Time Analysis - Basic Concept             | 94   |
| Software under Analysis (no OS, OS or OS+MMU)          | 94   |
| Flat vs. Nesting Analysis                              | 94   |
| Basic Knowledge about Flat Analysis                    | 95   |
| Basic Knowledge about Nesting Analysis                 | 96   |
| Summary                                                | 98   |
| Flat Function-Runtime Analysis                         | 99   |
| Function Time Chart                                    | 99   |
| Default Time Chart                                     | 99   |
| Core Options                                           | 100  |
| TASK Options                                           | 101  |
| Function Run-time Statistic                            | 103  |
| Further Commands                                       | 104  |
| Nesting Function Analysis OS                           | 105  |
| Survey                                                 | 106  |
| range Column                                           | 107  |
| Default Results                                        | 110  |
| Net Results                                            | 112  |
| Interrupt Details                                      | 114  |
| Time in Other Tasks                                    | 115  |
| Tree Display                                           | 116  |
| Structure your Trace Evaluation                        | 117  |
| GROUPs for OS-aware Tracing                            | 117  |
| GROUP Status ENable                                    | 118  |
| GROUP Status ENable+Merge                              | 119  |
| GROUP Status Enable+HIDE                               | 120  |
| GROUP Creation                                         | 121  |


Version 05-Oct-2024

#### **Basic Trace Packets**

To enable a trace tool to reconstruct the instruction execution sequence, the following trace packets are generated:

#### **TNT** packets

Taken/Not-Taken (TNT) packets record the direction (taken or not taken) of up to 6 conditional branches. Since the address at which program execution continues when a conditional branch is taken is encoded in the program code, TNT packets provide sufficient information to reconstruct the instruction execution sequence.
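The following Python sketch illustrates why one taken/not-taken bit per conditional branch is enough. It is a toy model, not the TRACE32 decoder: the program image (`Insn`, the addresses and instruction sizes) is hypothetical, and the TNT bits are given as a plain list of booleans rather than as encoded packets.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Insn:
    kind: str                     # "seq", "jcc" (direct conditional branch) or "end"
    size: int                     # instruction length in bytes (toy values)
    target: Optional[int] = None  # taken target, encoded in the program code


def reconstruct(code: Dict[int, Insn], start: int, tnt: List[bool]) -> List[int]:
    """Return the executed addresses; each conditional branch consumes one TNT bit."""
    bits = iter(tnt)
    addr = start
    path: List[int] = []
    while True:
        insn = code[addr]
        path.append(addr)
        if insn.kind == "end":
            return path
        if insn.kind == "jcc":
            taken = next(bits, None)   # the direction comes from the trace
            if taken is None:          # no more trace data
                return path
            # the taken target comes from the program code, not from the trace
            addr = insn.target if taken else addr + insn.size
        else:
            addr += insn.size


# hypothetical loop: 0x10 loop body, 0x12 "jump back to 0x10 if taken", 0x14 exit
program = {
    0x10: Insn("seq", 2),
    0x12: Insn("jcc", 2, target=0x10),
    0x14: Insn("end", 1),
}
path = reconstruct(program, 0x10, [True, True, False])
```

With the bits taken, taken, not taken, the loop body executes three times before the program leaves the loop at 0x14.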



#### **Target IP packets**

Return instructions, indirect calls and jumps and similar instructions, as well as exceptions and interrupts, cause the generation of a Target IP (TIP) packet. Since the address at which program execution continues is only known at run-time, a TIP packet contains this address, either in full or in a compressed format.
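The idea behind the compressed format can be sketched as follows, assuming the decoder remembers the last reported IP and the packet carries only the low-order bytes that changed. This is a simplified model loosely based on Intel® PT's last-IP compression; the real encoding (the IPBytes field, sign-extended and suppressed-IP variants) is specified in the Intel® SDM and is not reproduced here. The function names are hypothetical.

```python
def compress_ip(ip: int, last_ip: int) -> bytes:
    """Return only the low-order bytes of ip that differ from last_ip (toy scheme)."""
    for n in (2, 4, 6, 8):                        # candidate payload sizes in bytes
        mask = (1 << (8 * n)) - 1
        if (ip & ~mask) == (last_ip & ~mask):     # upper bytes already known to the decoder
            return (ip & mask).to_bytes(n, "little")
    raise ValueError("IP wider than 64 bits")


def decompress_ip(payload: bytes, last_ip: int) -> int:
    """Rebuild the full IP from the payload and the previously reported IP."""
    mask = (1 << (8 * len(payload))) - 1
    return (last_ip & ~mask) | int.from_bytes(payload, "little")
```

A return into the same 64 KByte region as the previous target needs only 2 payload bytes, while a jump into a distant address range needs the full 8 bytes.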



#### **OS-Aware Tracing**

#### **Paging Information Packet (PIP)**

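x86/x64 processors have a CR3 control register that holds the physical base address of the page tables of the current process and, if process-context identifiers are enabled, the Process Context Identifier (PCID). On every context switch, the operating system loads the CR3 value of the next process into CR3.

Intel® PT generates a Paging Information Packet (PIP) containing the new CR3 value whenever a write to CR3 occurs. This allows the trace tool to assign the subsequent program flow to a process.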
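A minimal sketch of how PIP information can be used, assuming the decoded trace is available as a list of events and a mapping from CR3 values to process names is known (presumably provided by the OS awareness). The event format, the CR3 values and the process names are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # ("PIP", cr3_value) or ("IP", instruction_address)


def assign_to_processes(events: List[Event],
                        cr3_to_process: Dict[int, str]) -> List[Tuple[str, int]]:
    """Tag every traced address with the process whose CR3 value was last reported."""
    current = "unknown"            # no PIP seen yet
    tagged: List[Tuple[str, int]] = []
    for kind, value in events:
        if kind == "PIP":          # context switch: CR3 was written
            current = cr3_to_process.get(value, f"cr3={value:#x}")
        else:
            tagged.append((current, value))
    return tagged
```

Addresses recorded before the first PIP cannot be assigned to a process, which is why the sketch labels them "unknown".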



#### Tool Timestamp (POWER TRACE II / POWER TRACE III only)

POWER TRACE II / POWER TRACE III timestamps the trace information when it is stored in its trace buffer.

The resolution of the POWER TRACE II / POWER TRACE III timestamp is 5 ns.

Trace records are time-stamped in groups: 8 consecutive trace records always have an identical timestamp. There are two reasons for this:

- The TRACE32 recording technology.
- The smallest Intel® PT packet is one byte.



In the standard trace display, timestamp information is displayed only for the first record with a new timestamp. All following records with an identical timestamp show <0.005us, i.e. a time difference below the 5 ns timestamp resolution.
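The display rule can be mimicked with a few lines of Python, assuming the displayed value is the time back to the previous record and the timestamps are given in ns. The function name and the input values are hypothetical; this is not how TRACE32 computes the column internally.

```python
from typing import List


def ti_back_column(timestamps_ns: List[int]) -> List[str]:
    """Format the time back to the previous record, as described above."""
    shown: List[str] = []
    prev = None
    for ts in timestamps_ns:
        if prev is None:
            shown.append("")                          # first record: no predecessor
        elif ts == prev:
            shown.append("<0.005us")                  # same 5 ns tick as the previous record
        else:
            shown.append(f"{(ts - prev) / 1000:.3f}us")  # ns -> us
        prev = ts
    return shown


# two groups of 8 records, 15 ns apart
column = ti_back_column([1000] * 8 + [1015] * 8)
```

Only the ninth record shows a real time difference (0.015us); all other records of each group show <0.005us.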



#### **CycleAccurate Tracing**

If configured, Intel® PT can generate cycle count information. The cycle count information indicates how many core clock cycles it took to execute a program section.



Cycle-accurate tracing requires up to twice as much trace bandwidth.
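The arithmetic behind both statements can be sketched as follows. The 1.6 GHz core clock and the 100 MByte/s trace rate are hypothetical values; the recording-time estimate assumes the worst case of twice the bandwidth and ignores how the trace tool stores records internally.

```python
def section_time_us(cycles: int, core_clock_hz: float) -> float:
    """Convert a cycle count into microseconds for a given core clock."""
    return cycles / core_clock_hz * 1e6


def recording_time_s(buffer_bytes: int, trace_rate_bytes_per_s: float,
                     cycle_accurate: bool) -> float:
    """Rough time until the trace memory is full (worst case: 2x bandwidth)."""
    rate = trace_rate_bytes_per_s * (2.0 if cycle_accurate else 1.0)
    return buffer_bytes / rate


# 1600 core cycles at a hypothetical 1.6 GHz take about 1 us
t_section = section_time_us(1600, 1.6e9)
# a 4 GByte trace memory at a hypothetical 100 MByte/s trace rate
t_plain = recording_time_s(4 * 2**30, 100e6, cycle_accurate=False)
t_cyc = recording_time_s(4 * 2**30, 100e6, cycle_accurate=True)
```

In the worst case, switching on cycle-accurate tracing halves the time span that fits into the trace memory.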

#### **Synchronization Time**

Not implemented yet.

#### **Off-chip Trace**

Recording the trace information exported via a PTI (Parallel Trace Interface) requires:

- A POWER TRACE II hardware (1 GByte, 2 GByte or 4 GByte of trace memory) or a POWER TRACE III hardware (4 GByte or 8 GByte of trace memory)
  - TRACE32 PowerView uses the name **Analyzer** to refer to the trace memory within POWER TRACE II / POWER TRACE III.
- A Preprocessor for Intel® Atom™ AUTOFOCUS 600 MIPI



The following configuration steps are required for off-chip tracing:

#### 1. Configure the Parallel Trace Interface on the target.

Configuration is required for:

- PTI port size
- PTI frequency
- GPIO pins used for PTI

The following commands are provided for this purpose:

```
; write <value> to the configuration register addressed by A:<physical address>
; in the specified <format>
PER.Set.simple A:<physical address> %<format> <value>
; write <value> to the memory location addressed by A:<physical address>
; in the specified <format>
Data.Set A:<physical address> %<format> <value>
```

Data.Set is equivalent to PER.Set.simple if the configuration register is memory-mapped.

The access class A: allows the write operations to use physical addresses.

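```
; write the 32-bit value 0x3e715 to physical address 0xf9009000
PER.Set.simple A:0xf9009000 %Long 0x3e715
; same write via Data.Set (memory-mapped configuration register)
Data.Set A:0xf9009000 %Long 0x3e715
```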

Please refer to your chip manual for the physical addresses of the configuration registers.
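If the chip manual gives you a list of register writes, a small host-side script can turn them into PRACTICE commands in the format shown above. This is a convenience sketch only: the function names are hypothetical, and the address/value pair is the one used in the example above, not a real PTI configuration.

```python
from typing import Iterable, List, Tuple


def practice_writes(regs: Iterable[Tuple[int, int]], width: str = "Long",
                    command: str = "Data.Set") -> List[str]:
    """Turn (physical address, value) pairs into PRACTICE write commands."""
    return [f"{command} A:{addr:#x} %{width} {value:#x}" for addr, value in regs]


def write_script(path: str, lines: List[str]) -> None:
    """Save the commands as a PRACTICE script (.cmm)."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

The generated script can then be started in TRACE32 with DO.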

#### 2. Configure TRACE32 for a PTI that exports STP (System Trace Protocol) packets.

```
SYStem.CONFIG STM Mode STP64
                                       ; inform TRACE32 that your
                                       ; chip provides an STM that
                                       ; generates 64-bit STPv1
                                       ; packets
STM.PortSize 16.
                                       ; inform TRACE32 that your
                                       ; PTI size is 16 pins
```

#### 3. Inform TRACE32 which core traces you want to analyze.

`IPT.TraceID <value> | <bitmask>`

Specify which masters/channels (that produce Intel® PT trace information) you want to analyze with TRACE32.

| Parameter   | Description                                                                                               |
|-------------|-----------------------------------------------------------------------------------------------------------|
| `<value>`   | A 32-bit number. The first 16 bits represent the master ID, the last 16 bits represent the channel ID.   |
| `<bitmask>` | Bitmask representation of `<value>`.                                                                      |

**Example 1:** Each core has its own master ID.



**Example 2:** Each core has its own channel ID, all cores use the same master ID.
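Examples 1 and 2 describe two ways in which the IDs can be assigned. The sketch below shows how a `<value>` or a `<bitmask>` could be built for either case. Two points are assumptions that this text does not confirm: that "first 16 bits" means bits 31..16, and that TRACE32 bitmasks are written in binary with a `0y` prefix and `x` for don't-care bits. The function names are hypothetical.

```python
from typing import Optional


def trace_id(master: int, channel: int) -> int:
    """Pack master and channel ID into one 32-bit <value>.

    ASSUMPTION: master ID in bits 31..16, channel ID in bits 15..0.
    """
    if not (0 <= master <= 0xFFFF and 0 <= channel <= 0xFFFF):
        raise ValueError("IDs must fit into 16 bits")
    return (master << 16) | channel


def trace_id_bitmask(master: Optional[int], channel: Optional[int]) -> str:
    """Build a bitmask string; None means 'any ID' (all 16 bits don't care).

    ASSUMPTION: 0y prefix for binary, x for don't-care bits.
    """
    def field(v: Optional[int]) -> str:
        return "x" * 16 if v is None else format(v, "016b")
    return "0y" + field(master) + field(channel)


# Example 1 style: select master ID 2, any channel
mask_example1 = trace_id_bitmask(2, None)
# Example 2 style: shared master ID 1, channel ID 3
value_example2 = trace_id(1, 3)
```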



#### 4. Enable Intel® PT on the target and allow TRACE32 to configure it.

```
IPT.ON
```

#### 5. Calibrate the *Preprocessor for Intel® Atom™ AUTOFOCUS 600 MIPI* for recording.

TRACE32 supports three methods of generating output on the trace lines for calibration:

- On-chip test pattern generator (not tested yet).
- Test executable provided by Lauterbach.
- Application program.

Please be aware that TRACE32 PowerView displays "Analyzer data capture o.k." only if:

- All trace lines toggled while calibration is performed.
- There are no short circuits between the trace lines.
- Error-free trace decoding was possible.
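The first two conditions can be illustrated with a simple check over sampled trace port values. This is a heuristic sketch for understanding the conditions, not the AutoFocus calibration algorithm: it assumes each sample is an integer whose bit *i* is the level of trace line *i*, and it treats two toggling lines that always carry the same level as a possible short circuit.

```python
from typing import Dict, List, Tuple


def check_trace_lines(samples: List[int], width: int) -> Dict[str, list]:
    """Report lines that never toggled and pairs of lines that look shorted."""
    def line(s: int, i: int) -> int:
        return (s >> i) & 1

    toggled = [i for i in range(width) if len({line(s, i) for s in samples}) == 2]
    not_toggled = [i for i in range(width) if i not in toggled]
    # two toggling lines that always carry the same level hint at a short circuit
    possible_shorts: List[Tuple[int, int]] = [
        (a, b)
        for ai, a in enumerate(toggled)
        for b in toggled[ai + 1:]
        if all(line(s, a) == line(s, b) for s in samples)
    ]
    return {"not_toggled": not_toggled, "possible_shorts": possible_shorts}
```

A walking-ones pattern toggles every line individually, so it passes both checks; a pattern in which two lines always switch together cannot distinguish them from a short.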

#### Test executable provided by Lauterbach

In order to use the test executable provided by Lauterbach for calibration, the following command sequence is recommended.

```
; example for a free-running clock (Tangier)
AREA.view
                                       ; open TRACE32 Message AREA
                                       ; to observe calibration
                                       ; results
Analyzer.THreshold VCC
                                       ; advise TRACE32 to use
                                       ; 1/2 VCC as threshold level
                                       ; for the trace signals
Analyzer.AutoFocus /NoTHreshold
                                       ; start the calibration by
                                       ; using the test executable;
                                       ; advise TRACE32 to keep
                                       ; the threshold level
```

A manual setup is required if your target uses a gated clock. Refer to "Manual Setup" in the AutoFocus User's Guide, page 18 (autofocus_user.pdf), for assistance.

#### **Application program**

In order to use the application program for calibration, the following command sequence is recommended:

```
; example for a free-running clock (Tangier)
AREA.view
                                      ; open TRACE32 Message AREA
                                      ; to observe calibration
                                      ; results
Data.LOAD.Elf demo_x86.elf /PlusVM
                                      ; download application program
                                      ; to the target; in order to
                                      ; perform trace decoding while
                                      ; the application program is
                                      ; running, the program code
                                      ; has to be copied to the
                                      ; TRACE32 Virtual Memory
Go
                                      ; start the execution of the
                                      ; application program
Analyzer.THreshold VCC
                                      ; advise TRACE32 to use
                                      ; 1/2 VCC as threshold level
                                      ; for the trace signals
Analyzer.AutoFocus /NoTHreshold
                                      ; start the calibration;
                                      ; advise TRACE32 to keep
                                      ; the threshold level
WAIT 1.s
                                      ; wait 1 second
Break
                                      ; stop the program execution
```

A manual setup is required if your target uses a gated clock. Refer to "Manual Setup" in the AutoFocus User's Guide, page 18 (autofocus_user.pdf), for assistance.

After the off-chip tracing has been configured successfully, the following command can be used to inspect the STP packet stream:

#### STMAnalyzer.List

Display the STP packet stream recorded by POWER TRACE II / POWER TRACE III.



The Intel® PT-based core traces can be displayed with the following command:

#### Analyzer.List

Display all core trace information decoded out of the STP packet stream.

*Figure: **Trace.List** window with the columns record, run, address, cycle, symbol and ti.back; source lines of the sieve demo (e.g. `anzahl++;`, `for ( i = 0 ; i <= SIZE ; i++ )`) are interleaved with the executed instructions.*

#### **SDRAM Trace**

If the Intel® PT trace information is routed to SDRAM, a fixed amount of memory is assigned to each core. The maximum SDRAM size per core is currently 4 MByte.



#### **Configure TRACE32**

1. Advise TRACE32 to read the trace information from SDRAM.

```
Trace.METHOD Onchip
```

TRACE32 reads the on-chip trace via JTAG.

2. Provide further details on the SDRAM configuration to TRACE32.

```
Onchip.Buffer IPT
                                     ; inform TRACE32 that the SDRAM
                                     ; provides Intel® PT trace
                                     ; information
Onchip.Buffer BASE 0x5000000
                                     ; inform TRACE32 that the SDRAM
                                     ; allocated for Intel® PT trace
                                     ; starts at address 0x5000000
Onchip.Buffer SIZE 0x1000000
                                     ; inform TRACE32 that the SDRAM
                                     ; allocated for Intel® PT trace has
                                     ; a size of 16 MByte
```
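With the values above, 16 MByte for 4 cores gives exactly the 4 MByte per-core maximum. The sketch below checks this limit and computes the per-core blocks, assuming the buffer is split into equal, contiguous blocks in core order; that layout is an illustrative assumption, not a statement about how TRACE32 or the target arranges the blocks. The function name is hypothetical.

```python
from typing import List, Tuple

MAX_PER_CORE = 4 * 1024 * 1024   # 4 MByte per core, the current limit stated above


def core_blocks(base: int, size: int, cores: int) -> List[Tuple[int, int]]:
    """Split the SDRAM trace buffer into equal per-core blocks (assumed layout)."""
    per_core = size // cores
    if per_core > MAX_PER_CORE:
        raise ValueError(f"{per_core:#x} bytes per core exceeds the 4 MByte limit")
    return [(base + i * per_core, per_core) for i in range(cores)]


blocks = core_blocks(0x5000000, 0x1000000, 4)
```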



3. Enable Intel® PT on the target and allow TRACE32 to configure it.

```
IPT.ON
```

If the command **Onchip.List** is used, TRACE32 merges the Intel® PT traces from the individual cores as follows:

| SDRAM block core 0 | SDRAM block core 1 | SDRAM block core 2 | SDRAM block core 3 |
|--------------------|--------------------|--------------------|--------------------|
| Intel® PT packet 1 | Intel® PT packet 1 | Intel® PT packet 1 | Intel® PT packet 1 |
| Intel® PT packet 2 | Intel® PT packet 2 | Intel® PT packet 2 |                    |
| Intel® PT packet 3 | Intel® PT packet 3 | Intel® PT packet 3 |                    |
| Intel® PT packet 4 | Intel® PT packet 4 | Intel® PT packet 4 |                    |
| Intel® PT packet 5 | Intel® PT packet 5 | Intel® PT packet 5 |                    |
|                    | Intel® PT packet 6 | Intel® PT packet 6 |                    |
|                    | Intel® PT packet 7 |                    |                    |

#### Onchip.List

| Merged trace order             |
|--------------------------------|
| Intel® PT packet 1 of core 0   |
| Intel® PT packet 1 of core 1   |
| Intel® PT packet 1 of core 2   |
| Intel® PT packet 1 of core 3   |
| Intel® PT packet 2 of core 0   |
| Intel® PT packet 2 of core 1   |
| Intel® PT packet 2 of core 2   |
| Intel® PT packet 3 of core 0   |

This procedure will change once TRACE32 decodes synchronization packets.

```
Onchip.List
                                       ; display trace listing for all cores
Onchip.List /CORE 1
                                       ; display trace listing for core 1
```
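The merge order shown in the tables above corresponds to a round-robin scheme: one packet from each core in turn, skipping cores whose SDRAM block has no packet left. The Python sketch below reproduces that order for the packet counts of the example (5, 7, 6 and 1 packets); it models the tables only and is not TRACE32's implementation.

```python
from itertools import zip_longest
from typing import List, Sequence

_MISSING = object()  # placeholder for cores that have no packet left


def merge_round_robin(per_core: Sequence[Sequence[str]]) -> List[str]:
    """Take one packet from each core in turn, skipping cores that ran out."""
    merged: List[str] = []
    for round_ in zip_longest(*per_core, fillvalue=_MISSING):
        merged.extend(p for p in round_ if p is not _MISSING)
    return merged


# packet counts per core as in the SDRAM table above
counts = [5, 7, 6, 1]
blocks = [[f"packet {n} of core {c}" for n in range(1, k + 1)]
          for c, k in enumerate(counts)]
merged = merge_round_robin(blocks)
```

Because packets are interleaved by position rather than by time, the merged listing does not necessarily reflect the real execution order across cores.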





#### **ERRORS**

If the trace contains ERRORS, set up an error-free trace recording before you start to evaluate or analyze the trace contents.

ERRORS can be caused by the following:

- TRACE32 detected an invalid trace packet, i.e. a packet it could not decode. TRACE32 additionally displays the error indicator HARDERROR if it is likely that the error was caused by pin problems.

- The trace information is not consistent with the program code in the target memory.



Background: In order to provide an intuitive trace display, the following sources of information are merged:

- The trace information recorded.
- The program code from the target memory read via the JTAG interface.
- The symbol and debug information already loaded to TRACE32.
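Why inconsistent program code leads to ERRORS can be shown with a toy decoder: it walks the program image and expects a TNT bit at every conditional branch and a target address at every indirect branch. If the trace and the code disagree, no consistent instruction sequence exists. The image format, packet tuples and addresses are hypothetical, and a real decoder would resynchronize instead of stopping.

```python
from typing import Dict, List, Tuple

# program image: address -> ("seq", next) | ("jcc", fall_through, target) | ("ind",)
Image = Dict[int, tuple]
Packet = Tuple[str, int]   # ("TNT", 0 or 1) or ("TIP", target address)


def decode(image: Image, start: int, packets: List[Packet]) -> Tuple[List[int], int]:
    """Follow the program image, consuming packets; count trace/code mismatches."""
    addr, path, errors = start, [], 0
    for kind, value in packets:
        # sequential code needs no trace information
        while image.get(addr, ("missing",))[0] == "seq":
            path.append(addr)
            addr = image[addr][1]
        insn = image.get(addr, ("missing",))
        path.append(addr)
        if insn[0] == "jcc" and kind == "TNT":
            addr = insn[2] if value else insn[1]
        elif insn[0] == "ind" and kind == "TIP":
            addr = value
        else:
            errors += 1   # the trace does not fit the code: a flow error
            break         # a real decoder would resynchronize here
    return path, errors


good = {0x10: ("seq", 0x12), 0x12: ("jcc", 0x14, 0x10), 0x14: ("ind",)}
stale = {0x10: ("seq", 0x12), 0x12: ("seq", 0x14), 0x14: ("ind",)}
packets = [("TNT", 1), ("TNT", 0), ("TIP", 0x40)]
```

Decoding `packets` against `good` succeeds, while the same packets against `stale` (code that differs from what was executed) produce a flow error at the first TNT bit.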



The TRACE32 function **Trace.FLOW.ERRORS()** returns the number of ERRORS as a hexadecimal number.

```
PRINT %Decimal Trace.FLOW.ERRORS()
                                       ; display the number of ERRORS
                                       ; as a decimal number in the
                                       ; TRACE32 PowerView Message Line
```



To find ERRORS in the trace, use the keyword FLOWERROR on the Expert page of the **Trace Find** dialog.



#### **Trace.Find FLOWERROR**

#### **TARGET FIFO OVERFLOW**

Inside each Intel® PT generation module, trace packets are queued in a FIFO buffer before they are sent out to the STM/SDRAM.



If trace packets are generated faster than they can be sent out, the FIFO buffer can overflow and trace packets are lost.

The affected Intel® PT module generates a Buffer Overflow packet (FUP.OVF) to indicate that its FIFO is full and that trace packets are no longer generated.

An Asynchronous Flow Update packet, which provides the address of the next instruction to be executed, is generated to indicate that packet generation has resumed.
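A minimal sketch of how the gaps can be located in a decoded packet list, assuming each packet is a (kind, payload) tuple. The packet names are simplified ("OVF" for the overflow indication, "FUP" for the flow update that follows it), the tuple format is hypothetical, and the number of gaps found corresponds only conceptually to what **Trace.FLOW.FIFOFULL()** counts.

```python
from typing import List, Optional, Tuple

Packet = Tuple[str, Optional[int]]  # e.g. ("TNT", None), ("OVF", None), ("FUP", address)


def find_overflows(packets: List[Packet]) -> List[Tuple[int, Optional[int]]]:
    """Return (index of each OVF, address from the following FUP or None)."""
    gaps = []
    for i, (kind, _) in enumerate(packets):
        if kind == "OVF":
            resume = next((addr for k, addr in packets[i + 1:] if k == "FUP"), None)
            gaps.append((i, resume))
    return gaps


stream = [("TNT", None), ("OVF", None), ("FUP", 0x8048CBC), ("TNT", None), ("OVF", None)]
gaps = find_overflows(stream)
```

Program flow between an overflow and its resume address is unknown, which is why nesting analyses have to treat these positions specially.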

The TRACE32 function **Trace.FLOW.FIFOFULL()** returns the number of TARGET FIFO OVERFLOWs as a hexadecimal number.

```
PRINT %Decimal Trace.FLOW.FIFOFULL()
                                       ; display the number of TARGET
                                       ; FIFO OVERFLOWs as a decimal
                                       ; number in the TRACE32
                                       ; PowerView Message Line
```

To find TARGET FIFO OVERFLOWs in the trace, use the keyword FIFOFULL on the Expert page of the **Trace Find** dialog.



#### Trace.Find FIFOFULL

Strictly speaking, TARGET FIFO OVERFLOWs are not errors; they can occur in normal operation.

Since gaps in the instruction execution sequence are likely to disturb the nesting trace analyses, TRACE32 explicitly points them out.

#### **SystemTrace**

Depending on where the STP packets are stored, the following TRACE32 command groups can be used to analyze and display these packets:

- **STMAnalyzer.**<*sub\_cmd*>, if the STP packets are stored in the trace memory provided by POWER TRACE II / POWER TRACE III.

- **STMLA.**<*sub\_cmd*>, if the STP packets were recorded without a TRACE32 trace tool and then loaded into TRACE32 PowerView for analysis.

The command groups usable in your current configuration are shown in the TRACE32 PowerView softkey line.



Push **trace** to get access to all command groups that analyze trace information.



Push **other** to see more command groups.



POWER TRACE II / POWER TRACE III is used in the current configuration, so the command group STMAnalyzer is enabled.



The command group STMLA is always enabled.

TRACE32 PowerView offers the following abstraction, since most <sub\_cmd> are identical for all command groups:

#### SystemTrace.METHOD Analyzer | LA

```
SystemTrace.METHOD Analyzer
                                       ; inform TRACE32 PowerView that the
                                       ; STP packets are stored in
                                       ; POWER TRACE II
SystemTrace.List
                                       ; list the STP packet stream
```

Depending on where the trace packets are stored, the following TRACE32 command groups can be used to analyze and display the core trace information:

- **Analyzer.**<*sub\_cmd*>, if the STP packets are stored in the trace memory provided by POWER TRACE II / POWER TRACE III.
- **Onchip.**<*sub\_cmd*>, if the Intel® PT trace packets are stored in the target SDRAM.
- **LA.**<*sub\_cmd*>, if the trace packets were recorded without a TRACE32 trace tool and then loaded into TRACE32 PowerView for analysis.

TRACE32 PowerView offers the following abstraction, since most *<sub\_cmd>* are identical for all command groups:

#### Trace.METHOD Analyzer | Onchip | LA

```
Trace.METHOD Analyzer
                                       ; inform TRACE32 PowerView that the
                                       ; trace packets are stored in
                                       ; POWER TRACE II
Trace.List
                                       ; list the core trace information
```

Selecting the trace METHOD has the following additional consequences:

- All **Trace**.<sub\_cmd> commands offered in the TRACE32 PowerView menu apply to the selected trace METHOD.

- TRACE32 uses the trace information from the trace selected by METHOD as the source for the evaluations of the following command groups:

| Command group        | Evaluation                    |
|----------------------|-------------------------------|
| COVerage.<sub\_cmd>  | Trace-based code coverage     |
| ISTAT.<sub\_cmd>     | Detailed instruction analysis |
| MIPS.<sub\_cmd>      | MIPS analysis                 |

## **Displaying the Trace Contents**

#### **Influencing Factors on the Trace Information**

The main influencing factor on the trace information is the configuration of Intel® PT. It specifies what type of trace information is generated.

Basics about the trace messages are described in "Protocol Description", page 6.

Advanced settings are described in "Trace Control by Filters", page 68.

The settings in the **TRACE32 Trace Configuration** window are another important influencing factor. They specify how much trace information can be recorded and when the trace recording is stopped.

### **Settings in the TRACE32 Trace Configuration Window**

The Mode settings in the Trace Configuration window specify how much trace information can be recorded and when the trace recording is stopped.

The following modes are provided if **Trace.METHOD Analyzer** is selected:

- **Fifo, Stack, Leash Mode:** allow as many trace records to be recorded as indicated in the **SIZE** field of the Trace Configuration window.

- **STREAM Mode:** the trace information is immediately streamed to a file on the host computer. STREAM mode allows a trace memory size of several T Frames.

- **PIPE Mode:** the trace information is immediately streamed to a named pipe on the host computer.

PIPE mode creates a path that conveys the raw trace data to an application outside of TRACE32 PowerView. The named pipe has to be created by the receiving application before TRACE32 can connect to it.

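| Command                                 | Description                     |
|-----------------------------------------|---------------------------------|
| `Trace.Mode PIPE`                       | Switch the trace to PIPE mode   |
| `Trace.PipeWRITE <pipe_name>`           | Connect to named pipe           |
| `Trace.PipeWRITE \\.\pipe\<pipe_name>`  | Connect to named pipe (Windows) |
| `Trace.PipeWRITE`                       | Disconnect from named pipe      |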

```
Trace.Mode PIPE
                                       ; switch trace to PIPE mode
Trace.PipeWRITE \\.\pipe\pproto00
                                       ; connect to named pipe
                                       ; (Windows)
Trace.PipeWRITE
                                       ; disconnect from named pipe
```
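Since the receiving application must create the named pipe before TRACE32 connects, a minimal receiver could look like the sketch below. It assumes a POSIX host, where a named pipe is a FIFO created with `os.mkfifo`; on Windows the receiver would instead use the Win32 named-pipe API, which is not shown. Whether TRACE32 on a Linux host accepts a FIFO path for **Trace.PipeWRITE** is an assumption to verify, and the function names and path are hypothetical.

```python
import os


def create_pipe(path: str) -> None:
    """Create the named pipe (FIFO) before TRACE32 tries to connect to it."""
    if not os.path.exists(path):
        os.mkfifo(path)  # POSIX only


def receive(path: str, chunk_size: int = 4096):
    """Yield raw STP data chunks until the writer disconnects."""
    # opening blocks until a writer connects; buffering=0 returns partial chunks
    with open(path, "rb", buffering=0) as pipe:
        while True:
            chunk = pipe.read(chunk_size)
            if not chunk:  # empty read: the writer has closed the pipe
                break
            yield chunk
```

Call `create_pipe("/tmp/pproto00")`, connect TRACE32 to that path, and iterate over `receive("/tmp/pproto00")`, handing each chunk to your own STP decoder. Since no timestamps are conveyed in PIPE mode, the receiver sees only the STP byte stream.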



In PIPE mode, the STP packets are conveyed without timestamps.

If **Trace.METHOD Onchip** is selected, only Fifo mode can be used:

- **Fifo:** allows as many trace records to be recorded as indicated in the **SIZE** field of the Trace Configuration window.





```
Trace.Mode Fifo
                                       ; default mode;
                                       ; when the trace memory is full,
                                       ; the newest trace information
                                       ; overwrites the oldest one;
                                       ; the trace memory contains the
                                       ; most recent information generated
                                       ; before the program execution
                                       ; stopped
```





In **Fifo** mode, negative record numbers are used. The last (newest) record gets the negative number with the smallest absolute value; older records get increasingly negative numbers.

```
Trace.Mode Stack
                                       ; when the trace memory is full,
                                       ; the trace recording is stopped;
                                       ; the trace memory contains all
                                       ; information generated directly
                                       ; after the start of the program
                                       ; execution
```



TRACE32 needs to read the program code from the target memory in order to display the core trace information. This is not possible while the program is running, which is why the **Trace.List** window shows **NOACCESS**.



Stop the program execution to allow TRACE32 to read the program code from the target. Alternatively, if you need to display the core trace information while the program is running, load a copy of the program code to the TRACE32 Virtual Memory:

```
Data.LOAD.Elf <file> /PlusVM
                                       ; load the program code to the target
                                       ; and to the TRACE32 Virtual Memory
```

Since the trace recording starts with the program execution and stops when the trace memory is full, positive record numbers are used in **Stack** mode. The first record in the trace gets the smallest positive number.
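For comparison with Fifo mode, here is a minimal Python model of Stack mode. It is a conceptual sketch only. The function names are hypothetical, and numbering the first record +1 is an illustrative assumption.

```python
def record_stack(events, size):
    """Stack mode: recording stops as soon as the trace memory is full."""
    buffer = []
    for event in events:
        if len(buffer) == size:
            break  # memory full: everything after this point is not recorded
        buffer.append(event)
    return buffer


def stack_numbering(records, first=1):
    """Pair records with positive numbers; the first record is assumed to be +1."""
    return list(zip(range(first, first + len(records)), records))
```

Unlike the Fifo model, the oldest events are kept and the newest ones are lost.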

*[Screenshot: **Trace.List** window in Stack mode; the records carry positive numbers, e.g. +0000000073 and +0000000074.]*

```
Trace.Mode Leash       ; when the trace memory is nearly full, the
                       ; program execution is stopped
                       ; Leash mode uses the same record numbering
                       ; scheme as Stack mode
```





The program execution is **stopped** as soon as the trace buffer is nearly full.

Since stopping the program execution when the trace buffer is nearly full takes some logic and time, **used** is smaller than the maximum **SIZE**.
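The reason **used** stays below **SIZE** can be sketched in Python. Both `threshold` (where the stop is requested) and `latency` (how many records still arrive before the core halts) are hypothetical parameters chosen for illustration. They are not TRACE32 settings.

```python
def record_leash(events, size, threshold, latency):
    """Leash mode sketch: request a stop once `threshold` records are stored.

    `latency` models records that still arrive while the stop takes effect.
    Both parameters are hypothetical and only illustrate why used < SIZE.
    """
    buffer = []
    stop_at = None
    for event in events:
        if stop_at is not None and len(buffer) >= stop_at:
            break  # the program execution has stopped
        buffer.append(event)
        if stop_at is None and len(buffer) >= threshold:
            stop_at = min(size, len(buffer) + latency)
    return buffer
```

With `size=100`, `threshold=90` and `latency=5`, recording ends at 95 records, leaving headroom in the buffer.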

```
Trace.Mode STREAM      ; stream the recorded trace information to a
                       ; file on the host computer
                       ; STREAM mode uses the same record numbering
                       ; scheme as Stack mode
```

The trace information is streamed to a file on the host computer as soon as it has been placed into the trace memory. This extends the size of the trace memory to several T Frames.

• STREAM mode requires a 64-bit host computer and a 64-bit TRACE32 executable to handle the large trace record numbers.

By default, the streaming file is placed in the TRACE32 temporary directory (OS.PresentTemporaryDirectory()).

The command **Trace.STREAMFILE** *<file>* lets you specify a different name and location for the streaming file.

```
Trace.STREAMFILE d:\temp\mystream.t32   ; specify the location for
                                        ; your streaming file
```

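By default, TRACE32 stops streaming when less than 1 GByte of free memory is left on the drive.

The command Trace.STREAMFileLimit <+/- limit in bytes> sets a user-defined limit. A positive value limits the size of the streaming file, and a negative value specifies how much free memory must remain on the drive.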

```
Trace.STREAMFileLimit 5000000000.    ; streaming file is limited to
                                     ; 5 GByte

Trace.STREAMFileLimit -5000000000.   ; streaming is stopped when less
                                     ; than 5 GByte free memory is left
                                     ; on the drive
```
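The stop rules for the streaming file can be summarized as one Python decision function. This is a sketch of the rules stated above, not TRACE32's implementation. The function name is hypothetical. Whether the 1 GByte default still applies once a user-defined limit is set is not stated here, so the sketch assumes it does not.

```python
GBYTE = 1_000_000_000  # the examples above use 5000000000. for 5 GByte


def must_stop_streaming(limit, file_size, free_space):
    """Decide whether streaming stops, following the Trace.STREAMFileLimit rules.

    limit is None -> default: stop when less than 1 GByte is free on the drive
    limit > 0     -> stop when the streaming file reaches `limit` bytes
    limit < 0     -> stop when less than abs(limit) bytes are free on the drive
    """
    if limit is None:
        return free_space < 1 * GBYTE
    if limit > 0:
        return file_size >= limit
    if limit < 0:
        return free_space < -limit
    raise ValueError("a limit of 0 is not covered by this sketch")
```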

Please be aware that the streaming file is deleted as soon as you de-select the STREAM mode or when you exit TRACE32.

At high data rates, your host computer might have problems saving the trace data to the streaming file. The command **Trace.STREAMCompression** allows you to configure a stronger compression.

Trace.STREAMCompression HIGH

In STREAM mode the **used** field is split:

- Number of records buffered by the trace memory of POWER TRACE II / POWER TRACE III
- Number of records saved to the streaming file

STREAM mode can generate very large record numbers.

STREAM mode can only be used if the average data rate at the trace port does not exceed the maximum transmission rate of the host interface in use. Peak loads at the trace port are buffered by the memory in POWER TRACE II / POWER TRACE III, which then operates as a large FIFO.

If the average data rate at the trace port exceeds the maximum transmission rate of the host interface in use, a **PowerTrace FIFO Overrun** occurs. TRACE32 stops streaming and empties the POWER TRACE II / POWER TRACE III FIFO. Streaming is re-started after the POWER TRACE II / POWER TRACE III FIFO is empty.
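The difference between a short peak and a sustained excess data rate can be sketched as a per-interval FIFO model in Python. Byte counts, drain rate and capacity are illustrative values. The stop, empty and restart sequence that TRACE32 performs after an overrun is not modeled; the sketch only flags the intervals in which the FIFO would overflow.

```python
def simulate_stream(arrivals, drain_rate, capacity):
    """Per-interval model of the FIFO between trace port and host interface.

    arrivals:   bytes arriving from the trace port in each interval
    drain_rate: bytes the host interface can take per interval
    capacity:   size of the FIFO in bytes
    Returns (fill level after each interval, indices of overrun intervals).
    """
    level = 0
    levels, overruns = [], []
    for i, incoming in enumerate(arrivals):
        level += incoming
        if level > capacity:
            overruns.append(i)
            level = capacity  # data beyond the capacity is lost
        level = max(0, level - drain_rate)
        levels.append(level)
    return levels, overruns
```

A short peak (`[30, 30, 0, 0]` with a drain rate of 20) is absorbed. A sustained rate of 30 per interval eventually overflows a 50-byte FIFO.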

#### A **PowerTrace FIFO Overrun** is indicated as follows:

1. A ! in the **used** area of the Trace Configuration window indicates an overrun of the POWER TRACE II / POWER TRACE III FIFO.



2. The OVERRUN is indicated in all trace display windows.



OVERRUNs are not visible at record level.

A large ti.back value (tool timestamp only) can be considered as an OVERRUN indicator.

```
Trace.FindAll TIme.Back 10.s--50.s   ; find all trace records whose
                                     ; time distance to the previous
                                     ; record (ti.back) is between
                                     ; 10.s and 50.s
```
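The same search for suspicious gaps can be expressed in Python over a list of tool timestamps. This is a hypothetical post-processing sketch, for example on timestamps exported from the trace. It is not how TRACE32 implements Trace.FindAll.

```python
def find_by_time_back(timestamps, low, high):
    """Return indices of records whose ti.back lies within [low, high].

    ti.back is the distance to the previous timestamped record; the first
    record has no predecessor and is therefore never reported.
    """
    hits = []
    for i in range(1, len(timestamps)):
        back = timestamps[i] - timestamps[i - 1]
        if low <= back <= high:
            hits.append(i)
    return hits
```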

The trace buffer can either sample or be read out for information display, but not both at the same time.



| States of the Trace |                                                                              |
|---------------------|------------------------------------------------------------------------------|
| DISable             | The trace is disabled.                                                       |
| OFF                 | The trace is not sampling. The trace contents can be analyzed and displayed. |
| Arm                 | The trace is sampling. There is no access to the trace contents.             |

The current state of the trace is always indicated in the **Trace State** field of the TRACE32 state line.



Since Intel<sup>®</sup> PT does not provide a means to indicate a trigger, the trace states **trigger** and **break** are never reached.

## **The AutoInit Command**



| Init Button       | Clear the trace memory. All other settings in the Trace configuration window remain valid. |
|-------------------|--------------------------------------------------------------------------------------------|
| AutoInit CheckBox | ON: The trace memory is cleared whenever the program execution is started (Go, Step).      |

## **Basic Display Commands**

## **Default Listing**



The trace information for all cores is displayed by default in the **Trace.List** window. The **run** column and the coloring of the trace information indicate the core.



**Trace.List /CORE** `<n>`: The option CORE allows a per-core display of the trace information.

*[Screenshot: **Trace.List /CORE 0.** window showing only the trace information of core 0.]*

## **Basic Formatting**



| Press **Less** once  | Suppress the display of the ptrace packets. |
|----------------------|---------------------------------------------|
| Press **Less** twice | Suppress the display of the assembly code.  |

The **More** button reverses these steps.

## **Correlating the Trace Listing with the Source Listing**





Tracking between the Trace Listing and the Source Listing is based on the program addresses.

## **Browsing through the Trace Buffer**



| Pg↑         | Scroll page up.                                     |
|-------------|-----------------------------------------------------|
| Pg ↓        | Scroll page down.                                   |
| Ctrl - Pg ↑ | Go to the first record sampled in the trace buffer. |
| Ctrl - Pg ↓ | Go to the last record sampled in the trace buffer.  |

The **Trace.List** window provides a "**Find...**" button that opens the **Trace Find** dialog. The **Trace Find** dialog lets you search for events of interest in the trace.



**Example:** Find the entry to the function func10.



A detailed description of the **Trace Find dialog** can be found in "**Application Note for Trace.Find**" (app\_trace\_find.pdf).

### **Default Display Items**



#### Column record

Displays the record numbers.

#### Column run

The column run displays graphic elements that provide a quick overview of the instruction execution sequence.

Trace.List List.ADDRESS DEFault



The column run also indicates Interrupts and TRAPs.

*[Screenshot: **Trace.List** window; the run column marks an interrupt (`apic_timer_interrupt`).]*

#### Column cycle

The main cycle type is:

- ptrace (program trace information)

#### Column address/symbol



The **address column** shows the following information:

`<access class>:<address>`

| Access Classes |                                          |
|----------------|------------------------------------------|
| NP             | Program address in 32-bit Protected Mode |
| XP             | Program address in 64-bit mode           |

Information on the other available access classes can be found in "Intel® x86/x64 Debugger" (debugger\_x86.pdf).

The **symbol column** shows the corresponding symbolic address.

#### Column ti.back



The **ti.back column** shows the time distance to the previous timestamped record.

For details on the TRACE32 tool timestamp refer to "Tool Timestamp (POWER TRACE II / POWER TRACE III only)", page 8.

#### Time Information

| TIme.Back | Time relative to the previous record (red). |
|-----------|---------------------------------------------|
| TIme.Fore | Time relative to the next record (green).   |
| TIme.Zero | Time relative to the global zero point.     |

Trace.List TIme.Back TIme.Fore TIme.Zero Address CYcle sYmbol



### Set the Global Zero Point (PowerTrace II only)



**ZERO.RESet** (tool timestamp only)

TIme.Zero is the zero point of the timestamp counter commonly used by all TRACE32 hardware modules.

`ZERO.offset <time>`

TIme.Zero is the zero point of the timestamp counter commonly used by all TRACE32 hardware modules minus the specified *<time>*.

```
PRINT Trace.RECORD.TIME(-99.)          ; print the timestamp of
                                       ; record -99.

ZERO.offset Trace.RECORD.TIME(-99.)    ; specify the time of record
                                       ; -99. as global zero point
```
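The relationship between the three time columns and the zero point can be sketched in Python. This is a conceptual model with illustrative integer timestamps in nanoseconds. `zero_offset` plays the role of the value given to ZERO.offset, so the record whose timestamp equals it shows a TIme.Zero of 0. The function name is hypothetical.

```python
def time_columns(timestamps, zero_offset=0):
    """Compute (TIme.Back, TIme.Fore, TIme.Zero) for each timestamped record.

    None marks a missing neighbor (first record has no TIme.Back,
    last record has no TIme.Fore).
    """
    rows = []
    for i, t in enumerate(timestamps):
        back = t - timestamps[i - 1] if i > 0 else None
        fore = timestamps[i + 1] - t if i + 1 < len(timestamps) else None
        rows.append((back, fore, t - zero_offset))
    return rows
```

Records before the chosen zero point get negative TIme.Zero values.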

## Cycle Accurate Tracing Pros.

Shows how many core clock cycles it took to execute a program section.

Allows synchronizing traces from different trace sources if Time Synchronization packets are available (not implemented yet).

## Cycle Accurate Tracing Cons.

Cycle accurate tracing requires up to 2 times more bandwidth.

### Cycle accurate tracing and changing core clock while recording

Cycle accurate tracing has to be enabled in the IPT configuration window.



### **IPT.CycleAccurate ON**

Advise Intel® PT to generate cycle count information.

```
; advise TRACE32 to display a trace listing with
; cycle count information (CLOCKS.Back)
; advise TRACE32 to suppress the display
; of the timestamp information (TIme.Back.OFF)
Trace.List CLOCKS.Back DEFault TIme.Back.OFF
```

The following command makes this display the default for the **Trace.List** window:

SETUP.ALIST CLOCKS.Back DEFault TIme.Back.OFF



TRACE32 displays a warning when the recorded trace information is analyzed and displayed for the first time. This warning points out that all displayed time information (TIme.Back, TIme.Zero) might be inaccurate.
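One plausible reason for this warning: converting cycle counts into time requires a core clock, and a clock change during recording makes a single assumed frequency wrong for part of the trace. The Python sketch below uses hypothetical clock values in MHz to show the effect. It does not reproduce TRACE32's actual time computation.

```python
def cycles_to_ns(cycle_counts, clock_mhz):
    """Convert per-record cycle counts to nanoseconds at one fixed core clock."""
    return [cycles * 1000 / clock_mhz for cycles in cycle_counts]


def cycles_to_ns_actual(cycle_counts, clock_mhz_per_record):
    """Same conversion, but with the clock that was really active per record."""
    return [c * 1000 / f for c, f in zip(cycle_counts, clock_mhz_per_record)]
```

If the clock drops from 2000 MHz to 1000 MHz halfway through, 2000 cycles take 2000 ns, but the fixed-clock conversion still reports 1000 ns.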



Cycle count information relative to the previous record

; advise TRACE32 to display a trace listing with the decoded trace packets (TPINFO)

Trace.List TPINFO DEFault



## **Belated Trace Analysis**

There are several ways for a belated trace analysis:

1. Save a part of the trace contents into a file (ASCII, CSV or XML format) and analyze this trace contents outside of TRACE32 PowerView.



2. Save the trace contents in a compact format into a file. Load the trace contents at a later date into a TRACE32 Instruction Set Simulator and analyze it there.
3. Export the STP byte stream to postprocess it with an external tool.

Saving a part of the trace contents to an ASCII file requires the following steps:

1. Select **Printer Setting...** in the **File** menu to specify the file name and the output format.





```
PRinTer.FileType ASCIIE ; specify output format ; here enhanced ASCII

PRinTer.FILE test_run.lst ; specify the file name
```

2. It might make sense to save only a part of the trace contents into the file. Use the record numbers to specify the trace part you are interested in.

TRACE32 provides the command prefix **WinPrint.** to redirect the result of a display command into a file.

```
; save the trace record range (-8976.)--(-2418.) into the
; specified file
WinPrint.Trace.List (-8976.)--(-2418.)
```

Analyze the result outside of TRACE32.

1. Save the contents of the trace memory into a file.



The default extension for the trace file is .ad.

Trace.SAVE testrun1

2. Start a TRACE32 Instruction Set Simulator (PBI=SIM).





3. Select your target CPU within the simulator. Then establish the communication between TRACE32 and the simulator.





4. Load the trace file.





Trace.LOAD testrun1

5. Display the trace contents.



**LOAD** indicates that the source for the trace information is the loaded file.

6. Load symbol and debug information if you need it.

```
Data.LOAD.Elf sieve_funcs_x86.elf /NoCODE
```

The TRACE32 Instruction Set Simulator provides the same trace display and analysis commands as the TRACE32 debugger.



Postprocessing of recorded trace information with the TRACE32 Instruction Set Simulator becomes more complex if an operating system that uses dynamic memory management to handle processes/tasks is used (e.g. Linux).

### **Script version**

Save the trace contents in the recording TRACE32 instance:

```
Trace.SAVE testrun.ad
```

Prepare the TRACE32 Instruction Set Simulator for off-line processing of the trace information:

```
SYStem.CPU TANGIER
SYStem.Up
Trace.LOAD testrun.ad
Data.LOAD.Elf sieve_funcs_x86.elf /NoCODE
Trace.List
```

Trace.EXPORT.TracePort <file> Export trace raw data (no timestamps).

SystemTrace.EXPORT.TracePort mytest1.ad

# **Trace Control by Filters**

Intel® PT provides 2 address ranges for trace control. The smallest range size is 4 bytes.

TRACE32 PowerView provides access to these address ranges via the action field in the Break.Set dialog.



The 2 address ranges can be used for the following purposes:

**TraceEnable** advises Intel® PT to generate program flow information for the specified address range only.

**TraceOFF** advises Intel<sup>®</sup> PT to stop the generation of program flow information as soon as a specified address range is reached.

Both filters are programmed into the Intel<sup>®</sup> PT units of all cores in an SMP configuration.
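The effect of the two filter actions on a stream of executed instruction addresses can be sketched in Python. Real Intel PT emits packets, not a list of addresses, so this is only a conceptual model with hypothetical function names. Whether the instruction that reaches the TraceOFF range is itself traced is not specified here; the sketch excludes it.

```python
def _check_range(start, end):
    # The smallest Intel PT address range size is 4 bytes (see above).
    if end - start + 1 < 4:
        raise ValueError("address range must cover at least 4 bytes")


def trace_enable(addresses, start, end):
    """TraceEnable: program flow is generated only inside [start, end]."""
    _check_range(start, end)
    return [a for a in addresses if start <= a <= end]


def trace_off(addresses, start, end):
    """TraceOFF: generation stops when the first address in [start, end] is reached."""
    _check_range(start, end)
    emitted = []
    for a in addresses:
        if start <= a <= end:
            break  # assumption: the instruction that hits the range is not traced
        emitted.append(a)
    return emitted
```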

**Example 1:** Advise Intel<sup>®</sup> PT to generate program flow information only for function func10.

1. Set a Program breakpoint to the address range of func10 and select the action TraceEnable.



2. Start and stop the program execution.
3. Display the result.

TRACE ENABLE indicates the start of the message generation. It might be necessary to search for it.





Break.Delete /ALL

Var.Break.Set func10 /Program /TraceEnable

Go

Break

Trace.List

**Example 2:** Advise Intel<sup>®</sup> PT to stop the generation of program flow information as soon as function func10 is entered.

1. Set a Program breakpoint to the address range func10++0x5 (the start address of func10 plus the following 5 bytes) and select the action TraceOFF.



2. Start the program execution.

TRACE32 has, unfortunately, no way to detect that Intel<sup>®</sup> PT stopped the generation of trace information.

**Off-chip trace:** Since the analyzer records STP packets, **used** may still increase, because other trace sources continue to generate STP packets.



**Onchip trace:** TRACE32 cannot read the filling level of the onchip trace while recording.

3. Stop the program execution.
4. Display the result.



```
Break.Delete /ALL
Break.Set func10++0x5 /Program /TraceOFF
Go
Break
Trace.List
```

# **OS-Aware Tracing**

OS-aware tracing requires that OS-aware debugging is configured. For more information refer to "OS-aware Debugging" (trace32 concepts.pdf).

## **Process Switch Packets**

x86/x64 processors have a CR3 control register that contains the Process Context Identifier (PCID). On every context switch the corresponding PCID is loaded to CR3.

Intel® PT generates a Paging Information Packet (PIP) when a write to CR3 occurs.

*[Screenshot: **Trace.List TPINFO List.TASK DEFault** window; a Paging Information Packet is decoded as a switch to task netd (FFFF880038452A30).]*

TRACE32 names the cycle type **owner** if the PCID loaded to CR3 can be assigned to a process.

The command **TASK.List.tasks** can be used to check all assignments currently known to TRACE32. The **traceid** represents the PCID in this display.





TRACE32 names the cycle type **context** if the PCID loaded to CR3 can not be assigned to a process.

The fact that the PCID cannot be assigned to a process has the following consequences:

• Since TRACE32 does not require the PCID to decode trace information for the common address range, full trace decoding is possible.

```
; command in the setup for the OS Awareness
TRANSlation.COMMON 0xffff880000000000--0xffffffffffffffff
```

• For all other address ranges a decoding of the trace information is not possible. The cycle type **unknown** is used for trace information that can not be decoded.
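The naming rules above can be condensed into a small Python classifier. It is a conceptual sketch with hypothetical function names. The common range is taken from the TRANSlation.COMMON example, and the CR3 value 0x386D9000 for "logcat" is the example value used later in this chapter.

```python
COMMON_RANGE = (0xFFFF880000000000, 0xFFFFFFFFFFFFFFFF)  # from TRANSlation.COMMON


def pip_cycle_type(trace_id, known_trace_ids):
    """'owner' if the value loaded to CR3 maps to a known process, else 'context'."""
    return "owner" if trace_id in known_trace_ids else "context"


def record_cycle_type(address, owner_known, common=COMMON_RANGE):
    """Common-range addresses decode without the process; others need an owner."""
    low, high = common
    if low <= address <= high or owner_known:
        return "ptrace"
    return "unknown"
```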



#### NOTE:

The Real-Time Instruction Trace (RTIT) does not feature process switch packets. If multiple user space applications are traced, only the trace packets of the kernel can be decoded, and the cycle type **unknown** is used for the user space trace packets. To decode the trace packets of a user application, filter the process of interest using the CR3 filter.

RTIT was implemented on only a few devices and was later extended to Intel Processor Trace, which supports process switch tracing. TRACE32 also covers RTIT with the IPT command group.

## **Program Flow and Process Switches**



*[Screenshot: **Trace.List List.TASK DEFault** window; an owner cycle for task netd (FFFF880038452A30) marks the process switch.]*

```
Trace.List List.TASK DEFault
                                       ; display trace listing with
                                       ; decoded task switch information
```

#### NOTE:

This is a process switch analysis, since Paging Information Packets (PIP) only indicate process switches, but no thread switches.



Threads do not have their own traceid.





**contextid:** the `<trace_id>` cannot be assigned to a process.



The recording time before the first Paging Information Packet (PIP) is assigned to the (unknown) task.





| Command | Description |
|---------|-------------|
| Trace.STATistic.TASK [/SplitCORE] | Process runtime statistic, numerical display. The results are split per core and sorted in recording order. |
| Trace.STATistic.TASK /MergeCORE | Process runtime statistic, numerical display. The results of all cores are merged: trace information is analyzed independently for each core, and the statistic summarizes these results into a single result. |
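The difference between split and merged results can be sketched in Python. Each core is evaluated independently from its process switches. Recording time before the first switch is charged to "(unknown)", following the rule described above. The input format and function names are hypothetical, and the timestamps are illustrative integers.

```python
from collections import defaultdict


def task_runtimes(switches_per_core, start_time, end_time):
    """Per-core runtime per task, each core analyzed independently (SplitCORE-like).

    switches_per_core maps a core number to time-sorted (timestamp, task) pairs,
    one pair per process switch.
    """
    per_core = {}
    for core, switches in switches_per_core.items():
        totals = defaultdict(int)
        spans = [(start_time, "(unknown)")] + list(switches) + [(end_time, None)]
        for (t, task), (t_next, _) in zip(spans, spans[1:]):
            if t_next > t:  # skip empty intervals
                totals[task] += t_next - t
        per_core[core] = dict(totals)
    return per_core


def merge_cores(per_core):
    """Sum the per-core results into a single result (MergeCORE-like)."""
    merged = defaultdict(int)
    for totals in per_core.values():
        for task, runtime in totals.items():
            merged[task] += runtime
    return dict(merged)
```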

## **Find Process Switches in the Trace**

1. Open a process time chart window and a trace listing with decoded process switch information. Link both windows by using the /Track option.



```
Trace.Chart.TASK /Track                  ; open process time chart
                                         ; window

Trace.List List.TASK DEFault /Track      ; open a default trace
                                         ; listing that includes
                                         ; process information

; both windows use the /Track option
; a window opened with the /Track option follows the cursor movement
; of the active window

; tracking between trace windows is based on the timestamp
; information
```



*[Screenshot: **Trace.List List.TASK DEFault /Track** window positioned at the switch to task com.android.phone (FFFF8800226C10E0).]*

2. Use the arrow keys of the process of interest to move to its next state change.



## Filtering by Privilege Level

Intel® PT can be advised to generate program flow information only for:

- privilege level 0
- all privilege levels greater than 0

**Example:** Advise Intel<sup>®</sup> PT to generate only program flow information for privilege level 0.

1. Uncheck TraceUSER in the IPT configuration window.



2. Start and stop the program execution.
3. Display the result.

```
IPT.state

IPT.TraceUSER OFF

Go

...
Trace.List
```
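The privilege-level filter can be modeled as a simple predicate over (CPL, address) records. In this sketch, `trace_user=False` corresponds to unchecking TraceUSER. The `trace_kernel` parameter and the function name are hypothetical and only make the symmetry explicit.

```python
def privilege_filter(records, trace_kernel=True, trace_user=True):
    """Keep (cpl, address) records allowed by the privilege-level filter.

    CPL 0 is privilege level 0; CPL > 0 covers all other privilege levels.
    """
    kept = []
    for cpl, address in records:
        if (cpl == 0 and trace_kernel) or (cpl > 0 and trace_user):
            kept.append((cpl, address))
    return kept
```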

Intel® PT can be advised to generate program flow information only for a process of interest.

**Example:** Advise Intel<sup>®</sup> PT to generate only program flow information for the process "logcat".

1. Program the CR3 filter via the IPT window.

```
IPT.state                  ; open IPT configuration
                           ; window
```



```
; specify the <trace_id> of the process "logcat"
IPT.CR3 0x386D9000
; IPT.CR3 TASK.PROC.NAME2TRACEID("logcat")
```

`TASK.PROC.NAME2TRACEID(<process_name>)`

Returns the `<trace_id>` of the specified process.

2. Start and stop the program execution.
3. Display the result.



TRACE ENABLE indicates the re-start of the program flow trace generation.

Please be aware that TRACE32 decodes all trace information in the context of the process specified with the command IPT.CR3 <trace\_id>. Intel® PT does not generate Paging Information Packets (PIP) in this scenario.

**Example:** Advise Intel<sup>®</sup> PT to generate only program flow information when the function "logger\_poll" is running in the context of the process "logcat".

1. Set a Program breakpoint to the address range of the function logger\_poll and select the action TraceEnable.



2. Program the CR3 filter via the IPT window.



IPT.CR3 TASK.PROC.NAME2TRACEID("logcat")

3. Start and stop the program execution.
4. Display the result.
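When a TraceEnable range and the CR3 filter are combined, as in this example, program flow is generated only if both conditions hold. The Python sketch below models that AND combination over (CR3, address) records. The CR3 value 0x386D9000 comes from the earlier "logcat" example. The second CR3 value and the address range standing in for logger_poll are hypothetical.

```python
def combined_filter(records, cr3_value, start, end):
    """Keep addresses traced only when CR3 matches AND the address is in range.

    records: (cr3, address) pairs in execution order.
    """
    return [addr for cr3, addr in records
            if cr3 == cr3_value and start <= addr <= end]


LOGCAT = 0x386D9000   # example CR3 value of "logcat"
OTHER = 0x11111000    # hypothetical CR3 value of another process
```

An access to the range from another process, or an access from logcat outside the range, produces no trace.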

*[Screenshot: **Trace.List List.TASK DEFault** window; TRACE ENABLE marks the re-start of program flow tracing at `logger_poll`.]*

TRACE ENABLE indicates the re-start of the program flow trace generation.

## **Belated Analysis**

Postprocessing of recorded trace information with the TRACE32 Instruction Set Simulator requires complex preparations if an operating system that uses dynamic memory management to handle processes is used (e.g. Linux).

The following information has to be stored after recording and re-loaded into the TRACE32 Instruction Set Simulator:

- The recorded trace information
- The whole kernel address space (code and data)
- The core registers
- All MMU-related registers
- The settings of the Debugger Address Translation (TRACE32 command group: TRANSlation)

## **Example for Linux**

The **Generate RAM Dump** command in the **Linux** menu provides a framework for storing this information. It generates a **CMM file** that summarizes all commands for the TRACE32 Instruction Set Simulator.





If you start a TRACE32 Instruction Set Simulator and run the generated script, the recorded trace information can be analyzed there. Please be aware that additional settings might be necessary e.g. the specification of the search paths for the C/C++ sources.



# **Trace-based Debugging (CTS)**

Trace-based debugging allows you to re-run the recorded program section within TRACE32 PowerView.

## **Setup**

Since Intel<sup>®</sup> PT does not provide any information on read/write accesses, **UseMemory** has to be unchecked. A full explanation of this is given later in the chapter "CTS Technique", page 93.





**CTS.UseFinalMemory OFF** 

Specify the starting point for the trace re-run by selecting **Set CTS** from the Trace pull-down menu. The starting point in the example below is the entry to the function **activate\_task** executed by core 1.



Selecting **Set CTS** has the following effect:

- TRACE32 PowerView uses the preceding trace packet as the starting point for the trace re-run.
- The TRACE32 PowerView GUI no longer shows the current state of the target system; instead, it shows the target state as it was when the starting-point instruction was executed. This display mode is called CTS View.

#### CTS View means:

- The instruction pointers of all cores are set to the values they had when the starting point instruction was executed.
- The content of the core registers of all cores is reconstructed (as far as possible) to the values they had when the starting point instruction was executed. If TRACE32 cannot reconstruct the content of a register, it is displayed as empty.

TRACE32 PowerView uses a yellow look-and-feel to indicate CTS View.

The **Off** button in the source listing can be used to switch off the CTS View.







TRACE32 PowerView displays the state of the target as it was when the instruction of the trace record -470020435.0 was executed

## Forward and Backward Debugging

Now you can start to re-run the recorded program section within TRACE32 PowerView by forward or backward debugging.



## **Forward Debugging**



## **Backward Debugging**





**CTS.UseFinalMemory ON** (default setting within TRACE32)

If **CTS.UseFinalMemory** is ON and TRACE32 detects that a memory address was not changed by the recorded program section, TRACE32 PowerView displays the current content of this memory in CTS display mode.

Since Intel<sup>®</sup> PT does not provide any information on read/write accesses and most read/write accesses use indirect addressing, TRACE32 cannot detect which memory contents were changed. This is why **CTS.UseFinalMemory** has to be set to OFF.

If **CTS.UseFinalMemory** is switched OFF but your memory contains constants, you can configure TRACE32 to use these constants with the following commands:

MAP.CONST <address\_range>

**CTS.MapConst ON** 

**CTS.UseFinalContext ON** (default setting within TRACE32)

If CTS.UseFinalContext is ON and TRACE32 detects that a register was not changed by the recorded program section, TRACE32 PowerView displays the current content of this register in CTS display mode.

CTS.UseFinalContext has to be set to OFF if you are using Stack mode for tracing.

## **Function Run-Time Analysis - Basic Concept**

## Software under Analysis (no OS, OS or OS+MMU)

For the use of the function run-time analysis it is helpful to differentiate between three types of application software:

1. Software without operating system (abbreviation: **no OS**)
2. Software that includes an operating system (abbreviation: **OS**)
3. Software with an operating system that uses dynamic memory management to handle processes/tasks (abbreviation: **OS+MMU**)

## Flat vs. Nesting Analysis

TRACE32 provides two methods to analyze function run-times:

- Flat analysis
- Nesting analysis

The flat function run-time analysis is based on the symbolic instruction addresses of the trace entries. The time spent by an instruction is assigned to the corresponding function/symbol region.





| min | shortest time continuously in the address range of the function/symbol region |
|-----|-------------------------------------------------------------------------------|
| max | longest time continuously in the address range of the function/symbol region  |

The function nesting analysis analyzes only high-level language functions.





In order to display a nesting function run-time analysis TRACE32 analyzes the structure of the program execution by processing the trace information. The focus is put on the transition between functions (see picture above). The following events are of interest:

1. Function entries
2. Function exits
3. Entries to interrupt service routines
4. Exits of interrupt service routines
5. Entries to TRAP handlers
6. Exits of TRAP handlers



| min | shortest time within the function including all subfunctions and traps |
|-----|------------------------------------------------------------------------|
| max | longest time within the function including all subfunctions and traps  |

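To make the min/max definitions above concrete, here is a minimal Python sketch of the idea behind a nesting analysis: it replays a hypothetical list of function entry/exit events on a call stack and measures each pass from entry to exit, so the time spent in subfunctions is included. The event tuple format and the function names are assumptions for illustration only; this is not TRACE32's implementation, and a mismatched exit is simply reported as an error.

```python
from collections import defaultdict

def nesting_gross_times(events):
    """Replay (timestamp, kind, function) events, kind "entry" or "exit".

    Returns {function: [durations]}; each duration is exit time minus
    entry time, so it includes all time spent in called subfunctions.
    """
    stack = []  # (function, entry timestamp) for every open call
    durations = defaultdict(list)
    for timestamp, kind, function in events:
        if kind == "entry":
            stack.append((function, timestamp))
        elif kind == "exit":
            if not stack or stack[-1][0] != function:
                raise ValueError(f"unmatched exit of {function} at {timestamp}")
            _, entered = stack.pop()
            durations[function].append(timestamp - entered)
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return dict(durations)

def min_max(durations):
    """min/max gross time per function."""
    return {f: (min(d), max(d)) for f, d in durations.items()}

# Hypothetical trace: main calls foo twice; the first foo call invokes bar.
events = [
    (0, "entry", "main"),
    (1, "entry", "foo"),
    (2, "entry", "bar"),
    (5, "exit", "bar"),
    (6, "exit", "foo"),
    (7, "entry", "foo"),
    (8, "exit", "foo"),
    (10, "exit", "main"),
]
print(min_max(nesting_gross_times(events)))
# {'bar': (3, 3), 'foo': (1, 5), 'main': (10, 10)}
```

The first foo pass lasts 5 time units because it contains the 3 units spent in bar.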
## **Summary**

The nesting analysis provides more details on the structure and the timing of the program run, but it is much more sensitive than the flat analysis. Missing or tricky function entries/exits may require additional setup before the nesting analysis can be used.

# **Flat Function-Runtime Analysis**

NOTE:

As long as TRACE32 does not support Synchronization Time, cycle-accurate tracing should be disabled for all kinds of run-time measurements.

## **Function Time Chart**

#### **Default Time Chart**

Pushing the **Chart** button in the **Trace.List** window opens a **Trace.Chart.sYmbol** window.



Trace.Chart.sYmbol [/SplitCORE] /Sort CoreTogether

Flat function run-time analysis

- graphical display
- split the result per core
- sort results per core and then per recording order



### Trace.Chart.sYmbol [/SplitCORE] /Sort CoreSeparated

Flat function run-time analysis

- graphical display
- split the result per core
- sort the results per recording order



#### Trace.Chart.sYmbol /MergeCORE

Flat function run-time analysis

- graphical display
- merge the results of all cores

Trace information is analyzed independently for each core. The time chart combines these results into a single result.

### Trace.Chart.sYmbol /SplitTASK

Display function time chart including process information (OS, OS+MMU only)



| @<task\_name> | Process name information                                              |
|---------------|-----------------------------------------------------------------------|
| @(unknown)    | Function was recorded before the first process switch was recorded    |

#### Trace.Chart.sYmbol /TASK <name>

Display function time chart for specified process (OS, OS+MMU only)



| @<task\_name>             | Process name information                     |
|---------------------------|----------------------------------------------|
| (root)@(unknown)          | Everything outside of the specified process. |



If **Window** in the **Sort visible** field is switched ON in the **Chart Config** window, the functions that are active at the selected point of time are visualized in the scope of the **Trace.Chart.sYmbol** window. This is helpful especially if you scroll horizontally.

Analogous to the time chart, there is also a numerical analysis.



| survey  |                                                                                                                                           |
|---------|-------------------------------------------------------------------------------------------------------------------------------------------|
| item    | number of recorded functions/symbol regions                                                                                               |
| total   | time period recorded by the trace                                                                                                         |
| samples | total number of recorded changes of functions/symbol regions (program flow continuously in the address range of a function/symbol region) |

| function details |                                                                                                                        |
|------------------|------------------------------------------------------------------------------------------------------------------------|
| address          | function/symbol region name (here per core); (other): program sections that cannot be assigned to a function/symbol region |
| total            | time period in the function/symbol region during the recorded time period                                              |
| min              | shortest time continuously in the address range of the function/symbol region                                          |
| max              | longest time continuously in the address range of the function/symbol region                                           |
| avr              | average time continuously in the address range of the function/symbol region (calculated by total/count)               |
| count            | number of new entries (start address executed) into the address range of the function/symbol region                   |
| ratio            | ratio of time in the function/symbol region with regard to the total time period recorded                              |

Pushing the **Config** button lets you specify a different column layout and a different sorting criterion for the address column. By default, the functions/symbol regions are sorted by their recording order.

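The survey and function detail columns of the flat analysis can be approximated with a short Python sketch. It assumes, as a simplification, that each trace record is attributed to one symbol and lasts until the next record; consecutive records in the same symbol form one continuous stay. The record format and the symbol names `main` and `func_a` are hypothetical, and TRACE32's own time attribution may differ in detail.

```python
from dataclasses import dataclass

@dataclass
class SymbolStats:
    total: float = 0.0
    shortest: float = float("inf")  # "min" column
    longest: float = 0.0            # "max" column
    count: int = 0                  # number of new entries into the region

def flat_statistics(records, end_time):
    """records: time-ordered (timestamp, symbol) pairs.

    Each record is assumed to last until the next record (or end_time).
    Consecutive records in the same symbol count as one continuous stay.
    """
    stats = {}
    stay_symbol, stay_start = None, None
    # A sentinel record with symbol None closes the last stay at end_time.
    for timestamp, symbol in list(records) + [(end_time, None)]:
        if symbol == stay_symbol:
            continue  # still continuously inside the same region
        if stay_symbol is not None:
            s = stats.setdefault(stay_symbol, SymbolStats())
            duration = timestamp - stay_start
            s.total += duration
            s.shortest = min(s.shortest, duration)
            s.longest = max(s.longest, duration)
            s.count += 1
        stay_symbol, stay_start = symbol, timestamp
    return stats

def report(stats, start_time, end_time):
    """Derive the avr and ratio columns from the accumulated values."""
    span = end_time - start_time  # "total" in the survey
    return {
        name: {"total": s.total, "min": s.shortest, "max": s.longest,
               "avr": s.total / s.count, "count": s.count,
               "ratio": s.total / span}
        for name, s in stats.items()
    }

# Hypothetical recording: main -> func_a -> main -> func_a -> main
records = [(0, "main"), (2, "main"), (4, "func_a"),
           (7, "main"), (8, "func_a"), (9, "main")]
result = report(flat_statistics(records, end_time=10), start_time=0, end_time=10)
print(result["main"])
# {'total': 6.0, 'min': 1, 'max': 4, 'avr': 2.0, 'count': 3, 'ratio': 0.6}
```

The records at timestamps 0 and 2 both belong to `main`, so they form a single stay of 4 time units and are counted as one entry.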


## **Further Commands**

| Trace.PROfileChart.sYmbol | Display dynamic program behavior graphically.     |
|---------------------------|---------------------------------------------------|
| MIPS.PROfileChart.sYmbol  | Display MIPS for all program symbols graphically. |
| MIPS.STATistic.sYmbol     | Display MIPS for all program symbols numerically. |

# **Nesting Function Analysis OS**

Function nesting analysis for OS requires that OS-aware debugging is configured. For more information refer to "OS-aware Debugging" (trace32\_concepts.pdf).

#### Trace.STATistic.Func

Nesting function run-time analysis

- numeric display
- core information is discarded; exceptions are the @(unknown) task and the @(interrupt) task







| survey |                                          |
|--------|------------------------------------------|
| func   | number of functions in the trace         |
| total  | total measurement time                   |
| intr   | total time in interrupt service routines |

OVERFLOW funcs: 174. total: 1.392s intr: 902.302us stack overflow at 186892464.

| survey (issue indication)         |                                                                                                                                                                                                                                                                                        |
|-----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| stopped: \<time\>                 | The analyzed trace recording contains program stops. \<time\> indicates the total time the program execution was stopped.                                                                                                                                                              |
| \<number\> problems               | The nesting analysis contains problems. Please contact support@lauterbach.com.                                                                                                                                                                                                         |
| \<number\> workarounds            | The nesting analysis contains issues, but TRACE32 found solutions for them. It is recommended to perform a sanity check on the proposed solutions.                                                                                                                                     |
| stack overflow at<br>\<record\>   | The nesting analysis exceeds the nesting level 200. It is highly likely that the function exit for a frequently called function is missing. The command Trace.STATistic.TREE can help you to identify the function. If you need further help, please contact support@lauterbach.com. |
| stack underflow at<br>\<record\>  | The nesting analysis falls below the lowest nesting level. It is highly likely that the function entry for a frequently executed function is missing. The command Trace.STATistic.TREE can help you to identify the function. If you need further help, please contact support@lauterbach.com. |

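The stack overflow/underflow indications in the table above can be illustrated with a simple depth counter. This Python sketch is a simplified model, not TRACE32's actual check: it assumes a hypothetical event list of (record, kind) pairs and uses the nesting limit of 200 named in the table.

```python
MAX_NESTING = 200  # nesting limit named in the survey table

def check_nesting(events, limit=MAX_NESTING):
    """events: iterable of (record, kind) pairs, kind "entry" or "exit".

    Returns ("ok", None), ("overflow", record) or ("underflow", record).
    """
    depth = 0
    for record, kind in events:
        if kind == "entry":
            depth += 1
            if depth > limit:
                return "overflow", record   # an exit is probably missing
        elif kind == "exit":
            depth -= 1
            if depth < 0:
                return "underflow", record  # an entry is probably missing
    return "ok", None

# A function whose exit is never recorded, called 250 times:
print(check_nesting([(i, "entry") for i in range(250)]))  # ('overflow', 200)
# One exit too many:
print(check_nesting([(0, "entry"), (1, "exit"), (2, "exit")]))  # ('underflow', 2)
```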


| columns      |                                                           |
|--------------|-----------------------------------------------------------|
| range (NAME) | function name, sorted by their recording order as default |

\\vmlinux\hrtimer\hrtimer\_cancel@logcat

HLL function hrtimer\_cancel running in process @logcat.

Note that no core information is provided for processes and their functions.

Nesting function run-time analysis can also be performed per process.

#### Trace.STATistic.Func /TASK <task\_magic> | <task\_name> | <task\_id>

Trace.STATistic.Func /TASK "logcat"



Interrupt service routines are assigned to the @(interrupt) task. Core information is provided for the @(interrupt) task.



An arrow before the interrupt function indicates the function executed after the interrupt occurred:

→\\vmlinux\Global\apic\_timer\_interrupt@(interrupt):1

#### The unknown Task

All functions recorded before the first process switch is recorded are assigned to the @(unknown) task. Core information is provided for the @(unknown) task.





| columns (cont.) |                                                                                                                                                                               |
|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| total           | total time within the function                                                                                                                                                |
| min             | shortest time between function entry and exit, time spent in interrupt service routines is excluded. No **min** time is displayed if a function exit was never executed.    |
| max             | longest time between function entry and exit, time spent in interrupt service routines is excluded                                                                            |
| avr             | average time between function entry and exit, time spent in interrupt service routines is excluded                                                                            |



| columns (cont.) |                                     |
|-----------------|-------------------------------------|
| count           | number of times within the function |

If function entries or exits are missing, this is displayed in the following format:

\<times within the function\>. (\<number of missing function entries\>/\<number of missing function exits\>)

3671. (0/1)

### Interpretation examples:

- 2\. (2/0): 2 times within the function, 2 function entries missing
- 4\. (0/3): 4 times within the function, 3 function exits missing
- 11\. (1/1): 11 times within the function, 1 function entry and 1 function exit missing


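The count format can be decoded mechanically, which can be convenient when post-processing the statistics outside TRACE32 (assuming the values are available as plain text). This Python sketch parses cells like `3671. (0/1)`; treating a cell without parentheses (e.g. `12.`) as having no missing entries or exits is an assumption.

```python
import re

# "<times>. (<missing entries>/<missing exits>)"; the parenthesized part is optional.
COUNT_PATTERN = re.compile(r"^\s*(\d+)\.\s*(?:\((\d+)/(\d+)\))?\s*$")

def parse_count(cell):
    """Return (times, missing_entries, missing_exits) for a count cell."""
    match = COUNT_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unexpected count format: {cell!r}")
    times, entries, exits = match.groups()
    # Assumption: no parenthesized part means nothing is missing.
    return int(times), int(entries or 0), int(exits or 0)

for cell in ["2. (2/0)", "4. (0/3)", "11. (1/1)", "3671. (0/1)"]:
    times, entries, exits = parse_count(cell)
    print(f"{cell}: {times} times, {entries} entries missing, {exits} exits missing")
```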

If the number of missing function entries or exits is higher than 1, the analysis performed by the command **Trace.STATistic.Func** might fail due to nesting problems. A detailed look at the trace contents is recommended.

| columns (cont.)                                |                                                                                              |
|------------------------------------------------|----------------------------------------------------------------------------------------------|
| intern%<br>(InternalRatio,<br>InternalBAR.LOG) | ratio of time within the function without subfunctions, TRAP handlers, interrupts (net time) |







| columns (cont.) - times only in function |                                                                                                                                            |
|------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
| Internal                                 | total time between function entry and exit without called subfunctions, TRAP handlers, interrupt service routines                          |
| IAVeRage                                 | average time between function entry and exit without called subfunctions, TRAP handlers, interrupt service routines                        |
| IMIN                                     | shortest time between function entry and exit without called subfunctions, TRAP handlers, interrupt service routines                       |
| IMAX                                     | longest time spent in the function between function entry and exit without called subfunctions, TRAP handlers, interrupt service routines |
| InternalRatio                            | \<internal time of function\>/\<total measurement time\> as a numeric value                                                                |
| InternalBAR                              | \<internal time of function\>/\<total measurement time\> graphically                                                                       |

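The difference between the gross times (min/max/avr) and the internal times (IMIN/IMAX/IAVeRage) can be sketched in Python: each call frame records how much time was spent in nested calls, and that time is subtracted from the pass's entry-to-exit time. Interrupt service routines and TRAP handlers would be modeled the same way, as nested calls. The event format and the function names are hypothetical; this illustrates the column definitions, not TRACE32's algorithm.

```python
from collections import defaultdict

def internal_times(events):
    """events: time-ordered (timestamp, kind, name), kind "entry" or "exit".

    Returns {name: [internal time of each completed pass]}; internal time is
    entry-to-exit time minus the time spent in nested calls.
    """
    stack = []  # frames: [name, entry timestamp, time spent in nested calls]
    passes = defaultdict(list)
    for timestamp, kind, name in events:
        if kind == "entry":
            stack.append([name, timestamp, 0])
            continue
        if not stack or stack[-1][0] != name:
            raise ValueError(f"unmatched exit of {name} at {timestamp}")
        _, entered, nested = stack.pop()
        gross = timestamp - entered
        passes[name].append(gross - nested)
        if stack:
            stack[-1][2] += gross  # the whole pass is nested time for the caller
    return dict(passes)

def internal_columns(passes, total_measurement_time):
    return {
        name: {
            "Internal": sum(times),
            "IMIN": min(times),
            "IMAX": max(times),
            "IAVeRage": sum(times) / len(times),
            "InternalRatio": sum(times) / total_measurement_time,
        }
        for name, times in passes.items()
    }

# Hypothetical trace: main calls foo twice; the first foo call invokes bar.
events = [(0, "entry", "main"), (1, "entry", "foo"), (2, "entry", "bar"),
          (5, "exit", "bar"), (6, "exit", "foo"), (7, "entry", "foo"),
          (8, "exit", "foo"), (10, "exit", "main")]
columns = internal_columns(internal_times(events), total_measurement_time=10)
print(columns["foo"])
# {'Internal': 3, 'IMIN': 1, 'IMAX': 2, 'IAVeRage': 1.5, 'InternalRatio': 0.3}
```

In this example the internal times of main, foo and bar add up to the full 10 time units, because every instant is attributed to exactly one innermost function.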


| columns (cont.) - interrupt times |                                                                 |
|-----------------------------------|-----------------------------------------------------------------|
| ExternalINTR                      | total time the function was interrupted                         |
| ExternalINTRMAX                   | max. time one function pass was interrupted                     |
| INTRCount                         | number of interrupts that occurred during the function run-time |

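A minimal Python sketch of the three interrupt columns, assuming that each function pass and each interrupt service routine run is known as a (start, end) interval and that interrupts do not overlap each other; the interval values are made up for illustration.

```python
def interrupt_columns(passes, interrupts):
    """passes: (entry, exit) times of each pass through one function.
    interrupts: (start, end) times of interrupt service routine runs,
    assumed not to overlap each other."""
    total = 0   # ExternalINTR
    worst = 0   # ExternalINTRMAX
    count = 0   # INTRCount
    for entry, exit_ in passes:
        interrupted = 0
        for start, end in interrupts:
            overlap = min(end, exit_) - max(start, entry)
            if overlap > 0:  # this interrupt ran during the pass
                interrupted += overlap
                count += 1
        total += interrupted
        worst = max(worst, interrupted)
    return {"ExternalINTR": total, "ExternalINTRMAX": worst, "INTRCount": count}

passes = [(0, 10), (20, 25)]
interrupts = [(2, 3), (5, 7), (21, 22), (30, 31)]
print(interrupt_columns(passes, interrupts))
# {'ExternalINTR': 4, 'ExternalINTRMAX': 3, 'INTRCount': 3}
```

The interrupt at (30, 31) falls outside both passes and is therefore not counted.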
# **Time in Other Tasks**



| columns - process related information |                                                          |
|---------------------------------------|----------------------------------------------------------|
| TASKCount                             | number of tasks that interrupt the function/task         |
| ExternalTASK                          | total time in other tasks                                |
| ExternalTASKMAX                       | max. time 1 function/task pass was interrupted by a task |

#### Trace.STATistic.TREE

Nesting function run-time analysis

- tree display





It is also possible to get a task/process-specific tree.

Trace.STATistic.TREE /TASK "rild"

# **GROUPs for OS-aware Tracing**

TRACE32 PowerView provides the GROUP command to structure the trace evaluation.

If you use a target OS such as Linux, the following groups are created by the Lauterbach scripts and Lauterbach OS menus:

- A GROUP "kernel", color RED, to mark the OS kernel.
- A GROUP "droid", color BLUE, to mark virtual machine byte code e.g. Android/Dalvik.
- A GROUP <process\_name> per process, color GREEN.
- A GROUP <module\_name> per kernel module, color YELLOW.



A group can have the following statuses:

- enable
- enable + merge
- enable + hide



#### If a GROUP is enabled:

- The trace information recorded for the group members is marked with the color assigned to the group.



Group-based trace analysis commands are provided, e.g. Trace.STATistic.GROUP.



### **GROUP Status Enable+Merge**



### If a GROUP is enabled and merge is checked:

- The group represents its members in all trace analysis windows. No details about group members are displayed.



### **GROUP Status Enable+HIDE**



#### If a GROUP is enabled and hide is checked:

- The group represents its members in all trace analysis windows. No details about group members are displayed (same as merge checked).
- The trace information recorded for the group members is hidden in the Trace.List window.

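The effect of the three GROUP statuses can be summarized as a small Python model. It is a hypothetical simplification: real GROUP members are defined by address ranges, whereas here they are plain symbol names, and the color marking of enabled groups is not modeled. The group and symbol names (`schedule`, `do_irq`, `dvmInterpret`) are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Group:
    name: str
    members: frozenset   # simplification: symbol names instead of address ranges
    merge: bool = False  # status "enable + merge"
    hide: bool = False   # status "enable + hide" (also merges in analysis windows)

def analysis_name(symbol, groups):
    """Name under which a symbol appears in trace analysis windows."""
    for group in groups:
        if symbol in group.members and (group.merge or group.hide):
            return group.name  # the group represents its members
    return symbol

def list_view(records, groups):
    """Records that remain visible in a Trace.List-like listing."""
    hidden = set()
    for group in groups:
        if group.hide:
            hidden |= group.members
    return [record for record in records if record not in hidden]

groups = [
    Group("kernel", frozenset({"schedule", "do_irq"}), merge=True),
    Group("droid", frozenset({"dvmInterpret"}), hide=True),
]
records = ["main", "schedule", "dvmInterpret", "do_irq"]
print([analysis_name(r, groups) for r in records])  # ['main', 'kernel', 'droid', 'kernel']
print(list_view(records, groups))                    # ['main', 'schedule', 'do_irq']
```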


### **GROUP Creation**

The GROUPs "kernel" and "droid" are typically created in the start-up script that sets up the OS-aware debugging.

GROUP.Create <group\_name> <address\_range> /<color>





For more details about GROUPs refer to the **GROUP** command group.