ARM Logo

On-Target Trace and Profiling

Non-Intrusive Tracing and Profiling of Linux Kernel using the CoreSight Access Library

CoreSight Trace enables you to non-intrusively collect the sequence of instructions that were executed on the target platform – which is really useful when trying to debug thorny real-time issues, or when trying to optimize your code.

In this tutorial, we’ll be using the CoreSight Access Library on a Cortex-A9-based ST-Ericsson Snowball board designed by Calao Systems, running Linux Ubuntu, to get a trace of instruction execution and to profile the usage of functions within the Linux kernel itself. The user-space example could be modified for real-time “flight-recorder” monitoring, or for post-mortem crash analysis.

Here’s a screenshot showing the result of a trace capture – you can see the actual sequence of instructions executed in the kernel, the corresponding function in the C source file, and which functions occupied the most time.

An example Trace of the Linux Kernel in DS-5 Debugger

In the navigational timeline, the colour coding is a "heat" map showing the executed instructions and the amount of instructions each function executes in each timeline - the darker red showing more instructions and the lighter yellow showing fewer instructions.

At a scale of 1:1 the colour scheme changes to display memory access instructions as a darker red colour, branch instructions as a medium orange colour, and all the other instructions as a lighter green colour.


What can CoreSight trace do?

Trace is available on a variety of ARM processors, for example, the Cortex-A9 processor core can provide a trace interface to an (optional) CoreSight Program Trace Macrocell (PTM) that allows tracing of program flow (instructions only, not data).

The PTM outputs the raw trace information to an on-chip Embedded Trace Buffer (ETB) or in more recent devices to an on-chip Embedded Trace FIFO (ETF), or to a parallel trace port via the Trace Port Interface Unit (TPIU) which can be connected to external trace-capable tools.


Example CoreSight System

Example CoreSight system

Trace data captured in ETB/ETF can be read via JTAG using probes such as the ARM DSTREAM and Keil ULINK™ probes. Alternatively, trace data can be collected and stored by user code on the target for retrieval later, without the need for any JTAG probe.

Debuggers can retrieve the captured data to display a complete trace execution history, including branches, exceptions, and whether conditional instructions were executed or skipped. Timestamps, Context IDs and VMIDs can also be captured on higher-end devices.

The debugger can then match the instructions executed to the source code via ELF/DWARF debug information, to show which functions were being executed over time – code profiling. And because CoreSight trace is completely non-intrusive, the code runs at full speed on the target – capturing trace does not alter any timing aspects of the code, unlike other debug methods such as inserting printf() statements.


CoreSight Access Library

The CoreSight Access Library provided with DS-5 enables user code to interact directly with CoreSight devices on your target. This allows, for example, program execution trace to be captured in a production system without the need to have an external debugger connected.

The trace can be saved to a file on the target at run-time, and the saved trace can be retrieved later and loaded into DS-5 Debugger for analysis.

The library supports a number of different CoreSight components on several target boards – check the readme to see the current list. You can modify it to support other CoreSight components or other boards. The library can be used in a bare-metal or Linux environment.

The library offers a flexible programming interface allowing a variety of use cases and experimentation. It also offers some advantages compared to a register-level interface. For example, it can:

  • Manage any unlocking and locking of CoreSight devices via the lock register, OS Lock register, programming bit, power-down bit.
  • Attempt to ensure that the devices are programmed correctly and in a suitable sequence.
  • Handle variations between devices (for example, between variants of ETM/PTMs), and where necessary, work around known issues.
  • Become aware of the trace bus topology and will generally manage trace links automatically. For example enabling only funnel ports in use.
  • Manage “claim bits” that coordinate internal and external use of CoreSight devices.

Using the CoreSight Access Library Example

An example Linux application ("tracedemo") is provided in DS-5 that uses the CoreSight Access Library. As it runs, tracedemo creates several files on the target, including the captured trace. After running, the saved files can copied from the Linux target to your host computer, either using a secure copy program such as scp (Windows versions of this Linux command are available, such as pscp as provided with PuTTY), or using Remote System Explorer as provided in DS-5 Debugger.


Importing the Example

The sources are provided in [DS-5 install dir]\examples\CoreSight_Access_Library.zip.

First, import the example into the DS-5 Eclipse workspace as follows:

  1. Launch Eclipse for DS-5
  2. Close the Welcome screen, if it appears
  3. Go to 'File' menu and select 'Import...'. The Import wizard opens
  4. Open the 'General' folder
  5. Select 'Existing Projects into Workspace'
  6. Press 'Next'
  7. Press 'Browse...' next to 'Select archive file:'. A file dialog opens
  8. Navigate to [DS-5 install dir]\examples, select Coresight_Access_Library.zip, then press 'OK'. The 'Projects:' field populates with the project(s) found in the .zip
  9. Press 'Finish'. The 'Project' view populates with the project(s)

Next, boot Linux on your target and login (preferably as root). Then copy the CoreSight_access directory from the workspace to a suitable writable directory on your Linux target, either using Remote System Explorer as provided in DS-5 Debugger or using a secure copy program such as scp (Windows versions of this Linux command are available, such as pscp as provided with PuTTY). You’ll need to get the IP address of the target to do this, using ‘ipconfig’.

View of Snowball’s file-system in Remote System Explorer


Configuring the Example for your Target

Before you can use the example on your target, you may have to modify some of the example source files to suit your target:

  • If your target is not one of the platforms already supported, then you will need to add a "registration" function into csregistration.c
  • To configure the range of addresses to be traced in the kernel, open tracedemo.c and modify the values of:
    • KERNEL_TRACE_SIZE is the extent of the region to trace
    • KERNEL_TRACE_VIRTUAL_ADDR is the virtual address of the start of the region to trace, i.e. corresponding to addresses in the vmlinux file
    • KERNEL_TRACE_PHYSICAL_ADDR is the physical address (e.g. in RAM) of the start of the region to trace, normally a fixed offset from KERNEL_TRACE_VIRTUAL_ADDR

One way to identify the addresses of candidate regions/functions to trace is to use, for example:

grep cpu_idle /proc/kallsyms


Configuring your Target for the Example

Before using the example on your target, check the following:

  • For tracedemo to use mmap()to read kernel memory (which it then dumps to a file kernel_dump.bin that can be read by DS-5 Debugger), the kernel must have been built without "CONFIG_STRICT_DEVMEM=y". You can check this by typing: zgrep "CONFIG_STRICT_DEVMEM" /proc/config.gz If the kernel has been built with "CONFIG_STRICT_DEVMEM=y", then you must either rebuild it with "CONFIG_STRICT_DEVMEM=n", or dump the kernel memory manually by some other means, for example, use DS-5 Debugger and DSTREAM to dump this kernel memory from the target (with, for example, the CLI command "dump memory kernel_dump.bin 0xC000D000 +0x5000"), or extract this kernel memory from the kernel Image file.
  • To view the disassembly and trace augmented with symbols (e.g. function names), the vmlinux file for the kernel must be available, built with debug symbols. To map the trace and disassembly back to kernel source files, the kernel source tree must also be available. The vmlinux file with debug symbols is created when the kernel is built with debug info "CONFIG_DEBUG_INFO=y". You can check how your kernel was built by typing: zgrep "CONFIG_DEBUG_INFO" /proc/config.gz If using ‘make menuconfig’ to rebuild the kernel, select the following under ‘Kernel Hacking’:
    
    [*] Kernel debugging 
           [*] Compile the kernel with debug info

Building the example and library

The supplied makefile expects the example and library to be built natively on the target; minor modifications will be needed if they are built using a cross-compiler. The example and library can be built by typing:

make tracedemo

You should see, for example:

gcc -O2 -Wall -Wno-switch -g -o tracedemo.o -c tracedemo.c
gcc -O2 -Wall -Wno-switch -g -o csaccess.o -c csaccess.c
ar cr libcsaccess.a csaccess.o
gcc -O2 -Wall -Wno-switch -g -o csregister.o -c csregistration.c
gcc -o tracedemo tracedemo.o libcsaccess.a csregister.o

Building the API documentation

To create the API documentation, you need to have installed Doxygen. Then run:

make docs

The resulting API documentation can then be found in doc/html.


Running the Example

Run the tracedemo Linux application as root with:

./tracedemo

This will auto-detect the target board (if supported), configure trace generation, and retrieve trace from the trace buffer.

To view the possible run options, add "-h" for help:

./tracedemo -h

You will see output like:

Usage:
-h              This help screen.
-c <cpu>        Select CPU for demo to run on. Default 0
-itm            Enable ITM tracing, ITM tracing disabled by default.
-full           Show full trace with no filtering.
-pause          Run the demo with a pause after each step.

If you see:

** csaccess: ERROR: can't open /dev/mem

then try running it as sudo:

sudo ./tracedemo

You will see output like this sample captured from a Snowball board.

Depending on the configuration, tracedemo creates a number of output files such as snapshot.ini, cpu_0.ini, device_0.ini, device_1.ini, trace.ini, kernel_dump.bin and cstrace.bin. These files can be loaded into DS-5 Debugger to analyze the collected trace, as described in the next section. Copy these files from the target into a suitable writable directory on your host, either using Remote System Explorer or scp (or pscp).

To create a standalone package of files for decoding by DS-5 Debugger, for example to give to someone else, collect the files above, together with vmlinux, and the kernel source tree.


Analyzing the collected trace in DS-5 Debugger

The files created by tracedemo, after copying to the host, can be loaded into DS-5 Debugger to analyze the collected trace as follows. Ready-made example files captured from a session on a Snowball target are provided in the \example_capture directory that can be used for test or demo purposes. A ready-made debug config launch file Snapshot-kernel-Snowball-example is also provided.

  1. Launch Eclipse for DS-5
  2. Go to Window menu and select Open Perspective → DS-5 Debug
  3. Go to Run menu and select Debug Configurations.... The Debug Configurations dialog opens
  4. Open the DS-5 Debugger node
  5. Create a new debug configuration and give it a name
  6. In the Connection tab tree view, select your target and the "Snapshot View" entry for it
  7. In the Connection field below, enter the path to the top-level contents file snapshot.ini. You can use the File... button to navigate to the file. See the screenshot below. To load the ready-made example files, specify e.g. ...\DS-5 Workspace\CoreSight_access\example_capture\snapshot.ini
  8. To view the disassembly and trace augmented with symbols (e.g. function names), the vmlinux file for the kernel must be given to DS-5 Debugger. In the Files tab, select Load symbols from file, and use the File System... or Workspace... buttons to navigate to the vmlinux file
  9. In cases where the kernel modifies itself, you can force DS-5 Debugger to use program code from kernel_dump.bin rather than vmlinux. In the Debugger tab, tick Execute debugger commands, and add "set trust-ro-sections-for-opcodes off" to its field
  10. Press 'Debug'.

Debug launch configuration to view collected trace from Snowball


Digging Deeper into the Trace

The trace captured earlier was focussed in a narrow range of addresses around cpu_idle(). By widening the trace range, events such as IRQ exceptions can also be captured, so that the execution of the interrupt handler can analyzed.

As these events are relatively rare, DS-5 Debugger has configurable filters so that, for example, only exceptions are displayed, so that these can be spotted more easily. DS-5 Debugger is also able to export the captured trace data into a text file.

In the following screenshot, we can see an IRQ exception occurring immediately after interrupts are re-enabled by a CPSIE instruction within arch_local_irq_enable() at index 4,777. The interrupt is then handled by the code in __irq_svc().

Tracing through the IRQ Handler in the Linux Kernel


Profiling the Kernel

In the following screenshot, we can see the profile of a quiescent system, and that two functions together __raw_spin_lock_irqsave() and __memzero() occupy nearly 10% of the run-time. The red hot-spots in the heat-map reveal the instructions that occupy most time (mostly memory access instructions in this system). This reveals which functions are candidates for closer examination and possible optimization.

Profile of a quiescent system

Summary

We’ve seen how the CoreSight Access Library can be used to get a trace of instruction execution within the Linux kernel non-intrusively, and how DS-5 Debugger’s Snapshot Viewer can allow us to dig deep into the trace to debug the kernel, or to profile the usage of functions within the kernel. The user-space application running on the Linux target could be modified for real-time “flight-recorder” monitoring, or for post-mortem crash analysis.

Important Information for ds.arm.com

This site uses cookies to store information on your computer. By continuing to use our site, you consent to our cookies.

ds.arm.com uses three types of cookies: (1) those that enable the site to function and perform as required; (2) analytical cookies which anonymously track visitors while using this site; and (3) cookies which track visitors while using this site and then link to that visitor’s account once they log in or register.

If you are not happy with this use of these cookies please review our Cookie Policy to learn how they can be disabled. By disabling cookies some features of the site will not work.