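Performance Counter-Based System Performance Analysis Tool
Project Overview
Project Goals
Build a full-featured performance analysis tool for embedded systems that provides:
- Precise measurement: cycle-level timing using the DWT cycle counter
- Performance analysis: identification of bottlenecks and hot functions in the code
- Real-time monitoring: tracking of performance metrics while the system is running
- Data visualization: export of performance data over a serial port or network interface for plotting
- Multi-level analysis: function-level, module-level, and system-level measurements
Use Cases
- Performance optimization of embedded systems
- Response-time analysis of real-time systems
- Comparative benchmarking of algorithms
- Monitoring of system resource usage
- Product performance benchmarking
Technology Stack
- Hardware platform: ARM Cortex-M3/M4/M7 (cores that implement the DWT cycle counter)
- Toolchains: GCC/Keil/IAR
- Debug interfaces: UART/USB/RTT
- Visualization: Python scripts or a web interface
Project Features
- Non-intrusive: does not change the execution logic of the code under test
- High precision: cycle-level measurement resolution
- Low overhead: the cost of the profiling itself is kept to a minimum
- Easy to integrate: a small, simple API
- Extensible: supports user-defined performance metrics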
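DWT Performance Counters in Detail
DWT Overview
The DWT (Data Watchpoint and Trace) unit is a debug component of ARM Cortex-M processors. It provides one 32-bit cycle counter and, where the optional profiling counters are implemented, five 8-bit event counters that wrap after 256 counts:
- CYCCNT: 32-bit cycle counter, incremented on every processor clock cycle
- CPICNT: 8-bit counter of the extra cycles taken by multi-cycle instructions and instruction-fetch stalls (despite its name, it does not hold a cycles-per-instruction ratio)
- EXCCNT: 8-bit counter of cycles spent on exception entry and exit overhead
- SLEEPCNT: 8-bit counter of cycles spent in sleep
- LSUCNT: 8-bit counter of the extra cycles taken by load/store instructions
- FOLDCNT: 8-bit counter of folded (zero-cycle) instructions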
DWT Register Map
DWT base address: 0xE0001000
Key registers:
- DWT_CTRL (0xE0001000): control register
- DWT_CYCCNT (0xE0001004): cycle counter
- DWT_CPICNT (0xE0001008): CPI counter
- DWT_EXCCNT (0xE000100C): exception overhead counter
- DWT_SLEEPCNT (0xE0001010): sleep counter
- DWT_LSUCNT (0xE0001014): LSU counter
- DWT_FOLDCNT (0xE0001018): folded-instruction counter
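Because DWT_CYCCNT is a 32-bit register, it wraps after 2^32 cycles, which limits the longest interval a single measurement can cover. The sketch below shows that arithmetic in a form that also compiles on a host. The names `perf_cycles_to_ns` and `perf_cyccnt_wrap_ms` are hypothetical helpers, not part of the tool's API, and the CPU frequency is assumed to be a non-zero whole number of MHz.

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to nanoseconds.
 * Assumes cpu_freq_mhz is non-zero and a whole number of MHz. */
uint64_t perf_cycles_to_ns(uint32_t cycles, uint32_t cpu_freq_mhz) {
    return ((uint64_t)cycles * 1000u) / cpu_freq_mhz;
}

/* Hypothetical helper: whole milliseconds until a 32-bit cycle counter wraps. */
uint32_t perf_cyccnt_wrap_ms(uint32_t cpu_freq_mhz) {
    const uint64_t cycles_per_wrap = 1ULL << 32;
    const uint64_t cycles_per_ms = (uint64_t)cpu_freq_mhz * 1000u;
    return (uint32_t)(cycles_per_wrap / cycles_per_ms);
}
```

At 168 MHz the counter wraps roughly every 25.5 seconds, so longer intervals need an additional software counter or a slower time base.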
How the DWT Works
┌─────────────────────────────────────────────┐
│           ARM Cortex-M Processor            │
│                                             │
│  ┌────────────────┐      ┌──────────────┐   │
│  │      CPU       │─────▶│   DWT unit   │   │
│  │      Core      │      │              │   │
│  └────────────────┘      │ ┌──────────┐ │   │
│                          │ │  CYCCNT  │ │   │
│  ┌────────────────┐      │ ├──────────┤ │   │
│  │ Bus interface  │      │ │  CPICNT  │ │   │
│  └────────────────┘      │ ├──────────┤ │   │
│                          │ │  EXCCNT  │ │   │
│  ┌────────────────┐      │ ├──────────┤ │   │
│  │ Interrupt ctrl │─────▶│ │ SLEEPCNT │ │   │
│  └────────────────┘      │ ├──────────┤ │   │
│                          │ │  LSUCNT  │ │   │
│                          │ ├──────────┤ │   │
│                          │ │ FOLDCNT  │ │   │
│                          │ └──────────┘ │   │
│                          └──────────────┘   │
└─────────────────────────────────────────────┘
Core Module Design
Module Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│                  Performance Analysis Tool Architecture                  │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐  │
│  │ DWT driver layer   │  │ Measurement engine │  │ Data management    │  │
│  │                    │  │                    │  │                    │  │
│  │ - Initialize       │  │ - Start            │  │ - Store            │  │
│  │ - Configure        │  │ - Stop             │  │ - Query            │  │
│  │ - Read counters    │  │ - Compute stats    │  │ - Export           │  │
│  └────────────────────┘  └────────────────────┘  └────────────────────┘  │
│            │                       │                       │             │
│            └───────────────────────┴───────────────────────┘             │
│                                    │                                     │
│  ┌─────────────────────────────────┴──────────────────────────────────┐  │
│  │ API layer                                                          │  │
│  │                                                                    │  │
│  │ - PERF_Init()                                                      │  │
│  │ - PERF_Start(name)                                                 │  │
│  │ - PERF_Stop(name)                                                  │  │
│  │ - PERF_Report()                                                    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│  ┌─────────────────────────────────┴──────────────────────────────────┐  │
│  │ Output layer                                                       │  │
│  │                                                                    │  │
│  │ - UART output                                                      │  │
│  │ - RTT output                                                       │  │
│  │ - JSON formatting                                                  │  │
│  └────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
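Data Structure Design
/**
 * @file perf_counter.h
 * @brief Performance counter header
 */
#ifndef PERF_COUNTER_H
#define PERF_COUNTER_H
#include <stdint.h>
#include <stdbool.h>
/* Configuration parameters */
#define PERF_MAX_COUNTERS 32 // Maximum number of counters
#define PERF_NAME_MAX_LEN 32 // Maximum name length (including the terminator)
#define PERF_HISTORY_SIZE 10 // Number of history samples per counter
/* DWT register definitions */
#define DWT_CTRL (*(volatile uint32_t *)0xE0001000)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004)
#define DWT_CPICNT (*(volatile uint32_t *)0xE0001008)
#define DWT_EXCCNT (*(volatile uint32_t *)0xE000100C)
#define DWT_SLEEPCNT (*(volatile uint32_t *)0xE0001010)
#define DWT_LSUCNT (*(volatile uint32_t *)0xE0001014)
#define DWT_FOLDCNT (*(volatile uint32_t *)0xE0001018)
#define CoreDebug_DEMCR (*(volatile uint32_t *)0xE000EDFC)
#define CoreDebug_DEMCR_TRCENA (1UL << 24)
/* DWT_CTRL control bits */
#define DWT_CTRL_CYCCNTENA (1UL << 0)
#define DWT_CTRL_CPIEVTENA (1UL << 17)
#define DWT_CTRL_EXCEVTENA (1UL << 18)
#define DWT_CTRL_SLEEPEVTENA (1UL << 19)
#define DWT_CTRL_LSUEVTENA (1UL << 20)
#define DWT_CTRL_FOLDEVTENA (1UL << 21)
/**
 * @brief Performance counter state
 */
typedef enum {
    PERF_STATE_IDLE = 0, // Idle
    PERF_STATE_RUNNING,  // Running
    PERF_STATE_STOPPED,  // Stopped
    PERF_STATE_ERROR     // Error
} perf_state_t;
/**
 * @brief Snapshot of the DWT counters
 */
typedef struct {
    uint32_t cycles;      // CYCCNT value (32-bit counter)
    uint32_t cpi_count;   // CPICNT value (8-bit hardware counter)
    uint32_t exc_count;   // EXCCNT value (8-bit hardware counter)
    uint32_t sleep_count; // SLEEPCNT value (8-bit hardware counter)
    uint32_t lsu_count;   // LSUCNT value (8-bit hardware counter)
    uint32_t fold_count;  // FOLDCNT value (8-bit hardware counter)
    uint32_t timestamp;   // Timestamp (CYCCNT value)
} perf_snapshot_t;
/**
 * @brief Performance counter (one per named measurement point)
 */
typedef struct {
    char name[PERF_NAME_MAX_LEN];        // Counter name
    perf_state_t state;                  // State
    perf_snapshot_t start;               // Start snapshot
    perf_snapshot_t end;                 // End snapshot
    uint32_t call_count;                 // Number of completed measurements
    uint64_t total_cycles;               // Total cycles
    uint32_t min_cycles;                 // Minimum cycles
    uint32_t max_cycles;                 // Maximum cycles
    uint32_t history[PERF_HISTORY_SIZE]; // History ring buffer
    uint8_t history_index;               // Next history slot to write
    bool enabled;                        // Whether the counter is enabled
} perf_counter_t;
/**
 * @brief Performance analyzer
 */
typedef struct {
    perf_counter_t counters[PERF_MAX_COUNTERS];
    uint8_t counter_count;
    bool initialized;
    uint32_t cpu_freq_mhz;
} perf_analyzer_t;
/* Public API (implemented in perf_counter.c) */
int PERF_DWT_Init(uint32_t cpu_freq_mhz);
void PERF_DWT_Reset(void);
int PERF_Init(uint32_t cpu_freq_mhz);
int PERF_Start(const char *name);
int PERF_Stop(const char *name);
int PERF_GetCycles(const char *name, uint32_t *cycles);
void PERF_Reset(const char *name);
void PERF_SetEnabled(const char *name, bool enabled);
void PERF_Report(void);
void PERF_ReportDetailed(const char *name);
void PERF_ReportJSON(void);
void PERF_ReportCSV(void);
void PERF_DetectBottlenecks(float threshold_percent);
void PERF_Compare(const char *name1, const char *name2);
void PERF_AnalyzeTrend(const char *name);
#endif /* PERF_COUNTER_H */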
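The statistics fields of `perf_counter_t` are meant to be updated once per completed measurement. As a minimal host-side sketch of that update, the code below uses a reduced, hypothetical `demo_stats_t` in place of the real structure; the `demo_` names and the history size of 4 are assumptions made for illustration only.

```c
#include <stdint.h>

#define DEMO_HISTORY_SIZE 4  /* hypothetical; the tool uses PERF_HISTORY_SIZE */

/* Reduced stand-in for perf_counter_t (illustration only). */
typedef struct {
    uint32_t call_count;
    uint64_t total_cycles;
    uint32_t min_cycles;
    uint32_t max_cycles;
    uint32_t history[DEMO_HISTORY_SIZE];
    uint8_t  history_index;  /* next slot to overwrite */
} demo_stats_t;

void demo_stats_init(demo_stats_t *s) {
    s->call_count = 0;
    s->total_cycles = 0;
    s->min_cycles = UINT32_MAX;  /* so the first sample becomes the minimum */
    s->max_cycles = 0;
    s->history_index = 0;
    for (int i = 0; i < DEMO_HISTORY_SIZE; i++) {
        s->history[i] = 0;
    }
}

/* Fold one measured duration into the running statistics. */
void demo_stats_update(demo_stats_t *s, uint32_t cycles) {
    s->call_count++;
    s->total_cycles += cycles;
    if (cycles < s->min_cycles) s->min_cycles = cycles;
    if (cycles > s->max_cycles) s->max_cycles = cycles;
    s->history[s->history_index] = cycles;
    s->history_index = (uint8_t)((s->history_index + 1) % DEMO_HISTORY_SIZE);
}

/* Average duration; 0 when nothing has been recorded yet. */
uint32_t demo_stats_average(const demo_stats_t *s) {
    return s->call_count ? (uint32_t)(s->total_cycles / s->call_count) : 0u;
}
```

Keeping `total_cycles` in 64 bits lets the average stay exact even after many long measurements, while the ring buffer only retains the most recent samples.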
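DWT Driver Implementation
DWT Initialization
/**
 * @file perf_counter.c
 * @brief Performance counter implementation
 */
#include "perf_counter.h"
#include <math.h>   // sqrtf(), used by PERF_AnalyzeTrend()
#include <string.h>
#include <stdio.h>
/* The CMSIS intrinsics used below (__get_PRIMASK, __disable_irq,
 * __set_PRIMASK) come from the device's CMSIS header, which must also be
 * included here. */
/* Global performance analyzer instance */
static perf_analyzer_t g_perf_analyzer = {0};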
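/**
 * @brief Initialize the DWT
 * @param cpu_freq_mhz CPU frequency in MHz
 * @retval 0: success, -1: failure
 */
int PERF_DWT_Init(uint32_t cpu_freq_mhz) {
    // Enable the trace subsystem (DWT and ITM)
    CoreDebug_DEMCR |= CoreDebug_DEMCR_TRCENA;
    // Note: some Cortex-M7 devices also require the DWT to be unlocked
    // through its lock access register before the writes below take effect.
    // Reset the cycle counter
    DWT_CYCCNT = 0;
    // Enable the cycle counter
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
    // Enable the profiling counters (optional; not implemented on every core)
    DWT_CTRL |= DWT_CTRL_CPIEVTENA;   // CPI counter
    DWT_CTRL |= DWT_CTRL_EXCEVTENA;   // Exception overhead counter
    DWT_CTRL |= DWT_CTRL_SLEEPEVTENA; // Sleep counter
    DWT_CTRL |= DWT_CTRL_LSUEVTENA;   // LSU counter
    DWT_CTRL |= DWT_CTRL_FOLDEVTENA;  // Folded-instruction counter
    // Clear the profiling counters
    DWT_CPICNT = 0;
    DWT_EXCCNT = 0;
    DWT_SLEEPCNT = 0;
    DWT_LSUCNT = 0;
    DWT_FOLDCNT = 0;
    // Verify that the cycle counter is running
    uint32_t test_start = DWT_CYCCNT;
    for (volatile int i = 0; i < 100; i++);
    uint32_t test_end = DWT_CYCCNT;
    if (test_end == test_start) {
        return -1; // CYCCNT is not counting
    }
    g_perf_analyzer.cpu_freq_mhz = cpu_freq_mhz;
    g_perf_analyzer.initialized = true;
    return 0;
}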
/**
 * @brief Read a DWT snapshot
 * @param snapshot Pointer to the snapshot structure to fill
 */
static void PERF_DWT_ReadSnapshot(perf_snapshot_t *snapshot) {
    // Read all counters with interrupts masked so that no ISR runs between
    // the reads (the reads are still sequential, not simultaneous)
    uint32_t primask = __get_PRIMASK();
    __disable_irq();
    snapshot->cycles = DWT_CYCCNT;
    snapshot->cpi_count = DWT_CPICNT;
    snapshot->exc_count = DWT_EXCCNT;
    snapshot->sleep_count = DWT_SLEEPCNT;
    snapshot->lsu_count = DWT_LSUCNT;
    snapshot->fold_count = DWT_FOLDCNT;
    snapshot->timestamp = DWT_CYCCNT; // The cycle count doubles as the timestamp
    __set_PRIMASK(primask);
}
/**
 * @brief Compute the difference between two snapshots
 * @param start Start snapshot
 * @param end End snapshot
 * @param diff Output: difference snapshot
 */
static void PERF_DWT_CalcDiff(const perf_snapshot_t *start,
                              const perf_snapshot_t *end,
                              perf_snapshot_t *diff) {
    // Unsigned subtraction is modulo 2^32, so a single CYCCNT wrap between
    // the two snapshots needs no special case.
    diff->cycles = end->cycles - start->cycles;
    // The profiling counters are 8 bits wide: take the difference modulo 256.
    // If a counter wraps more than once in the interval, the true count
    // cannot be recovered from the two snapshots alone.
    diff->cpi_count = (end->cpi_count - start->cpi_count) & 0xFFu;
    diff->exc_count = (end->exc_count - start->exc_count) & 0xFFu;
    diff->sleep_count = (end->sleep_count - start->sleep_count) & 0xFFu;
    diff->lsu_count = (end->lsu_count - start->lsu_count) & 0xFFu;
    diff->fold_count = (end->fold_count - start->fold_count) & 0xFFu;
    diff->timestamp = end->timestamp; // Avoid leaving the field uninitialized
}
/**
 * @brief Reset the DWT counters
 */
void PERF_DWT_Reset(void) {
DWT_CYCCNT = 0;
DWT_CPICNT = 0;
DWT_EXCCNT = 0;
DWT_SLEEPCNT = 0;
DWT_LSUCNT = 0;
DWT_FOLDCNT = 0;
}
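The wrap handling used when differencing counter snapshots relies on unsigned arithmetic being modular. A minimal sketch of that idea, runnable on a host, is shown below; `perf_elapsed32` and `perf_elapsed8` are hypothetical names introduced here, and the 8-bit variant assumes the raw register value has been stored in a wider variable.

```c
#include <stdint.h>

/* Elapsed count for a 32-bit free-running counter (such as CYCCNT).
 * Correct as long as the counter wrapped at most once in between. */
uint32_t perf_elapsed32(uint32_t start, uint32_t end) {
    return end - start;  /* unsigned arithmetic is modulo 2^32 */
}

/* Elapsed count for an 8-bit free-running counter whose raw value was
 * stored in a 32-bit variable (such as a read of CPICNT). */
uint32_t perf_elapsed8(uint32_t start, uint32_t end) {
    return (end - start) & 0xFFu;  /* reduce the difference modulo 256 */
}
```

For example, a cycle counter that moves from 0xFFFFFFF0 to 0x00000010 has advanced by 0x20 cycles, and the plain subtraction yields exactly that without any comparison.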
Performance Measurement API
Core API Implementation
/**
* @brief 初始化性能分析器
* @param cpu_freq_mhz CPU频率(MHz)
* @retval 0: 成功, -1: 失败
*/
int PERF_Init(uint32_t cpu_freq_mhz) {
if (g_perf_analyzer.initialized) {
return 0; // 已初始化
}
// 清零分析器
memset(&g_perf_analyzer, 0, sizeof(perf_analyzer_t));
// 初始化DWT
if (PERF_DWT_Init(cpu_freq_mhz) != 0) {
return -1;
}
printf("[PERF] 性能分析器已初始化 (CPU: %u MHz)\n",
(unsigned int)cpu_freq_mhz);
return 0;
}
/**
* @brief 查找或创建计数器
* @param name 计数器名称
* @retval 计数器指针,NULL表示失败
*/
static perf_counter_t* PERF_FindOrCreateCounter(const char *name) {
// 查找现有计数器
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
if (strcmp(g_perf_analyzer.counters[i].name, name) == 0) {
return &g_perf_analyzer.counters[i];
}
}
// 创建新计数器
if (g_perf_analyzer.counter_count >= PERF_MAX_COUNTERS) {
printf("[PERF] 错误: 计数器数量已达上限\n");
return NULL;
}
perf_counter_t *counter = &g_perf_analyzer.counters[g_perf_analyzer.counter_count++];
// 初始化计数器
memset(counter, 0, sizeof(perf_counter_t));
strncpy(counter->name, name, PERF_NAME_MAX_LEN - 1);
counter->name[PERF_NAME_MAX_LEN - 1] = '\0';
counter->state = PERF_STATE_IDLE;
counter->enabled = true;
counter->min_cycles = 0xFFFFFFFF;
counter->max_cycles = 0;
return counter;
}
/**
* @brief 开始性能测量
* @param name 测量点名称
* @retval 0: 成功, -1: 失败
*/
int PERF_Start(const char *name) {
if (!g_perf_analyzer.initialized) {
return -1;
}
perf_counter_t *counter = PERF_FindOrCreateCounter(name);
if (counter == NULL) {
return -1;
}
if (!counter->enabled) {
return 0; // 计数器已禁用
}
if (counter->state == PERF_STATE_RUNNING) {
printf("[PERF] 警告: 计数器 '%s' 已在运行\n", name);
return -1;
}
// 读取开始快照
PERF_DWT_ReadSnapshot(&counter->start);
counter->state = PERF_STATE_RUNNING;
return 0;
}
/**
 * @brief Stop a performance measurement
 * @param name Name of the measurement point
 * @retval 0: success, -1: failure
 */
int PERF_Stop(const char *name) {
    // Take the end snapshot first so that the counter lookup below is not
    // included in the measured interval
    perf_snapshot_t end_snapshot;
    PERF_DWT_ReadSnapshot(&end_snapshot);
    if (!g_perf_analyzer.initialized) {
        return -1;
    }
    perf_counter_t *counter = PERF_FindOrCreateCounter(name);
    if (counter == NULL) {
        return -1;
    }
    if (!counter->enabled) {
        return 0;
    }
    if (counter->state != PERF_STATE_RUNNING) {
        printf("[PERF] 警告: 计数器 '%s' 未运行\n", name);
        return -1;
    }
    // Store the end snapshot
    counter->end = end_snapshot;
    counter->state = PERF_STATE_STOPPED;
    // Compute the difference
    perf_snapshot_t diff;
    PERF_DWT_CalcDiff(&counter->start, &counter->end, &diff);
// 更新统计信息
counter->call_count++;
counter->total_cycles += diff.cycles;
if (diff.cycles < counter->min_cycles) {
counter->min_cycles = diff.cycles;
}
if (diff.cycles > counter->max_cycles) {
counter->max_cycles = diff.cycles;
}
// 更新历史记录
counter->history[counter->history_index] = diff.cycles;
counter->history_index = (counter->history_index + 1) % PERF_HISTORY_SIZE;
return 0;
}
/**
* @brief 获取计数器统计信息
* @param name 计数器名称
* @param cycles 输出:周期数
* @retval 0: 成功, -1: 失败
*/
int PERF_GetCycles(const char *name, uint32_t *cycles) {
if (!g_perf_analyzer.initialized || cycles == NULL) {
return -1;
}
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
if (strcmp(g_perf_analyzer.counters[i].name, name) == 0) {
perf_counter_t *counter = &g_perf_analyzer.counters[i];
if (counter->call_count > 0) {
*cycles = (uint32_t)(counter->total_cycles / counter->call_count);
} else {
*cycles = 0;
}
return 0;
}
}
return -1; // 未找到计数器
}
/**
* @brief 复位计数器
* @param name 计数器名称(NULL表示复位所有)
*/
void PERF_Reset(const char *name) {
if (name == NULL) {
// 复位所有计数器
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
perf_counter_t *counter = &g_perf_analyzer.counters[i];
counter->call_count = 0;
counter->total_cycles = 0;
counter->min_cycles = 0xFFFFFFFF;
counter->max_cycles = 0;
counter->history_index = 0;
memset(counter->history, 0, sizeof(counter->history));
}
} else {
// 复位指定计数器
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
if (strcmp(g_perf_analyzer.counters[i].name, name) == 0) {
perf_counter_t *counter = &g_perf_analyzer.counters[i];
counter->call_count = 0;
counter->total_cycles = 0;
counter->min_cycles = 0xFFFFFFFF;
counter->max_cycles = 0;
counter->history_index = 0;
memset(counter->history, 0, sizeof(counter->history));
break;
}
}
}
}
/**
* @brief 启用/禁用计数器
* @param name 计数器名称
* @param enabled true: 启用, false: 禁用
*/
void PERF_SetEnabled(const char *name, bool enabled) {
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
if (strcmp(g_perf_analyzer.counters[i].name, name) == 0) {
g_perf_analyzer.counters[i].enabled = enabled;
break;
}
}
}
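Every measurement includes a fixed cost for reading the counter and, in this tool, for looking the counter up by name. One common way to account for it is to calibrate that cost once and subtract it from later results. The sketch below illustrates the idea on a host with a fake cycle source; `perf_calibrate_overhead`, `perf_subtract_overhead`, and `fake_read_cycles` are hypothetical names, and on a target the callback would wrap the real start/stop path instead.

```c
#include <stdint.h>

/* Signature of a function that returns the current cycle count.
 * On a target this would wrap a read of the hardware cycle counter. */
typedef uint32_t (*perf_read_cycles_fn)(void);

/* Hypothetical helper: estimate the fixed cost of one measurement as the
 * smallest difference between two back-to-back counter reads. */
uint32_t perf_calibrate_overhead(perf_read_cycles_fn read_cycles, int samples) {
    uint32_t best = UINT32_MAX;
    for (int i = 0; i < samples; i++) {
        uint32_t t0 = read_cycles();
        uint32_t t1 = read_cycles();
        uint32_t delta = t1 - t0;  /* wrap-safe unsigned difference */
        if (delta < best) {
            best = delta;
        }
    }
    return best;
}

/* Subtract the calibrated overhead without going below zero. */
uint32_t perf_subtract_overhead(uint32_t measured, uint32_t overhead) {
    return (measured > overhead) ? (measured - overhead) : 0u;
}

/* Host-side stand-in for the hardware counter: 7 "cycles" per read. */
static uint32_t fake_cycles;
uint32_t fake_read_cycles(void) {
    fake_cycles += 7u;
    return fake_cycles;
}
```

Taking the minimum rather than the average keeps interrupts or cache misses during calibration from inflating the estimate.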
Convenience Macros
/**
* @file perf_macros.h
* @brief 性能测量便捷宏
*/
#ifndef PERF_MACROS_H
#define PERF_MACROS_H
#include "perf_counter.h"
/* 性能测量宏 */
#define PERF_MEASURE_START(name) PERF_Start(#name)
#define PERF_MEASURE_STOP(name) PERF_Stop(#name)
/* Function-level measurement macros (__func__ is the standard C99 spelling) */
#define PERF_FUNCTION_START() PERF_Start(__func__)
#define PERF_FUNCTION_STOP() PERF_Stop(__func__)
/* 代码块性能测量 */
#define PERF_BLOCK_START(name) do { PERF_Start(name);
#define PERF_BLOCK_END(name) PERF_Stop(name); } while(0)
/* Automatic scope-based measurement.
 * Relies on the GCC/Clang cleanup attribute; compilers without it need
 * explicit PERF_Start()/PERF_Stop() calls instead. */
typedef struct {
    const char *name;
} perf_scope_t;
static inline perf_scope_t perf_scope_begin(const char *name) {
    PERF_Start(name);
    perf_scope_t scope = {name};
    return scope;
}
static inline void perf_scope_end(perf_scope_t *scope) {
    if (scope && scope->name) {
        PERF_Stop(scope->name);
    }
}
/* The variable name avoids a leading double underscore, which is reserved */
#define PERF_SCOPE(name) \
    perf_scope_t perf_scope_##name __attribute__((cleanup(perf_scope_end))) = \
        perf_scope_begin(#name)
/* 条件性能测量 */
#ifdef ENABLE_PERF_MEASURE
#define PERF_MEASURE(name, code) \
do { \
PERF_Start(name); \
code; \
PERF_Stop(name); \
} while(0)
#else
#define PERF_MEASURE(name, code) do { code; } while(0)
#endif
#endif /* PERF_MACROS_H */
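Where a compiler-specific scope mechanism is not available, a block can also be wrapped with a one-shot `for` loop written in standard C99. The sketch below is one possible shape, not the tool's own macro: `DEMO_PERF_BLOCK`, `demo_start`, and `demo_stop` are hypothetical stand-ins so that the example runs on a host.

```c
/* Stand-ins for the tool's start/stop calls so the sketch runs on a host. */
int demo_start_calls;
int demo_stop_calls;
int demo_start(const char *name) { (void)name; demo_start_calls++; return 0; }
int demo_stop(const char *name)  { (void)name; demo_stop_calls++;  return 0; }

/* Hypothetical macro: runs the following block exactly once, calling
 * demo_start() before it and demo_stop() after it.
 * Caveat: leaving the block with break, return or goto skips demo_stop(). */
#define DEMO_PERF_BLOCK(name)                                  \
    for (int perf_once_ = (demo_start(name), 1); perf_once_;   \
         perf_once_ = (demo_stop(name), 0))

/* Example: sum 0..n-1 inside a measured block. */
int demo_sum(int n) {
    int sum = 0;
    DEMO_PERF_BLOCK("demo_sum") {
        for (int i = 0; i < n; i++) {
            sum += i;
        }
    }
    return sum;
}
```

The trade-off is that early exits from the block bypass the stop call, so this form suits straight-line blocks best.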
Performance Report Generation
Text Report
/**
* @brief 打印性能报告
*/
void PERF_Report(void) {
if (!g_perf_analyzer.initialized) {
printf("[PERF] 错误: 性能分析器未初始化\n");
return;
}
printf("\n");
printf("╔════════════════════════════════════════════════════════════════╗\n");
printf("║ 性能分析报告 ║\n");
printf("╠════════════════════════════════════════════════════════════════╣\n");
printf("║ CPU频率: %u MHz ║\n",
(unsigned int)g_perf_analyzer.cpu_freq_mhz);
printf("║ 计数器数量: %u ║\n",
g_perf_analyzer.counter_count);
printf("╚════════════════════════════════════════════════════════════════╝\n");
printf("\n");
if (g_perf_analyzer.counter_count == 0) {
printf("没有性能数据\n");
return;
}
// 表头
printf("┌────────────────────────┬────────┬──────────┬──────────┬──────────┬──────────┐\n");
printf("│ 名称 │ 调用次 │ 平均周期 │ 最小周期 │ 最大周期 │ 平均时间 │\n");
printf("├────────────────────────┼────────┼──────────┼──────────┼──────────┼──────────┤\n");
// 数据行
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
perf_counter_t *counter = &g_perf_analyzer.counters[i];
if (counter->call_count == 0) {
continue;
}
uint32_t avg_cycles = (uint32_t)(counter->total_cycles / counter->call_count);
float avg_time_us = (float)avg_cycles / g_perf_analyzer.cpu_freq_mhz;
printf("│ %-22s │ %6u │ %8u │ %8u │ %8u │ %7.2f │\n",
counter->name,
(unsigned int)counter->call_count,
(unsigned int)avg_cycles,
(unsigned int)counter->min_cycles,
(unsigned int)counter->max_cycles,
avg_time_us);
}
printf("└────────────────────────┴────────┴──────────┴──────────┴──────────┴──────────┘\n");
printf("\n");
}
/**
* @brief 打印详细性能报告
* @param name 计数器名称
*/
void PERF_ReportDetailed(const char *name) {
if (!g_perf_analyzer.initialized) {
return;
}
perf_counter_t *counter = NULL;
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
if (strcmp(g_perf_analyzer.counters[i].name, name) == 0) {
counter = &g_perf_analyzer.counters[i];
break;
}
}
if (counter == NULL || counter->call_count == 0) {
printf("[PERF] 未找到计数器 '%s' 或无数据\n", name);
return;
}
uint32_t avg_cycles = (uint32_t)(counter->total_cycles / counter->call_count);
float avg_time_us = (float)avg_cycles / g_perf_analyzer.cpu_freq_mhz;
float min_time_us = (float)counter->min_cycles / g_perf_analyzer.cpu_freq_mhz;
float max_time_us = (float)counter->max_cycles / g_perf_analyzer.cpu_freq_mhz;
printf("\n");
printf("═══════════════════════════════════════════════════════════\n");
printf(" 详细性能报告: %s\n", counter->name);
printf("═══════════════════════════════════════════════════════════\n");
printf(" 调用次数: %u\n", (unsigned int)counter->call_count);
printf(" 总周期数: %llu\n", (unsigned long long)counter->total_cycles);
printf(" 平均周期: %u (%.2f μs)\n", (unsigned int)avg_cycles, avg_time_us);
printf(" 最小周期: %u (%.2f μs)\n", (unsigned int)counter->min_cycles, min_time_us);
printf(" 最大周期: %u (%.2f μs)\n", (unsigned int)counter->max_cycles, max_time_us);
printf(" 周期变化: %u (%.1f%%)\n",
(unsigned int)(counter->max_cycles - counter->min_cycles),
100.0f * (counter->max_cycles - counter->min_cycles) / avg_cycles);
// 打印历史记录
printf("\n 最近%d次测量:\n", PERF_HISTORY_SIZE);
printf(" ┌");
for (int i = 0; i < PERF_HISTORY_SIZE; i++) {
printf("────────┬");
}
printf("\b┐\n");
printf(" │");
for (int i = 0; i < PERF_HISTORY_SIZE; i++) {
int idx = (counter->history_index + i) % PERF_HISTORY_SIZE;
if (counter->history[idx] > 0) {
printf(" %6u │", (unsigned int)counter->history[idx]);
} else {
printf(" -- │");
}
}
printf("\n");
printf(" └");
for (int i = 0; i < PERF_HISTORY_SIZE; i++) {
printf("────────┴");
}
printf("\b┘\n");
printf("═══════════════════════════════════════════════════════════\n");
printf("\n");
}
JSON Output
/**
* @brief 生成JSON格式报告
*/
void PERF_ReportJSON(void) {
if (!g_perf_analyzer.initialized) {
return;
}
printf("{\n");
printf(" \"performance_report\": {\n");
printf(" \"cpu_freq_mhz\": %u,\n", (unsigned int)g_perf_analyzer.cpu_freq_mhz);
printf(" \"counter_count\": %u,\n", g_perf_analyzer.counter_count);
printf(" \"counters\": [\n");
    bool first = true; // No separator is needed before the first entry
    for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
        perf_counter_t *counter = &g_perf_analyzer.counters[i];
        if (counter->call_count == 0) {
            continue;
        }
        uint32_t avg_cycles = (uint32_t)(counter->total_cycles / counter->call_count);
        float avg_time_us = (float)avg_cycles / g_perf_analyzer.cpu_freq_mhz;
        // Print the comma before each entry rather than after it, so that
        // skipping unused counters can never leave a trailing comma
        if (!first) {
            printf(",\n");
        }
        first = false;
        printf("      {\n");
        printf("        \"name\": \"%s\",\n", counter->name);
        printf("        \"call_count\": %u,\n", (unsigned int)counter->call_count);
        printf("        \"total_cycles\": %llu,\n", (unsigned long long)counter->total_cycles);
        printf("        \"avg_cycles\": %u,\n", (unsigned int)avg_cycles);
        printf("        \"min_cycles\": %u,\n", (unsigned int)counter->min_cycles);
        printf("        \"max_cycles\": %u,\n", (unsigned int)counter->max_cycles);
        printf("        \"avg_time_us\": %.2f,\n", avg_time_us);
        printf("        \"history\": [");
        for (int j = 0; j < PERF_HISTORY_SIZE; j++) {
            if (j > 0) printf(", ");
            printf("%u", (unsigned int)counter->history[j]);
        }
        printf("]\n");
        printf("      }");
    }
    printf("\n");
printf(" ]\n");
printf(" }\n");
printf("}\n");
}
/**
* @brief 生成CSV格式报告
*/
void PERF_ReportCSV(void) {
if (!g_perf_analyzer.initialized) {
return;
}
// CSV表头
printf("Name,CallCount,TotalCycles,AvgCycles,MinCycles,MaxCycles,AvgTimeUs\n");
// 数据行
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
perf_counter_t *counter = &g_perf_analyzer.counters[i];
if (counter->call_count == 0) {
continue;
}
uint32_t avg_cycles = (uint32_t)(counter->total_cycles / counter->call_count);
float avg_time_us = (float)avg_cycles / g_perf_analyzer.cpu_freq_mhz;
printf("%s,%u,%llu,%u,%u,%u,%.2f\n",
counter->name,
(unsigned int)counter->call_count,
(unsigned long long)counter->total_cycles,
(unsigned int)avg_cycles,
(unsigned int)counter->min_cycles,
(unsigned int)counter->max_cycles,
avg_time_us);
}
}
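The report functions print times with `%.2f`, which depends on floating-point support in `printf`; some embedded C libraries leave that support out by default. A minimal sketch of an integer-only alternative follows. The helper names `perf_time_us_x100` and `perf_format_us` are hypothetical, and the CPU frequency is assumed to be non-zero.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: time in hundredths of a microsecond, computed with
 * integer arithmetic only. cpu_freq_mhz must be non-zero. */
uint64_t perf_time_us_x100(uint32_t cycles, uint32_t cpu_freq_mhz) {
    return ((uint64_t)cycles * 100u) / cpu_freq_mhz;
}

/* Format the time as "<integer>.<two digits>" without using %f.
 * Returns the value returned by snprintf(). */
int perf_format_us(char *buf, size_t size, uint32_t cycles, uint32_t cpu_freq_mhz) {
    uint64_t x100 = perf_time_us_x100(cycles, cpu_freq_mhz);
    return snprintf(buf, size, "%lu.%02lu",
                    (unsigned long)(x100 / 100u),   /* whole microseconds */
                    (unsigned long)(x100 % 100u));  /* hundredths */
}
```

The result is truncated rather than rounded, which is usually acceptable for a two-decimal display of microseconds.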
Usage Examples
Basic Usage
/**
* @file main.c
* @brief 性能分析工具使用示例
*/
#include "perf_counter.h"
#include "perf_macros.h"
#include <stdio.h>
/* 测试函数 */
void test_function_1(void) {
PERF_FUNCTION_START();
// 模拟一些计算
volatile int sum = 0;
for (int i = 0; i < 1000; i++) {
sum += i;
}
PERF_FUNCTION_STOP();
}
void test_function_2(void) {
PERF_FUNCTION_START();
// 模拟更复杂的计算
volatile float result = 0.0f;
for (int i = 0; i < 500; i++) {
result += (float)i * 1.5f;
}
PERF_FUNCTION_STOP();
}
void test_nested_measurement(void) {
    PERF_Start("outer_function");
    // Outer code
    volatile int x = 0;
    for (int i = 0; i < 100; i++) {
        x += i;
    }
    // Inner measurement
    PERF_Start("inner_function");
    volatile int y = 0;
    for (int i = 0; i < 200; i++) {
        y += i * 2;
    }
    PERF_Stop("inner_function");
    // Outer code continues
    for (int i = 0; i < 100; i++) {
        x -= i;
    }
    PERF_Stop("outer_function");
}
int main(void) {
// 系统初始化
SystemClock_Config(); // 配置系统时钟
// 初始化性能分析器(假设CPU频率为168MHz)
if (PERF_Init(168) != 0) {
printf("性能分析器初始化失败\n");
return -1;
}
printf("性能分析工具示例\n");
printf("==================\n\n");
// 示例1: 基本测量
printf("示例1: 基本函数性能测量\n");
for (int i = 0; i < 10; i++) {
test_function_1();
test_function_2();
}
PERF_Report();
// 示例2: 嵌套测量
printf("\n示例2: 嵌套性能测量\n");
PERF_Reset(NULL); // 复位所有计数器
for (int i = 0; i < 5; i++) {
test_nested_measurement();
}
PERF_Report();
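    // Example 3: detailed report
    // (test_function_1 was cleared by PERF_Reset(NULL) above, so report on a
    // counter that still holds data)
    printf("\n示例3: 详细性能报告\n");
    PERF_ReportDetailed("outer_function");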
// 示例4: JSON输出
printf("\n示例4: JSON格式输出\n");
PERF_ReportJSON();
while (1) {
// 主循环
}
return 0;
}
Algorithm Performance Comparison
/**
* @brief 冒泡排序
*/
void bubble_sort(int *arr, int n) {
PERF_FUNCTION_START();
for (int i = 0; i < n - 1; i++) {
for (int j = 0; j < n - i - 1; j++) {
if (arr[j] > arr[j + 1]) {
int temp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = temp;
}
}
}
PERF_FUNCTION_STOP();
}
/**
* @brief 快速排序
*/
void quick_sort(int *arr, int low, int high) {
if (low < high) {
int pivot = arr[high];
int i = low - 1;
for (int j = low; j < high; j++) {
if (arr[j] < pivot) {
i++;
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
}
int temp = arr[i + 1];
arr[i + 1] = arr[high];
arr[high] = temp;
int pi = i + 1;
quick_sort(arr, low, pi - 1);
quick_sort(arr, pi + 1, high);
}
}
void quick_sort_wrapper(int *arr, int n) {
    // Measure in a wrapper so that the recursive calls are timed as a whole
    PERF_Start("quick_sort");
    quick_sort(arr, 0, n - 1);
    PERF_Stop("quick_sort");
}
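/**
 * @brief Compare the performance of the sorting algorithms
 */
void test_sorting_algorithms(void) {
    enum { SIZE = 100 }; // Compile-time constant, so the arrays are not VLAs
    int arr1[SIZE], arr2[SIZE];
    // Initialize the arrays
    for (int i = 0; i < SIZE; i++) {
        // Reverse order: also the worst case for the last-element-pivot
        // quick_sort above, so the comparison is not representative
        arr1[i] = SIZE - i;
        arr2[i] = SIZE - i;
    }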
printf("\n排序算法性能对比 (数组大小: %d)\n", SIZE);
printf("=====================================\n");
// 测试冒泡排序
for (int i = 0; i < 10; i++) {
// 重新初始化数组
for (int j = 0; j < SIZE; j++) {
arr1[j] = SIZE - j;
}
bubble_sort(arr1, SIZE);
}
// 测试快速排序
for (int i = 0; i < 10; i++) {
// 重新初始化数组
for (int j = 0; j < SIZE; j++) {
arr2[j] = SIZE - j;
}
quick_sort_wrapper(arr2, SIZE);
}
PERF_Report();
// 计算加速比
uint32_t bubble_cycles, quick_cycles;
if (PERF_GetCycles("bubble_sort", &bubble_cycles) == 0 &&
PERF_GetCycles("quick_sort", &quick_cycles) == 0) {
float speedup = (float)bubble_cycles / quick_cycles;
printf("\n快速排序加速比: %.2fx\n", speedup);
}
}
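The comparison above sorts reverse-ordered input, which is a worst case for a quicksort that always picks the last element as pivot. A reproducible pseudo-random input gives a more typical picture. The sketch below uses the xorshift32 generator for that purpose; `demo_xorshift32` and `demo_fill_pseudo_random` are hypothetical helper names, and the value range of 0 to 999 is an arbitrary choice.

```c
#include <stdint.h>

/* One step of the xorshift32 generator; the state must be non-zero. */
uint32_t demo_xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: fill arr with reproducible values in [0, 1000). */
void demo_fill_pseudo_random(int *arr, int n, uint32_t seed) {
    uint32_t state = (seed != 0u) ? seed : 1u;  /* avoid the all-zero state */
    for (int i = 0; i < n; i++) {
        arr[i] = (int)(demo_xorshift32(&state) % 1000u);
    }
}
```

Using the same seed for both algorithms keeps the comparison fair, because each one sorts identical data.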
Real-Time System Performance Monitoring
/**
 * @brief Real-time task performance monitoring
 */
/* Simulated RTOS tasks */
void task_sensor_read(void) {
    PERF_Start("task_sensor");
    // Simulate a sensor read
    volatile int sensor_data = 0;
    for (int i = 0; i < 500; i++) {
        sensor_data += i;
    }
    PERF_Stop("task_sensor");
}
void task_data_process(void) {
    PERF_Start("task_process");
    // Simulate data processing
    volatile float result = 0.0f;
    for (int i = 0; i < 300; i++) {
        result += (float)i * 0.5f;
    }
    PERF_Stop("task_process");
}
void task_communication(void) {
    PERF_Start("task_comm");
    // Simulate communication
    volatile int checksum = 0;
    for (int i = 0; i < 200; i++) {
        checksum ^= i;
    }
    PERF_Stop("task_comm");
}
/**
* @brief 系统性能监控循环
*/
void system_performance_monitor(void) {
static uint32_t report_counter = 0;
const uint32_t REPORT_INTERVAL = 1000; // 每1000次循环报告一次
while (1) {
// 执行各个任务
task_sensor_read();
task_data_process();
task_communication();
report_counter++;
// 定期生成报告
if (report_counter >= REPORT_INTERVAL) {
printf("\n=== 系统性能报告 (循环: %u) ===\n",
(unsigned int)report_counter);
PERF_Report();
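            // Check each task's average duration against its budget.
            // PERF_GetCycles() returns the average, so an isolated overrun can
            // be masked; a failed lookup leaves the value at 0.
            uint32_t sensor_cycles = 0, process_cycles = 0, comm_cycles = 0;
            PERF_GetCycles("task_sensor", &sensor_cycles);
            PERF_GetCycles("task_process", &process_cycles);
            PERF_GetCycles("task_comm", &comm_cycles);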
// 假设每个任务的超时阈值(周期数)
const uint32_t SENSOR_TIMEOUT = 100000;
const uint32_t PROCESS_TIMEOUT = 80000;
const uint32_t COMM_TIMEOUT = 50000;
if (sensor_cycles > SENSOR_TIMEOUT) {
printf("[警告] 传感器任务超时: %u 周期\n",
(unsigned int)sensor_cycles);
}
if (process_cycles > PROCESS_TIMEOUT) {
printf("[警告] 处理任务超时: %u 周期\n",
(unsigned int)process_cycles);
}
if (comm_cycles > COMM_TIMEOUT) {
printf("[警告] 通信任务超时: %u 周期\n",
(unsigned int)comm_cycles);
}
report_counter = 0;
PERF_Reset(NULL); // 复位计数器
}
// 模拟延时
for (volatile int i = 0; i < 10000; i++);
}
}
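For deadline checks, the worst observed duration is usually more meaningful than the average, and budgets are easier to state in microseconds than in cycles. A minimal sketch of such a check is shown below; `perf_budget_cycles` and `perf_deadline_missed` are hypothetical helpers, and the caller is assumed to pass in the maximum cycle count recorded for the task.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a time budget in microseconds to cycles. */
uint64_t perf_budget_cycles(uint32_t budget_us, uint32_t cpu_freq_mhz) {
    return (uint64_t)budget_us * cpu_freq_mhz;
}

/* True when the worst observed duration exceeds the budget. */
bool perf_deadline_missed(uint32_t max_cycles, uint32_t budget_us,
                          uint32_t cpu_freq_mhz) {
    return (uint64_t)max_cycles > perf_budget_cycles(budget_us, cpu_freq_mhz);
}
```

At 168 MHz, a 500 µs budget corresponds to 84000 cycles, so a task whose maximum is 84001 cycles would be flagged.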
Advanced Features
Automatic Bottleneck Detection
/**
* @brief 性能瓶颈检测
* @param threshold_percent 阈值百分比(相对于总时间)
*/
void PERF_DetectBottlenecks(float threshold_percent) {
if (!g_perf_analyzer.initialized || g_perf_analyzer.counter_count == 0) {
return;
}
// 计算总周期数
uint64_t total_cycles = 0;
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
total_cycles += g_perf_analyzer.counters[i].total_cycles;
}
if (total_cycles == 0) {
return;
}
printf("\n");
printf("╔════════════════════════════════════════════════════════════╗\n");
printf("║ 性能瓶颈分析 (阈值: %.1f%%) ║\n",
threshold_percent);
printf("╠════════════════════════════════════════════════════════════╣\n");
bool found_bottleneck = false;
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
perf_counter_t *counter = &g_perf_analyzer.counters[i];
if (counter->call_count == 0) {
continue;
}
float percent = (float)(counter->total_cycles * 100.0) / total_cycles;
if (percent >= threshold_percent) {
found_bottleneck = true;
uint32_t avg_cycles = (uint32_t)(counter->total_cycles / counter->call_count);
float avg_time_us = (float)avg_cycles / g_perf_analyzer.cpu_freq_mhz;
printf("║ [瓶颈] %-20s ║\n", counter->name);
printf("║ 占用时间: %.1f%% ║\n", percent);
printf("║ 调用次数: %u ║\n",
(unsigned int)counter->call_count);
printf("║ 平均耗时: %.2f μs ║\n", avg_time_us);
printf("╠════════════════════════════════════════════════════════════╣\n");
}
}
if (!found_bottleneck) {
printf("║ 未检测到明显的性能瓶颈 ║\n");
printf("╠════════════════════════════════════════════════════════════╣\n");
}
printf("║ 总周期数: %llu ║\n",
(unsigned long long)total_cycles);
printf("╚════════════════════════════════════════════════════════════╝\n");
printf("\n");
}
/**
* @brief 性能对比分析
* @param name1 第一个计数器名称
* @param name2 第二个计数器名称
*/
void PERF_Compare(const char *name1, const char *name2) {
perf_counter_t *counter1 = NULL;
perf_counter_t *counter2 = NULL;
// 查找计数器
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
if (strcmp(g_perf_analyzer.counters[i].name, name1) == 0) {
counter1 = &g_perf_analyzer.counters[i];
}
if (strcmp(g_perf_analyzer.counters[i].name, name2) == 0) {
counter2 = &g_perf_analyzer.counters[i];
}
}
if (counter1 == NULL || counter2 == NULL) {
printf("[PERF] 错误: 未找到计数器\n");
return;
}
if (counter1->call_count == 0 || counter2->call_count == 0) {
printf("[PERF] 错误: 计数器无数据\n");
return;
}
uint32_t avg1 = (uint32_t)(counter1->total_cycles / counter1->call_count);
uint32_t avg2 = (uint32_t)(counter2->total_cycles / counter2->call_count);
float time1_us = (float)avg1 / g_perf_analyzer.cpu_freq_mhz;
float time2_us = (float)avg2 / g_perf_analyzer.cpu_freq_mhz;
printf("\n");
printf("═══════════════════════════════════════════════════════════\n");
printf(" 性能对比: %s vs %s\n", name1, name2);
printf("═══════════════════════════════════════════════════════════\n");
printf(" %-20s │ %-20s\n", name1, name2);
printf("───────────────────────────┼───────────────────────────\n");
printf(" 平均周期: %8u │ 平均周期: %8u\n",
(unsigned int)avg1, (unsigned int)avg2);
printf(" 平均时间: %8.2f μs │ 平均时间: %8.2f μs\n", time1_us, time2_us);
printf(" 最小周期: %8u │ 最小周期: %8u\n",
(unsigned int)counter1->min_cycles, (unsigned int)counter2->min_cycles);
printf(" 最大周期: %8u │ 最大周期: %8u\n",
(unsigned int)counter1->max_cycles, (unsigned int)counter2->max_cycles);
printf(" 调用次数: %8u │ 调用次数: %8u\n",
(unsigned int)counter1->call_count, (unsigned int)counter2->call_count);
printf("═══════════════════════════════════════════════════════════\n");
if (avg1 > avg2) {
float speedup = (float)avg1 / avg2;
printf(" %s 比 %s 慢 %.2fx\n", name1, name2, speedup);
} else if (avg2 > avg1) {
float speedup = (float)avg2 / avg1;
printf(" %s 比 %s 快 %.2fx\n", name1, name2, speedup);
} else {
printf(" 两者性能相当\n");
}
printf("═══════════════════════════════════════════════════════════\n");
printf("\n");
}
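The bottleneck analysis expresses each counter's cycles as a share of the summed total. That share can be computed without floating point, as the following sketch shows; `perf_share_permille` is a hypothetical helper, and it assumes `part * 1000` fits in 64 bits. Note that nested measurement points are counted in both the inner and the outer counter, so shares derived from the summed total describe relative weight rather than wall-clock time.

```c
#include <stdint.h>

/* Share of `part` in `total`, in tenths of a percent (0..1000), rounded to
 * the nearest value. Returns 0 when total is 0. */
uint32_t perf_share_permille(uint64_t part, uint64_t total) {
    if (total == 0u) {
        return 0u;
    }
    return (uint32_t)((part * 1000u + total / 2u) / total);
}
```

A result of 250 means 25.0%, which can be printed as `value / 10` and `value % 10` with integer formatting only.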
Performance Trend Analysis
/**
* @brief 性能趋势分析
* @param name 计数器名称
*/
void PERF_AnalyzeTrend(const char *name) {
perf_counter_t *counter = NULL;
for (int i = 0; i < g_perf_analyzer.counter_count; i++) {
if (strcmp(g_perf_analyzer.counters[i].name, name) == 0) {
counter = &g_perf_analyzer.counters[i];
break;
}
}
if (counter == NULL || counter->call_count == 0) {
printf("[PERF] 未找到计数器或无数据\n");
return;
}
    // Compute statistics over the history samples (empty slots hold 0)
    uint64_t sum = 0; // 64-bit so that large samples cannot overflow the sum
    uint32_t count = 0;
    uint32_t min = 0xFFFFFFFF;
    uint32_t max = 0;
    for (int i = 0; i < PERF_HISTORY_SIZE; i++) {
        if (counter->history[i] > 0) {
            sum += counter->history[i];
            count++;
            if (counter->history[i] < min) min = counter->history[i];
            if (counter->history[i] > max) max = counter->history[i];
        }
    }
    if (count == 0) {
        printf("[PERF] 历史数据不足\n");
        return;
    }
    uint32_t avg = (uint32_t)(sum / count);
    // Compute the standard deviation (sqrtf() requires <math.h>)
    uint64_t variance_sum = 0;
    for (int i = 0; i < PERF_HISTORY_SIZE; i++) {
        if (counter->history[i] > 0) {
            // Absolute deviation in 64 bits, so squaring cannot overflow
            uint64_t dev = (counter->history[i] >= avg)
                               ? (uint64_t)(counter->history[i] - avg)
                               : (uint64_t)(avg - counter->history[i]);
            variance_sum += dev * dev;
        }
    }
    float std_dev = sqrtf((float)variance_sum / count);
    float cv = (std_dev / avg) * 100.0f; // Coefficient of variation
printf("\n");
printf("═══════════════════════════════════════════════════════════\n");
printf(" 性能趋势分析: %s\n", counter->name);
printf("═══════════════════════════════════════════════════════════\n");
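    // uint32_t is not always unsigned int, so cast to match the %u specifier
    printf(" 样本数量: %u\n", (unsigned int)count);
    printf(" 平均值: %u 周期\n", (unsigned int)avg);
    printf(" 标准差: %.2f 周期\n", std_dev);
    printf(" 变异系数: %.2f%%\n", cv);
    printf(" 最小值: %u 周期\n", (unsigned int)min);
    printf(" 最大值: %u 周期\n", (unsigned int)max);
    printf(" 范围: %u 周期\n", (unsigned int)(max - min));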
printf("───────────────────────────────────────────────────────────\n");
// 性能稳定性评估
if (cv < 5.0f) {
printf(" 稳定性评估: 优秀 (变化很小)\n");
} else if (cv < 10.0f) {
printf(" 稳定性评估: 良好 (变化较小)\n");
} else if (cv < 20.0f) {
printf(" 稳定性评估: 一般 (有一定波动)\n");
} else {
printf(" 稳定性评估: 较差 (波动较大)\n");
}
printf("═══════════════════════════════════════════════════════════\n");
printf("\n");
}
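The trend analysis needs a mean and a standard deviation over the non-zero history samples. On targets without a hardware FPU, the same statistics can be approximated with integers only. The sketch below is one way to do that, using a bitwise integer square root; `demo_isqrt64` and `demo_history_stats` are hypothetical names, and the standard deviation is truncated to a whole number of cycles.

```c
#include <stdint.h>

/* Integer square root: largest r such that r * r <= v. */
uint32_t demo_isqrt64(uint64_t v) {
    uint64_t r = 0;
    uint64_t bit = 1ULL << 62;  /* highest power of four that fits */
    while (bit > v) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (v >= r + bit) {
            v -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)r;
}

/* Mean and population standard deviation of the non-zero samples.
 * Returns the number of samples used; outputs are 0 when there are none. */
uint32_t demo_history_stats(const uint32_t *history, int n,
                            uint32_t *mean, uint32_t *std_dev) {
    uint64_t sum = 0;
    uint32_t count = 0;
    for (int i = 0; i < n; i++) {
        if (history[i] > 0) {
            sum += history[i];
            count++;
        }
    }
    *mean = 0;
    *std_dev = 0;
    if (count == 0) {
        return 0;
    }
    uint32_t avg = (uint32_t)(sum / count);
    uint64_t sq_sum = 0;
    for (int i = 0; i < n; i++) {
        if (history[i] > 0) {
            uint64_t dev = (history[i] >= avg) ? (uint64_t)(history[i] - avg)
                                               : (uint64_t)(avg - history[i]);
            sq_sum += dev * dev;
        }
    }
    *mean = avg;
    *std_dev = demo_isqrt64(sq_sum / count);
    return count;
}
```

Treating 0 as "empty slot" mirrors the convention of the history buffer, at the cost of ignoring a genuine zero-cycle sample.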
Visualization Tools
Python Visualization Script
#!/usr/bin/env python3
"""
Performance data visualization tool.
Reads a JSON performance report and generates charts.
"""
import json
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import rcParams
# Configure a font with CJK glyphs for the Chinese chart labels
rcParams['font.sans-serif'] = ['SimHei']  # SimHei renders the Chinese text
rcParams['axes.unicode_minus'] = False    # Render the minus sign correctly
def load_performance_data(filename):
    """Load the performance data."""
    with open(filename, 'r', encoding='utf-8') as f:
        data = json.load(f)
    return data['performance_report']
def plot_performance_comparison(data):
    """Plot a bar chart comparing average cycles per measurement point."""
    counters = data['counters']
    names = [c['name'] for c in counters]
    avg_cycles = [c['avg_cycles'] for c in counters]
    plt.figure(figsize=(12, 6))
    bars = plt.bar(range(len(names)), avg_cycles, color='steelblue', alpha=0.8)
    # Add value labels above the bars
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height,
                 f'{int(height)}',
                 ha='center', va='bottom', fontsize=9)
    plt.xlabel('测量点', fontsize=12)
    plt.ylabel('平均周期数', fontsize=12)
    plt.title('性能对比分析', fontsize=14, fontweight='bold')
    plt.xticks(range(len(names)), names, rotation=45, ha='right')
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.savefig('performance_comparison.png', dpi=300)
    print("已生成: performance_comparison.png")
def plot_performance_distribution(data):
    """Plot a pie chart of total cycles per measurement point."""
    counters = data['counters']
    names = [c['name'] for c in counters]
    total_cycles = [c['total_cycles'] for c in counters]
    plt.figure(figsize=(10, 8))
    colors = plt.cm.Set3(np.linspace(0, 1, len(names)))
    wedges, texts, autotexts = plt.pie(total_cycles, labels=names, autopct='%1.1f%%',
                                       colors=colors, startangle=90)
    # Style the label text
    for text in texts:
        text.set_fontsize(10)
    for autotext in autotexts:
        autotext.set_color('white')
        autotext.set_fontweight('bold')
        autotext.set_fontsize(9)
    plt.title('性能占用分布', fontsize=14, fontweight='bold')
    plt.axis('equal')
    plt.tight_layout()
    plt.savefig('performance_distribution.png', dpi=300)
    print("已生成: performance_distribution.png")
def plot_performance_history(data):
    """Plot the history samples of each measurement point as a line chart."""
    counters = data['counters']
    plt.figure(figsize=(14, 8))
    for counter in counters:
        name = counter['name']
        history = counter['history']
        # Drop zero values, which mark empty history slots
        valid_history = [h for h in history if h > 0]
        if len(valid_history) > 0:
            plt.plot(range(len(valid_history)), valid_history,
                     marker='o', label=name, linewidth=2, markersize=6)
    plt.xlabel('测量序号', fontsize=12)
    plt.ylabel('周期数', fontsize=12)
    plt.title('性能历史趋势', fontsize=14, fontweight='bold')
    plt.legend(loc='best', fontsize=10)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig('performance_history.png', dpi=300)
    print("已生成: performance_history.png")
def plot_min_max_range(data):
    """Plot minimum, average, and maximum cycles side by side."""
    counters = data['counters']
    names = [c['name'] for c in counters]
    min_cycles = [c['min_cycles'] for c in counters]
    avg_cycles = [c['avg_cycles'] for c in counters]
    max_cycles = [c['max_cycles'] for c in counters]
    x = np.arange(len(names))
    width = 0.25
    plt.figure(figsize=(14, 6))
    plt.bar(x - width, min_cycles, width, label='最小值', color='lightgreen', alpha=0.8)
    plt.bar(x, avg_cycles, width, label='平均值', color='steelblue', alpha=0.8)
    plt.bar(x + width, max_cycles, width, label='最大值', color='coral', alpha=0.8)
    plt.xlabel('测量点', fontsize=12)
    plt.ylabel('周期数', fontsize=12)
    plt.title('性能范围分析', fontsize=14, fontweight='bold')
    plt.xticks(x, names, rotation=45, ha='right')
    plt.legend(fontsize=10)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.savefig('performance_range.png', dpi=300)
    print("已生成: performance_range.png")


def generate_html_report(data):
    """Generate the HTML report from the loaded performance data."""
    html = """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Performance Analysis Report</title>
<style>
body {
font-family: 'Microsoft YaHei', Arial, sans-serif;
margin: 20px;
background-color: #f5f5f5;
}
.container {
max-width: 1200px;
margin: 0 auto;
background-color: white;
padding: 30px;
border-radius: 10px;
box-shadow: 0 2px 10px rgba(0,0,0,0.1);
}
h1 {
color: #333;
border-bottom: 3px solid #4CAF50;
padding-bottom: 10px;
}
h2 {
color: #555;
margin-top: 30px;
}
table {
width: 100%;
border-collapse: collapse;
margin: 20px 0;
}
th, td {
padding: 12px;
text-align: left;
border-bottom: 1px solid #ddd;
}
th {
background-color: #4CAF50;
color: white;
}
tr:hover {
background-color: #f5f5f5;
}
.chart {
margin: 30px 0;
text-align: center;
}
.chart img {
max-width: 100%;
border: 1px solid #ddd;
border-radius: 5px;
}
.info-box {
background-color: #e3f2fd;
padding: 15px;
border-left: 4px solid #2196F3;
margin: 20px 0;
}
</style>
</head>
<body>
<div class="container">
<h1>🚀 Performance Analysis Report</h1>
<div class="info-box">
<strong>CPU frequency:</strong> {cpu_freq} MHz<br>
<strong>Measurement points:</strong> {counter_count}<br>
<strong>Generated at:</strong> <span id="datetime"></span>
</div>
<h2>📊 Performance Data</h2>
<table>
<thead>
<tr>
<th>Name</th>
<th>Calls</th>
<th>Avg cycles</th>
<th>Min cycles</th>
<th>Max cycles</th>
<th>Avg time (μs)</th>
</tr>
</thead>
<tbody>
{table_rows}
</tbody>
</table>
<h2>📈 Performance Charts</h2>
<div class="chart">
<h3>Performance Comparison</h3>
<img src="performance_comparison.png" alt="Performance comparison">
</div>
<div class="chart">
<h3>Performance Distribution</h3>
<img src="performance_distribution.png" alt="Performance distribution">
</div>
<div class="chart">
<h3>Performance History</h3>
<img src="performance_history.png" alt="Performance history">
</div>
<div class="chart">
<h3>Performance Range</h3>
<img src="performance_range.png" alt="Performance range">
</div>
</div>
</div>
<script>
document.getElementById('datetime').textContent = new Date().toLocaleString('zh-CN');
</script>
</body>
</html>
"""
    # Build the table rows
    table_rows = ""
    for counter in data['counters']:
        table_rows += f"""
<tr>
<td>{counter['name']}</td>
<td>{counter['call_count']}</td>
<td>{counter['avg_cycles']}</td>
<td>{counter['min_cycles']}</td>
<td>{counter['max_cycles']}</td>
<td>{counter['avg_time_us']:.2f}</td>
</tr>
"""
    # The template's CSS contains literal braces, which str.format() would
    # treat as replacement fields and fail on (KeyError). Substitute each
    # placeholder explicitly instead.
    html = (html.replace('{cpu_freq}', str(data['cpu_freq_mhz']))
                .replace('{counter_count}', str(data['counter_count']))
                .replace('{table_rows}', table_rows))
    with open('performance_report.html', 'w', encoding='utf-8') as f:
        f.write(html)
    print("Generated: performance_report.html")


def main():
    """Entry point: load the JSON data, render all charts, then build the report."""
    import sys
    if len(sys.argv) < 2:
        print("Usage: python visualize.py <performance_data.json>")
        sys.exit(1)
    filename = sys.argv[1]
    try:
        data = load_performance_data(filename)
        print("Generating charts...")
        plot_performance_comparison(data)
        plot_performance_distribution(data)
        plot_performance_history(data)
        plot_min_max_range(data)
        print("\nGenerating HTML report...")
        generate_html_report(data)
        print("\n✅ All reports generated.")
        print("Open performance_report.html to view the full report")
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)


if __name__ == '__main__':
    main()
Project Integration¶
Makefile Configuration¶
# Makefile for Performance Analysis Tool
# Toolchain
CC = arm-none-eabi-gcc
OBJCOPY = arm-none-eabi-objcopy
SIZE = arm-none-eabi-size
# Target MCU
MCU = -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16
# Compiler flags
CFLAGS = $(MCU)
CFLAGS += -O2 -g3
CFLAGS += -Wall -Wextra
CFLAGS += -ffunction-sections -fdata-sections
CFLAGS += -DUSE_HAL_DRIVER -DSTM32F407xx
# Enable performance measurement
CFLAGS += -DENABLE_PERF_MEASURE
# Include paths
INCLUDES = -IInc
INCLUDES += -IDrivers/CMSIS/Include
INCLUDES += -IDrivers/CMSIS/Device/ST/STM32F4xx/Include
INCLUDES += -IDrivers/STM32F4xx_HAL_Driver/Inc
# Performance analysis tool headers
INCLUDES += -Iperformance
# Linker flags
LDFLAGS = $(MCU)
LDFLAGS += -specs=nano.specs
LDFLAGS += -T STM32F407VGTx_FLASH.ld
LDFLAGS += -Wl,-Map=build/output.map,--cref
LDFLAGS += -Wl,--gc-sections
# Source files
SOURCES = Src/main.c
SOURCES += Src/system_stm32f4xx.c
SOURCES += Src/stm32f4xx_it.c
# Performance analysis tool sources
SOURCES += performance/perf_counter.c
SOURCES += $(wildcard Drivers/STM32F4xx_HAL_Driver/Src/*.c)
# Object files
OBJECTS = $(addprefix build/,$(notdir $(SOURCES:.c=.o)))
# Default target
all: build/firmware.elf build/firmware.bin build/firmware.hex
	$(SIZE) build/firmware.elf
# Compile rules (recipe lines must start with a tab)
build/%.o: Src/%.c | build
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@
build/%.o: performance/%.c | build
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@
build/%.o: Drivers/STM32F4xx_HAL_Driver/Src/%.c | build
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@
# Link
build/firmware.elf: $(OBJECTS)
	$(CC) $(LDFLAGS) $^ -o $@
# Generate binary and hex images
build/firmware.bin: build/firmware.elf
	$(OBJCOPY) -O binary $< $@
build/firmware.hex: build/firmware.elf
	$(OBJCOPY) -O ihex $< $@
# Create the build directory
build:
	mkdir -p build
# Clean
clean:
	rm -rf build
# Flash
flash: build/firmware.bin
	st-flash write build/firmware.bin 0x8000000
# Debug
debug: build/firmware.elf
	arm-none-eabi-gdb -ex "target remote localhost:3333" build/firmware.elf
.PHONY: all clean flash debug
CMake Configuration¶
# CMakeLists.txt for Performance Analysis Tool
cmake_minimum_required(VERSION 3.15)
# Toolchain configuration: must come before project(), otherwise CMake
# has already detected the host compiler (a separate toolchain file also works)
set(CMAKE_SYSTEM_NAME Generic)
set(CMAKE_SYSTEM_PROCESSOR ARM)
set(CMAKE_C_COMPILER arm-none-eabi-gcc)
set(CMAKE_ASM_COMPILER arm-none-eabi-gcc)
# A bare-metal toolchain cannot link CMake's test executable
set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)
# Project configuration
project(PerformanceAnalysisTool C ASM)
set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)
set(CMAKE_OBJCOPY arm-none-eabi-objcopy)
set(CMAKE_SIZE arm-none-eabi-size)
# MCU configuration
set(MCU_FLAGS "-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16")
# Compiler flags
set(CMAKE_C_FLAGS "${MCU_FLAGS} -O2 -g3 -Wall -Wextra")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -ffunction-sections -fdata-sections")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DUSE_HAL_DRIVER -DSTM32F407xx")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DENABLE_PERF_MEASURE")
# Linker flags
set(CMAKE_EXE_LINKER_FLAGS "${MCU_FLAGS} -specs=nano.specs")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -T ${CMAKE_SOURCE_DIR}/STM32F407VGTx_FLASH.ld")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,-Map=output.map,--cref")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gc-sections")
# Include directories
include_directories(
    Inc
    Drivers/CMSIS/Include
    Drivers/CMSIS/Device/ST/STM32F4xx/Include
    Drivers/STM32F4xx_HAL_Driver/Inc
    performance # performance analysis tool
)
# Source files
file(GLOB_RECURSE SOURCES
    "Src/*.c"
    "performance/*.c" # performance analysis tool sources
    "Drivers/STM32F4xx_HAL_Driver/Src/*.c"
)
# Executable
add_executable(${PROJECT_NAME}.elf ${SOURCES})
# Generate binary and hex images
add_custom_command(TARGET ${PROJECT_NAME}.elf POST_BUILD
    COMMAND ${CMAKE_OBJCOPY} -O binary ${PROJECT_NAME}.elf ${PROJECT_NAME}.bin
    COMMAND ${CMAKE_OBJCOPY} -O ihex ${PROJECT_NAME}.elf ${PROJECT_NAME}.hex
    COMMAND ${CMAKE_SIZE} ${PROJECT_NAME}.elf
    COMMENT "Building binary and hex files"
)
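Best Practices¶
Measurement Guidelines¶
- Minimize measurement overhead
- Avoid measuring inside interrupt handlers
- Choose an appropriate measurement granularity
// Coarse-grained measurement: suited to whole-system analysis
PERF_START("main_loop");
sensor_read();
data_process();
communication();
PERF_STOP("main_loop");
// Fine-grained measurement: suited to targeted optimization
PERF_START("sensor_read");
sensor_read();
PERF_STOP("sensor_read");
PERF_START("data_process");
data_process();
PERF_STOP("data_process");
PERF_START("communication");
communication();
PERF_STOP("communication");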
Performance Optimization Workflow¶
┌─────────────────────────────────────────┐
│ 1. Establish a baseline                 │
│    - Measure current performance        │
│    - Record key metrics                 │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│ 2. Identify bottlenecks                 │
│    - Run PERF_DetectBottlenecks()       │
│    - Analyze hot functions              │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│ 3. Apply optimizations                  │
│    - Algorithm-level                    │
│    - Code-level                         │
│    - Compiler-level                     │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│ 4. Verify the result                    │
│    - Re-measure performance             │
│    - Compare with PERF_Compare()        │
└─────────────────┬───────────────────────┘
                  │
                  ▼
           Target reached?
                  │
        ┌─────────┴─────────┐
        │                   │
       Yes                  No
        │                   │
        ▼                   ▼
      Done          Return to step 2
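Step 4 of the workflow comes down to comparing a baseline average against a post-optimization average. The sketch below shows one way to express that as a number. It does not reproduce the tool's `PERF_Compare()`; the helper name `perf_improvement_permille` and its per-mille convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: relative cycle reduction in tenths of a percent.
 * Positive means the optimized code is faster, negative means a regression.
 * Integer math only, so it also suits targets without an FPU.
 */
static inline int64_t perf_improvement_permille(uint32_t baseline_avg,
                                                uint32_t optimized_avg)
{
    assert(baseline_avg > 0);  /* a zero baseline has no meaningful ratio */
    int64_t diff = (int64_t)baseline_avg - (int64_t)optimized_avg;
    return (diff * 1000) / (int64_t)baseline_avg;
}
```

For example, a baseline of 1000 cycles and an optimized average of 750 cycles gives 250, that is a 25.0% reduction. Comparing against a fixed target in the same units decides whether the loop returns to step 2.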
Troubleshooting¶
Problem 1: The DWT counter does not run
// Check whether the DWT cycle counter is enabled
void debug_dwt_status(void) {
    printf("CoreDebug_DEMCR: 0x%08X\n", (unsigned int)CoreDebug_DEMCR);
    printf("DWT_CTRL: 0x%08X\n", (unsigned int)DWT_CTRL);
    printf("DWT_CYCCNT: 0x%08X\n", (unsigned int)DWT_CYCCNT);
    if (!(CoreDebug_DEMCR & CoreDebug_DEMCR_TRCENA)) {
        printf("Error: TRCENA is not enabled\n");
    }
    if (!(DWT_CTRL & DWT_CTRL_CYCCNTENA)) {
        printf("Error: CYCCNTENA is not enabled\n");
    }
}
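When the status check reports a disabled counter, the usual remedy is to set TRCENA first, confirm that the core implements CYCCNT at all, and only then set CYCCNTENA. The following is a minimal sketch of that sequence. The type `dwt_regs_t`, the function `dwt_enable_checked`, and the status codes are hypothetical names; the register pointers are passed in so the logic can be exercised off-target. The bit positions (DEMCR bit 24, DWT_CTRL bits 0 and 25) follow the Armv7-M architecture.

```c
#include <assert.h>
#include <stdint.h>

#define DEMCR_TRCENA_BIT        (1u << 24)  /* DEMCR: enables DWT and ITM */
#define DWT_CTRL_CYCCNTENA_BIT  (1u << 0)   /* DWT_CTRL: cycle counter enable */
#define DWT_CTRL_NOCYCCNT_BIT   (1u << 25)  /* DWT_CTRL: set if CYCCNT is absent */

typedef enum {
    DWT_OK = 0,
    DWT_ERR_NO_CYCCNT,    /* this core has no cycle counter */
    DWT_ERR_NOT_ENABLED   /* enable bit did not stick */
} dwt_status_t;

/* Hypothetical register bundle; on target these would point at
 * 0xE000EDFC (DEMCR), 0xE0001000 (DWT_CTRL) and 0xE0001004 (DWT_CYCCNT). */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *ctrl;
    volatile uint32_t *cyccnt;
} dwt_regs_t;

dwt_status_t dwt_enable_checked(const dwt_regs_t *r)
{
    assert(r && r->demcr && r->ctrl && r->cyccnt);
    *r->demcr |= DEMCR_TRCENA_BIT;            /* must come before any DWT access */
    if (*r->ctrl & DWT_CTRL_NOCYCCNT_BIT) {
        return DWT_ERR_NO_CYCCNT;
    }
    *r->cyccnt = 0;                           /* start counting from zero */
    *r->ctrl |= DWT_CTRL_CYCCNTENA_BIT;
    return (*r->ctrl & DWT_CTRL_CYCCNTENA_BIT) ? DWT_OK : DWT_ERR_NOT_ENABLED;
}
```

On some Cortex-M7 devices the DWT registers may additionally be write-locked until the lock access register is written; that step is device-specific and is left out here, so check the reference manual for the part in use.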
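Problem 2: Measurement results are inaccurate
// Disable interrupts around the measured code so that interrupt handlers
// do not add cycles to the result, then restore the previous mask state
void accurate_measurement(void) {
    uint32_t primask = __get_PRIMASK();
    __disable_irq();
    PERF_START("critical_section");
    // Code under measurement
    PERF_STOP("critical_section");
    __set_PRIMASK(primask);
}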
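Problem 3: Counter overflow
// For long-running code, watch out for 32-bit overflow:
// at 168 MHz, DWT_CYCCNT wraps after about 25.6 s (2^32 / 168,000,000)
void handle_long_running_code(void) {
    // Option 1: measure in segments
    PERF_START("segment_1");
    long_running_part_1();
    PERF_STOP("segment_1");
    PERF_START("segment_2");
    long_running_part_2();
    PERF_STOP("segment_2");
    // Option 2: use a 64-bit accumulator (already implemented in the tool)
}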
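The two overflow strategies rest on one property: unsigned 32-bit subtraction is modulo 2^32, so `end - start` stays correct across a single wraparound. A 64-bit accumulator can be built on top of that by sampling the counter at least once per wrap period. The sketch below illustrates both ideas; `cycles_elapsed`, `cyc64_t`, `cyc64_init`, and `cyc64_update` are hypothetical names and not necessarily how the tool implements its accumulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Elapsed cycles between two raw CYCCNT samples.
 * Correct as long as fewer than 2^32 cycles passed between them. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Software extension of the 32-bit counter to 64 bits. */
typedef struct {
    uint32_t last;   /* most recent raw sample */
    uint64_t total;  /* cycles accumulated since cyc64_init() */
} cyc64_t;

static inline void cyc64_init(cyc64_t *c, uint32_t now)
{
    assert(c != NULL);
    c->last = now;
    c->total = 0;
}

/* Feed a fresh raw sample; must be called at least once per wrap period
 * (about 25.6 s at 168 MHz), for example from a periodic timer tick. */
static inline uint64_t cyc64_update(cyc64_t *c, uint32_t now)
{
    assert(c != NULL);
    c->total += cycles_elapsed(c->last, now);
    c->last = now;
    return c->total;
}
```

On target, `now` would be the current value of `DWT_CYCCNT`. If more than one full wrap can pass between two updates, the missed wraps are lost, so the update interval is the design constraint to check.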
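Project Extensions¶
Multi-Core Support¶
/**
 * @brief Multi-core performance analyzer
 */
typedef struct {
    perf_analyzer_t core_analyzers[4]; // supports up to 4 cores
    uint8_t core_count;
} multi_core_perf_analyzer_t;
/**
 * @brief Initialize multi-core performance analysis
 */
int PERF_MultiCore_Init(uint8_t core_count, uint32_t cpu_freq_mhz) {
    // Initialize an independent analyzer for each core
    // Implementation omitted
    (void)core_count;
    (void)cpu_freq_mhz;
    return 0;
}
/**
 * @brief Get the ID of the current core
 */
static uint8_t get_current_core_id(void) {
    // MCU-specific implementation
    return 0;
}
/**
 * @brief Start a measurement on the current core
 */
int PERF_MultiCore_Start(const char *name) {
    uint8_t core_id = get_current_core_id();
    // Start the measurement on the analyzer that belongs to this core
    // Implementation omitted
    (void)core_id;
    (void)name;
    return 0;
}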
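RTOS Integration¶
/**
 * @brief FreeRTOS task performance monitoring
 */
#include <stdint.h>
#include <string.h>
#include "FreeRTOS.h"
#include "task.h"
#define PERF_MAX_TASKS 16 // capacity the caller must provide in stats[]
/**
 * @brief Per-task performance statistics
 */
typedef struct {
    char task_name[configMAX_TASK_NAME_LEN];
    uint32_t total_runtime;
    uint32_t percentage;
} task_perf_stats_t;
/**
 * @brief Collect performance statistics for all tasks
 * @note Requires configUSE_TRACE_FACILITY and configGENERATE_RUN_TIME_STATS
 */
void PERF_GetTaskStats(task_perf_stats_t *stats, uint8_t *count) {
    TaskStatus_t *task_status_array;
    volatile UBaseType_t task_count;
    uint32_t total_runtime;
    *count = 0;
    // Query the number of tasks
    task_count = uxTaskGetNumberOfTasks();
    // Allocate the snapshot buffer
    task_status_array = pvPortMalloc(task_count * sizeof(TaskStatus_t));
    if (task_status_array != NULL) {
        // Take a snapshot of every task's state
        task_count = uxTaskGetSystemState(task_status_array,
                                          task_count,
                                          &total_runtime);
        // Never write past the caller's array
        if (task_count > PERF_MAX_TASKS) {
            task_count = PERF_MAX_TASKS;
        }
        // Compute each task's share of CPU time
        for (UBaseType_t i = 0; i < task_count; i++) {
            // strncpy does not terminate on truncation, so terminate explicitly
            strncpy(stats[i].task_name,
                    task_status_array[i].pcTaskName,
                    configMAX_TASK_NAME_LEN - 1);
            stats[i].task_name[configMAX_TASK_NAME_LEN - 1] = '\0';
            stats[i].total_runtime = task_status_array[i].ulRunTimeCounter;
            if (total_runtime > 0) {
                // Multiply in 64 bits: runtime * 100 overflows uint32_t quickly
                stats[i].percentage =
                    (uint32_t)(((uint64_t)stats[i].total_runtime * 100u) / total_runtime);
            } else {
                stats[i].percentage = 0;
            }
        }
        *count = (uint8_t)task_count;
        vPortFree(task_status_array);
    }
}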
/**
 * @brief Print the task performance report
 */
void PERF_PrintTaskStats(void) {
    task_perf_stats_t stats[16]; // sized for up to 16 tasks
    uint8_t count = 0;
    PERF_GetTaskStats(stats, &count);
    printf("\n");
    printf("╔════════════════════════════════════════════════════════╗\n");
    printf("║           RTOS Task Performance Statistics            ║\n");
    printf("╠════════════════════════════════════════════════════════╣\n");
    printf("║ Task name             │ Runtime     │ CPU usage       ║\n");
    printf("╠═══════════════════════╪═════════════╪═════════════════╣\n");
    for (uint8_t i = 0; i < count; i++) {
        printf("║ %-21s │ %10u  │ %6u%%         ║\n",
               stats[i].task_name,
               (unsigned int)stats[i].total_runtime,
               (unsigned int)stats[i].percentage);
    }
    printf("╚════════════════════════════════════════════════════════╝\n");
    printf("\n");
}
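Remote Monitoring over the Network¶
/**
 * @brief Send performance data over the network (lwIP raw TCP API)
 */
#include <string.h>
#include "lwip/tcp.h"
#define PERF_SERVER_PORT 8080
/**
 * @brief Connection callback: send the JSON report once the connection is up
 */
static err_t perf_on_connected(void *arg, struct tcp_pcb *pcb, err_t err) {
    static char json_buffer[2048]; // static: too large for a callback stack frame
    (void)arg;
    if (err == ERR_OK) {
        // Build the JSON payload
        generate_json_report(json_buffer, sizeof(json_buffer));
        // Queue the data; tcp_write() fails if it exceeds the free send buffer
        if (tcp_write(pcb, json_buffer, (u16_t)strlen(json_buffer),
                      TCP_WRITE_FLAG_COPY) == ERR_OK) {
            tcp_output(pcb);
        }
    }
    // Close the connection; data already queued is still transmitted
    tcp_close(pcb);
    return ERR_OK;
}
/**
 * @brief Send performance data to a remote server
 */
int PERF_SendToServer(const char *server_ip, uint16_t port) {
    // Parse the server address before allocating anything
    ip_addr_t server_addr;
    if (!ipaddr_aton(server_ip, &server_addr)) {
        return -1;
    }
    // Create the TCP control block
    struct tcp_pcb *pcb = tcp_new();
    if (pcb == NULL) {
        return -1;
    }
    // tcp_connect() only starts the handshake and returns immediately,
    // so the report is sent from perf_on_connected()
    err_t err = tcp_connect(pcb, &server_addr, port, perf_on_connected);
    if (err != ERR_OK) {
        tcp_close(pcb);
        return -1;
    }
    return 0;
}
/**
 * @brief Start the performance monitoring HTTP server
 */
void PERF_StartHTTPServer(void) {
    // Create the HTTP server
    // Expose a REST API for querying performance data
    // Implementation omitted
}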
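Project Summary¶
Project Outcomes¶
By completing this project you will have:
- A complete performance analysis tool
  - DWT driver
  - Performance measurement API
  - Report generation module
  - Visualization tools
- Practical development skills
  - Low-level hardware programming
  - Performance optimization methods
  - Data visualization
  - Tool development experience
- A reusable code base
  - Modular design
  - Easy integration
  - Cross-platform support
  - Complete documentation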
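Key Technical Points¶
- DWT performance counters
  - Cycle counter (CYCCNT)
  - CPI counter
  - Exception counter
  - Other dedicated counters
- Precise time measurement
  - Cycle-level accuracy
  - Overflow handling
  - Minimizing the impact of interrupts
- Data management
  - Statistics calculation
  - History tracking
  - Trend analysis
- Visualization
  - Multiple report formats
  - Chart generation
  - Web interface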
Performance Metrics¶
Overhead of the tool itself:
| Operation | Overhead (cycles) | Time (@168 MHz) |
|---|---|---|
| PERF_Start() | ~50 | ~0.3 μs |
| PERF_Stop() | ~100 | ~0.6 μs |
| One complete measurement | ~150 | ~0.9 μs |
| Report generation | ~10000 | ~60 μs |
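The time column follows from dividing cycles by the core clock in MHz (150 cycles / 168 MHz ≈ 0.89 μs). The same figures can be used to remove the tool's own cost from a raw reading. A minimal sketch, assuming the approximate 150-cycle overhead from the table; `PERF_OVERHEAD_CYCLES`, `cycles_to_us`, and `net_cycles` are hypothetical names, and the overhead should be re-measured on the actual target and compiler settings.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value taken from the table above; calibrate on your own target. */
#define PERF_OVERHEAD_CYCLES 150u

/* Convert a cycle count to microseconds for a clock given in MHz. */
static inline double cycles_to_us(uint64_t cycles, uint32_t cpu_freq_mhz)
{
    assert(cpu_freq_mhz > 0);
    return (double)cycles / (double)cpu_freq_mhz;
}

/* Subtract the measurement overhead, clamping at zero so that very short
 * code sections never produce a wrapped-around result. */
static inline uint64_t net_cycles(uint64_t measured, uint32_t overhead)
{
    return (measured > overhead) ? (measured - overhead) : 0u;
}
```

For instance, a raw reading of 1000 cycles becomes 850 net cycles, or about 5.06 μs at 168 MHz. On a Cortex-M4 with a single-precision FPU, `double` arithmetic runs in software, so this conversion is better done when reporting rather than inside the measured path.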
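Application Scenarios¶
- Algorithm optimization
  - Compare alternative implementations
  - Locate performance bottlenecks
  - Verify the effect of an optimization
- Real-time system analysis
  - Task execution time monitoring
  - Response time measurement
  - System load analysis
- Product performance testing
  - Establishing performance baselines
  - Regression testing
  - Performance report generation
- Teaching and learning
  - Understanding code performance
  - Learning optimization techniques
  - Practicing performance analysis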
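Future Improvements¶
- Feature extensions
  - Support more MCU platforms
  - Add memory analysis
  - Integrate power measurement
  - Support wireless data transfer
- Tool optimization
  - Reduce measurement overhead
  - Improve data accuracy
  - Optimize storage efficiency
  - Improve the visualizations
- Ecosystem
  - Develop IDE plugins
  - Provide an online analysis platform
  - Build a performance database
  - Set up a community sharing mechanism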
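Appendix¶
Complete Code Repository¶
The complete project code has been uploaded to GitHub:
The repository contains:
- Complete source code
- Example projects
- Test cases
- Documentation and tutorials
- Python visualization scripts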
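Supported MCUs¶
| MCU family | Support status | Notes |
|---|---|---|
| STM32F4 | ✅ Fully supported | Tested |
| STM32F7 | ✅ Fully supported | Tested |
| STM32H7 | ✅ Fully supported | Tested |
| STM32L4 | ✅ Fully supported | Tested |
| NXP i.MX RT | ⚠️ Partially supported | Requires porting |
| Nordic nRF52 | ⚠️ Partially supported | Requires porting |
| TI TM4C | ⚠️ Partially supported | Requires porting |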
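FAQ¶
Q1: Which development environments does the tool support?
A: Mainstream environments including Keil MDK, IAR EWARM, STM32CubeIDE, and GCC.
Q2: How can measurement overhead be minimized?
A: Use conditional compilation, reduce the number of measurement points, and avoid measuring inside interrupt handlers.
Q3: Can the tool be used in production?
A: Yes, but it is advisable to disable it in release builds through conditional compilation.
Q4: How are multitasking environments handled?
A: The tool supports RTOS environments, and each task can be measured independently.
Q5: How is data exported?
A: Over UART, USB, or the network, in formats such as JSON and CSV.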
Author: Embedded Knowledge Platform content team
Last updated: 2026-03-07
Version: 1.0.0
License: MIT License