From cf6783d6ac4dec80788e895f9e2e8a7f5240c849 Mon Sep 17 00:00:00 2001 From: Melissa Jost Date: Mon, 13 Mar 2023 01:59:16 -0700 Subject: [PATCH] cpu: Move fetch stats from simple and minor to base This summarizes a series of changes to move general Simple, Minor, O3 CPU stats to BaseCPU. This commit focuses on moving numBranches from SimpleCPU to the FetchCPUStats in the BaseCPU, and numFetchSuspends from MinorCPU into FetchCPUStats. More general information about this relation chain is below. In addition, this changeset first adds all relevant stats to base in the first half, then removes the duplicated stats in the second half. Duplicated stats are denoted in the code. In addition, to view the difference between the old stats output and the current output, view https://gem5.atlassian.net/browse/GEM5-1304 1. Summary: Moved general CPU stats found across Simple, Minor, and O3 CPU models into BaseCPU through new stat groups. The stat groups are FetchCPUStats, ExecuteCPUStats, and CommitCPUStats. Implemented the committedControl stat vector found in MinorCPU for Simple and O3 CPU. Implemented the numStoreInsts stat found in SimpleCPU for O3CPU. IPC and CPI stats are now tracked at the core and thread level in BaseCPU and are made universal for simple, minor, o3, and kvm CPUs. Duplicate stats across the models are merged into a single stat in BaseCPU under the same stat name. This change does not implement every general level stat moved to BaseCPU for every model. 2. Stat API Changes a. SimpleCPU: statExecutedInstType vector unified into committedInstType numCondCtrlInsts unified into committedControl::isControl b. O3CPU: i. Fetch Stage branches in fetch unified into with numBranches rate renamed to fetchRate insts unified into with numInsts ii. Execute Stage Regfile stats unified into base with use of Simple's stat naming numRefs in IEW unified into numMemRefs numRate from IEW renamed to instRate iii. Commit Stage committedInsts is renamed to numInstsNotNOP committedOps is renamed to numOpsNotNOP instsCommitted is unified into numInsts opsCommitted is unified into numOps branches is unified into committedControl::isControl floating is unified into numFpInsts integer is unified into numIntInsts loads is unified into numLoadInsts memRefs is renamed to numMemRefs vectorInstructions is unified into numVecInsts 3. Details: Created three stat groups in BaseCPU. FetchCPUStats track statistics related to the fetch stage. ExecuteCPUStats track statistics related to the execute stage. CommitCPUStats track statistics related to the commit stage. There are three vectors in Base that store unique pointers to per thread instances of these stat groups. The stat group pointer for thread i is accessible at index i of one of these vectors. For example, stat numCCRegReads of the execute stage for thread 0 can be accessed with executeStats[0]->numCCRegReads. The stats.txt output will print the thread ID of the stat group. For example, numVecRegReads on thread 0 of a single core prints as "board.processor.cores.core.executeStats0.numVecRegReads". NOTE: Multithreading in gem5 is untested. Therefore per thread stats output in stats.txt is not currently guaranteed to be correctly formatted. For FetchCPUStats, the stats moved from SimpleCPU are numBranches and numInsts. From MinorCPU, the stat moved is numFetchSuspends. From O3CPU, the stats moved are from the O3 fetch stage: Stat branches is unified into numBranches, stat rate is renamed to fetchRate in Base, stat insts is unified into numInsts, stat icacheStallCycles keeps the same name in Base. For ExecuteCPUStats, the stats moved from SimpleCPU are dcacheStallCycles, numCCRegReads, numCCRegWrites, numFpAluAccesses, numFpRegReads, numFpRegWrites, numIntAluAccesses, numIntRegReads, numIntRegWrites, numMemRefs, numMiscRegReads, numMiscRegWrites, numVecAluAccesses, numVecPredRegReads, numVecPredRegWrites, numVecRegReads, numVecRegWrites. The stat moved from MinorCPU is numDiscardedOps. From O3, the Regfile stats in CPU are unified into the reg stats in Base and use the names found originally in SimpleCPU. From O3 IEW stage, numInsts keeps the same name in Base, numBranches is unified into numBranches in base, numNop keeps the same name in Base, numRefs is unified into numMemRefs in Base, numLoadInsts and numStoreInsts are moved into Base, numRate is renamed to instRate in base. For CommitCPUStats, the stats moved from SimpleCPU are numCondCtrlInsts, numFpInsts, numIntInsts, numLoadInsts, numStoreInsts, numVecInsts. The stats moved from MinorCPU are numInsts, committedInstType, and committedControl. statExecutedInstType of SimpleCPU is unified with committedInstType of MinorCPU. Implemented committedControl stats from MinorCPU in Simple and O3 CPU. In MinorCPU, this stat was a 2D vector, where the first dimension is the thread ID. In base it is now a 1D vector that is tied to a thread ID via the commitStats vector that the object is accessible through. From the O3 commit stage, committedInsts is renamed to numInstsNotNOP, committedOps is renamed to numOpsNotNOP, instsCommitted is unified into numInsts, opsCommitted is renamed to numOps, committedInstType is unified into committedInstType from Minor, branches is removed because it duplicates committedControl::IsControl, floating is unified into numFpInsts, interger is unified into numIntInsts, loads is unified into numLoadInsts, numStoreInsts is implemented for tracking in O3, memRefs is renamed to numMemRefs, vectorInstructions is unified into numVecInsts. Note that numCondCtrlInsts of Simple is unified into committedControl::IsCondCtrl. Implemented IPC and CPI tracking inside BaseCPU. In BaseCPU::BaseCPUStats, numInsts and numOps track per CPU core committed instructions and operations. In BaseCPU::FetchCPUStats, numInsts and numOps track per thread fetched instructions and operations. In BaseCPU::CommitCPUStats, numInsts tracks per thread executed instructions. In BaseCPU::CommitCPUStats, numInsts and numOps track per thread committed instructions and operations. In BaseSimpleCPU, the countInst() function has been split into countInst(), countFetchInst(), and countCommitInst(). The stat count incrementation step of countInst() has been removed and delegated to the other two functions. countFetchInst() increments numInsts and numOps of the FetchCPUStats group for a thread. countCommitInst() increments the numInsts and numOps of the CommitCPUStats group for a thread and of the BaseCPUStats group for a CPU core. These functions are called in the appropriate stage within timing.cc and atomic.cc. The call to countInst() is left unchanged. countFetchInst() is called in preExecute(). countCommitInst() is called in postExecute(). For MinorCPU, only the commit level numInsts and numOps stats have been implemented. IPC and CPI stats have been added to BaseCPUStats (core level) and CommitCPUStats (thread level). The formulas for the IPC and CPI stats in CommitCPUStats are set in the BaseCPU constructor, after the CommitCPUStats stat group object has been created. These replace IPC, CPI, totalIpc, and totalCpi stats in O3. Replaced committedInsts stats of KVM CPU with commitStats.numInsts of BaseCPU. This results in IPC and CPI printing in stats.txt for KVM simulations. This change does not implement most general stats found in one or two model for all others. Change-Id: I44d8ff6f3d102e94e53f9b2ce9b7917d96341e51 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69097 Reviewed-by: Bobby Bruce Tested-by: kokoro Maintainer: Bobby Bruce --- src/cpu/base.cc | 19 +++++++++++++++++++ src/cpu/base.hh | 17 +++++++++++++++++ src/cpu/minor/execute.cc | 2 ++ src/cpu/simple/base.cc | 2 ++ 4 files changed, 40 insertions(+) diff --git a/src/cpu/base.cc b/src/cpu/base.cc index d2c0a78d44..1d293397e5 100644 --- a/src/cpu/base.cc +++ b/src/cpu/base.cc @@ -191,6 +191,11 @@ BaseCPU::BaseCPU(const Params &p, bool is_checker) modelResetPort.onChange([this](const bool &new_val) { setReset(new_val); }); + // create a stat group object for each thread on this core + fetchStats.reserve(numThreads); + for (int i = 0; i < numThreads; i++) { + fetchStats.emplace_back(new FetchCPUStats(this, i)); + } } void @@ -827,4 +832,18 @@ BaseCPU::GlobalStats::GlobalStats(statistics::Group *parent) hostOpRate = simOps / hostSeconds; } +BaseCPU:: +FetchCPUStats::FetchCPUStats(statistics::Group *parent, int thread_id) + : statistics::Group(parent, csprintf("fetchStats%i", thread_id).c_str()), + ADD_STAT(numBranches, statistics::units::Count::get(), + "Number of branches fetched"), + ADD_STAT(numFetchSuspends, statistics::units::Count::get(), + "Number of times Execute suspended instruction fetching") + +{ + numBranches + .prereq(numBranches); + +} + } // namespace gem5 diff --git a/src/cpu/base.hh b/src/cpu/base.hh index 084d9b9305..e8fb777a76 100644 --- a/src/cpu/base.hh +++ b/src/cpu/base.hh @@ -42,6 +42,7 @@ #ifndef __CPU_BASE_HH__ #define __CPU_BASE_HH__ +#include #include #include "arch/generic/interrupts.hh" @@ -676,6 +677,22 @@ class BaseCPU : public ClockedObject const Cycles pwrGatingLatency; const bool powerGatingOnIdle; EventFunctionWrapper enterPwrGatingEvent; + + + public: + struct FetchCPUStats : public statistics::Group + { + FetchCPUStats(statistics::Group *parent, int thread_id); + + /* Total number of branches fetched */ + statistics::Scalar numBranches; + + /* Number of times fetch was asked to suspend by Execute */ + statistics::Scalar numFetchSuspends; + + }; + + std::vector> fetchStats; }; } // namespace gem5 diff --git a/src/cpu/minor/execute.cc b/src/cpu/minor/execute.cc index 5eaaf5804e..c37c6c6696 100644 --- a/src/cpu/minor/execute.cc +++ b/src/cpu/minor/execute.cc @@ -1054,7 +1054,9 @@ Execute::commitInst(MinorDynInstPtr inst, bool early_memory_issue, DPRINTF(MinorInterrupt, "Suspending thread: %d from Execute" " inst: %s\n", thread_id, *inst); + // output both old and new stats cpu.stats.numFetchSuspends++; + cpu.fetchStats[thread_id]->numFetchSuspends++; updateBranchData(thread_id, BranchData::SuspendThread, inst, resume_pc, branch); diff --git a/src/cpu/simple/base.cc b/src/cpu/simple/base.cc index 768f63ede5..1632f545a2 100644 --- a/src/cpu/simple/base.cc +++ b/src/cpu/simple/base.cc @@ -396,7 +396,9 @@ BaseSimpleCPU::postExecute() } if (curStaticInst->isControl()) { + // output both old and new stats ++t_info.execContextStats.numBranches; + ++fetchStats[t_info.thread->threadId()]->numBranches; } /* Power model statistics */