From 8a11b39c41353ec5345250a0ca370b89d14e97bd Mon Sep 17 00:00:00 2001 From: Jasjeet Rangi Date: Wed, 23 Nov 2022 12:31:12 -0800 Subject: [PATCH] cpu: Move fetch stats from simple and minor to base This summarizes a series of changes to move general Simple, Minor, O3 CPU stats to BaseCPU. This commit focuses on moving numBranches from SimpleCPU to the FetchCPUStats in the BaseCPU, and numFetchSuspends from MinorCPU into FetchCPUStats. More general information about this relation chain is below 1. Summary: Moved general CPU stats found across Simple, Minor, and O3 CPU models into BaseCPU through new stat groups. The stat groups are FetchCPUStats, ExecuteCPUStats, and CommitCPUStats. Implemented the committedControl stat vector found in MinorCPU for Simple and O3 CPU. Implemented the numStoreInsts stat found in SimpleCPU for O3CPU. IPC and CPI stats are now tracked at the core and thread level in BaseCPU and are made universal for simple, minor, o3, and kvm CPUs. Duplicate stats across the models are merged into a single stat in BaseCPU under the same stat name. This change does not implement every general level stat moved to BaseCPU for every model. 2. Stat API Changes a. SimpleCPU: statExecutedInstType vector unified into committedInstType numCondCtrlInsts unified into committedControl::isControl b. O3CPU: i. Fetch Stage branches in fetch unified into with numBranches rate renamed to fetchRate insts unified into with numInsts ii. Execute Stage Regfile stats unified into base with use of Simple's stat naming numRefs in IEW unified into numMemRefs numRate from IEW renamed to instRate iii. Commit Stage committedInsts is renamed to numInstsNotNOP committedOps is renamed to numOpsNotNOP instsCommitted is unified into numInsts opsCommitted is unified into numOps branches is unified into committedControl::isControl floating is unified into numFpInsts integer is unified into numIntInsts loads is unified into numLoadInsts memRefs is renamed to numMemRefs vectorInstructions is unified into numVecInsts 3. Details: Created three stat groups in BaseCPU. FetchCPUStats track statistics related to the fetch stage. ExecuteCPUStats track statistics related to the execute stage. CommitCPUStats track statistics related to the commit stage. There are three vectors in Base that store unique pointers to per thread instances of these stat groups. The stat group pointer for thread i is accessible at index i of one of these vectors. For example, stat numCCRegReads of the execute stage for thread 0 can be accessed with executeStats[0]->numCCRegReads. The stats.txt output will print the thread ID of the stat group. For example, numVecRegReads on thread 0 of a single core prints as "board.processor.cores.core.executeStats0.numVecRegReads". NOTE: Multithreading in gem5 is untested. Therefore per thread stats output in stats.txt is not currently guaranteed to be correctly formatted. For FetchCPUStats, the stats moved from SimpleCPU are numBranches and numInsts. From MinorCPU, the stat moved is numFetchSuspends. From O3CPU, the stats moved are from the O3 fetch stage: Stat branches is unified into numBranches, stat rate is renamed to fetchRate in Base, stat insts is unified into numInsts, stat icacheStallCycles keeps the same name in Base. For ExecuteCPUStats, the stats moved from SimpleCPU are dcacheStallCycles, numCCRegReads, numCCRegWrites, numFpAluAccesses, numFpRegReads, numFpRegWrites, numIntAluAccesses, numIntRegReads, numIntRegWrites, numMemRefs, numMiscRegReads, numMiscRegWrites, numVecAluAccesses, numVecPredRegReads, numVecPredRegWrites, numVecRegReads, numVecRegWrites. The stat moved from MinorCPU is numDiscardedOps. From O3, the Regfile stats in CPU are unified into the reg stats in Base and use the names found originally in SimpleCPU. From O3 IEW stage, numInsts keeps the same name in Base, numBranches is unified into numBranches in base, numNop keeps the same name in Base, numRefs is unified into numMemRefs in Base, numLoadInsts and numStoreInsts are moved into Base, numRate is renamed to instRate in base. For CommitCPUStats, the stats moved from SimpleCPU are numCondCtrlInsts, numFpInsts, numIntInsts, numLoadInsts, numStoreInsts, numVecInsts. The stats moved from MinorCPU are numInsts, committedInstType, and committedControl. statExecutedInstType of SimpleCPU is unified with committedInstType of MinorCPU. Implemented committedControl stats from MinorCPU in Simple and O3 CPU. In MinorCPU, this stat was a 2D vector, where the first dimension is the thread ID. In base it is now a 1D vector that is tied to a thread ID via the commitStats vector that the object is accessible through. From the O3 commit stage, committedInsts is renamed to numInstsNotNOP, committedOps is renamed to numOpsNotNOP, instsCommitted is unified into numInsts, opsCommitted is renamed to numOps, committedInstType is unified into committedInstType from Minor, branches is removed because it duplicates committedControl::IsControl, floating is unified into numFpInsts, interger is unified into numIntInsts, loads is unified into numLoadInsts, numStoreInsts is implemented for tracking in O3, memRefs is renamed to numMemRefs, vectorInstructions is unified into numVecInsts. Note that numCondCtrlInsts of Simple is unified into committedControl::IsCondCtrl. Implemented IPC and CPI tracking inside BaseCPU. In BaseCPU::BaseCPUStats, numInsts and numOps track per CPU core committed instructions and operations. In BaseCPU::FetchCPUStats, numInsts and numOps track per thread fetched instructions and operations. In BaseCPU::CommitCPUStats, numInsts tracks per thread executed instructions. In BaseCPU::CommitCPUStats, numInsts and numOps track per thread committed instructions and operations. In BaseSimpleCPU, the countInst() function has been split into countInst(), countFetchInst(), and countCommitInst(). The stat count incrementation step of countInst() has been removed and delegated to the other two functions. countFetchInst() increments numInsts and numOps of the FetchCPUStats group for a thread. countCommitInst() increments the numInsts and numOps of the CommitCPUStats group for a thread and of the BaseCPUStats group for a CPU core. These functions are called in the appropriate stage within timing.cc and atomic.cc. The call to countInst() is left unchanged. countFetchInst() is called in preExecute(). countCommitInst() is called in postExecute(). For MinorCPU, only the commit level numInsts and numOps stats have been implemented. IPC and CPI stats have been added to BaseCPUStats (core level) and CommitCPUStats (thread level). The formulas for the IPC and CPI stats in CommitCPUStats are set in the BaseCPU constructor, after the CommitCPUStats stat group object has been created. These replace IPC, CPI, totalIpc, and totalCpi stats in O3. Replaced committedInsts stats of KVM CPU with commitStats.numInsts of BaseCPU. This results in IPC and CPI printing in stats.txt for KVM simulations. This change does not implement most general stats found in one or two model for all others. Jira Ticket: https://gem5.atlassian.net/browse/GEM5-1304 Change-Id: I3c852f8dba3268c71b7a3415480fb63d8dc30cb7 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66031 Maintainer: Bobby Bruce Reviewed-by: Bobby Bruce Tested-by: kokoro --- src/cpu/base.cc | 19 +++++++++++++++++++ src/cpu/base.hh | 16 ++++++++++++++++ src/cpu/minor/execute.cc | 2 +- src/cpu/minor/stats.cc | 2 -- src/cpu/minor/stats.hh | 3 --- src/cpu/simple/base.cc | 2 +- src/cpu/simple/exec_context.hh | 7 ------- 7 files changed, 37 insertions(+), 14 deletions(-) diff --git a/src/cpu/base.cc b/src/cpu/base.cc index d2c0a78d44..1d293397e5 100644 --- a/src/cpu/base.cc +++ b/src/cpu/base.cc @@ -191,6 +191,11 @@ BaseCPU::BaseCPU(const Params &p, bool is_checker) modelResetPort.onChange([this](const bool &new_val) { setReset(new_val); }); + // create a stat group object for each thread on this core + fetchStats.reserve(numThreads); + for (int i = 0; i < numThreads; i++) { + fetchStats.emplace_back(new FetchCPUStats(this, i)); + } } void @@ -827,4 +832,18 @@ BaseCPU::GlobalStats::GlobalStats(statistics::Group *parent) hostOpRate = simOps / hostSeconds; } +BaseCPU:: +FetchCPUStats::FetchCPUStats(statistics::Group *parent, int thread_id) + : statistics::Group(parent, csprintf("fetchStats%i", thread_id).c_str()), + ADD_STAT(numBranches, statistics::units::Count::get(), + "Number of branches fetched"), + ADD_STAT(numFetchSuspends, statistics::units::Count::get(), + "Number of times Execute suspended instruction fetching") + +{ + numBranches + .prereq(numBranches); + +} + } // namespace gem5 diff --git a/src/cpu/base.hh b/src/cpu/base.hh index 084d9b9305..d6e5d38838 100644 --- a/src/cpu/base.hh +++ b/src/cpu/base.hh @@ -43,6 +43,7 @@ #define __CPU_BASE_HH__ #include +#include #include "arch/generic/interrupts.hh" #include "base/statistics.hh" @@ -676,6 +677,21 @@ class BaseCPU : public ClockedObject const Cycles pwrGatingLatency; const bool powerGatingOnIdle; EventFunctionWrapper enterPwrGatingEvent; + + public: + struct FetchCPUStats : public statistics::Group + { + FetchCPUStats(statistics::Group *parent, int thread_id); + + /* Total number of branches fetched */ + statistics::Scalar numBranches; + + /* Number of times fetch was asked to suspend by Execute */ + statistics::Scalar numFetchSuspends; + + }; + + std::vector> fetchStats; }; } // namespace gem5 diff --git a/src/cpu/minor/execute.cc b/src/cpu/minor/execute.cc index 5eaaf5804e..323ae2982b 100644 --- a/src/cpu/minor/execute.cc +++ b/src/cpu/minor/execute.cc @@ -1054,7 +1054,7 @@ Execute::commitInst(MinorDynInstPtr inst, bool early_memory_issue, DPRINTF(MinorInterrupt, "Suspending thread: %d from Execute" " inst: %s\n", thread_id, *inst); - cpu.stats.numFetchSuspends++; + cpu.fetchStats[thread_id]->numFetchSuspends++; updateBranchData(thread_id, BranchData::SuspendThread, inst, resume_pc, branch); diff --git a/src/cpu/minor/stats.cc b/src/cpu/minor/stats.cc index 64d4c475e0..e9ca562c16 100644 --- a/src/cpu/minor/stats.cc +++ b/src/cpu/minor/stats.cc @@ -52,8 +52,6 @@ MinorStats::MinorStats(BaseCPU *base_cpu) ADD_STAT(numDiscardedOps, statistics::units::Count::get(), "Number of ops (including micro ops) which were discarded before " "commit"), - ADD_STAT(numFetchSuspends, statistics::units::Count::get(), - "Number of times Execute suspended instruction fetching"), ADD_STAT(quiesceCycles, statistics::units::Cycle::get(), "Total number of cycles that CPU has spent quiesced or waiting " "for an interrupt"), diff --git a/src/cpu/minor/stats.hh b/src/cpu/minor/stats.hh index 1ab81f4407..524d20f85d 100644 --- a/src/cpu/minor/stats.hh +++ b/src/cpu/minor/stats.hh @@ -68,9 +68,6 @@ struct MinorStats : public statistics::Group /** Number of ops discarded before committing */ statistics::Scalar numDiscardedOps; - /** Number of times fetch was asked to suspend by Execute */ - statistics::Scalar numFetchSuspends; - /** Number of cycles in quiescent state */ statistics::Scalar quiesceCycles; diff --git a/src/cpu/simple/base.cc b/src/cpu/simple/base.cc index 768f63ede5..b2a11fd84b 100644 --- a/src/cpu/simple/base.cc +++ b/src/cpu/simple/base.cc @@ -396,7 +396,7 @@ BaseSimpleCPU::postExecute() } if (curStaticInst->isControl()) { - ++t_info.execContextStats.numBranches; + ++fetchStats[t_info.thread->threadId()]->numBranches; } /* Power model statistics */ diff --git a/src/cpu/simple/exec_context.hh b/src/cpu/simple/exec_context.hh index 0f20763f28..d4bb017481 100644 --- a/src/cpu/simple/exec_context.hh +++ b/src/cpu/simple/exec_context.hh @@ -152,8 +152,6 @@ class SimpleExecContext : public ExecContext "ICache total stall cycles"), ADD_STAT(dcacheStallCycles, statistics::units::Cycle::get(), "DCache total stall cycles"), - ADD_STAT(numBranches, statistics::units::Count::get(), - "Number of branches fetched"), ADD_STAT(numPredictedBranches, statistics::units::Count::get(), "Number of branches predicted as taken"), ADD_STAT(numBranchMispred, statistics::units::Count::get(), @@ -203,9 +201,6 @@ class SimpleExecContext : public ExecContext numIdleCycles = idleFraction * cpu->baseStats.numCycles; numBusyCycles = notIdleFraction * cpu->baseStats.numCycles; - numBranches - .prereq(numBranches); - numPredictedBranches .prereq(numPredictedBranches); @@ -297,8 +292,6 @@ class SimpleExecContext : public ExecContext statistics::Scalar dcacheStallCycles; /// @{ - /// Total number of branches fetched - statistics::Scalar numBranches; /// Number of branches predicted as taken statistics::Scalar numPredictedBranches; /// Number of misprediced branches