misc: Merge branch 'release-staging-v21-1' into develop

Change-Id: I0f69d3d0863f77c02ac8089fb4dccee3aa70a4ea
This commit is contained in:
Bobby R. Bruce
2021-07-28 17:37:04 -07:00
15 changed files with 377 additions and 50 deletions

View File

@@ -1,3 +1,146 @@
# Version 21.1.0.0
Since v21.0 we have received 780 commits with 48 unique contributors, closing 64 issues on our [Jira Issue Tracker](https://gem5.atlassian.net/).
In addition to our [first gem5 minor release](#version-21.0.1.0), we have included a range of new features and API changes, which we outline below.
## Added the Components Library [Alpha Release]
The purpose of the gem5 components library is to provide gem5 users a standard set of common and useful gem5 components, pre-built, to add to their experiments.
The gem5 components library adopts a modular architecture design so components may be easily added, removed, and extended, as needed.
Examples of using the gem5 components library can be found in [`configs/example/components-library`](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.1.0.0/configs/example/components-library).
**Important Disclaimer:**
This is a pre-alpha release.
The purpose of this release is to get community feedback.
Though some testing has been done, we expect regular fixes and improvements until the library reaches a stable state.
A Jira Ticket outlining TODOs and known bugs can be found at <https://gem5.atlassian.net/browse/GEM5-648>.
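The linked examples show real usage; as a taste of the modular design only, a script composes pre-built components into a board. All module paths and class names in this sketch are assumptions for illustration, not necessarily the library's actual API:

```python
# Hypothetical sketch of composing pre-built components. The module paths
# and class names here are assumed for illustration only -- consult the
# examples under configs/example/components-library for the actual API.
from components_library.boards.simple_board import SimpleBoard               # assumed
from components_library.processors.simple_processor import SimpleProcessor   # assumed
from components_library.memory.single_channel import SingleChannelDDR3_1600  # assumed
from components_library.cachehierarchies.classic.no_cache import NoCache     # assumed

# A board is assembled from interchangeable parts; swapping, say, the
# cache hierarchy should not require touching the rest of the script.
board = SimpleBoard(
    clk_freq="3GHz",
    processor=SimpleProcessor(num_cores=1),
    memory=SingleChannelDDR3_1600(size="1GiB"),
    cache_hierarchy=NoCache(),
)
```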
## Improvements to GPU simulation
### ROCm 4.0 support
ROCm 4.0 is now officially supported.
### gfx801 (Carrizo) and gfx803 (Fiji) support
gfx801 (Carrizo) and gfx803 (Fiji) are both supported and tested with the gem5-resources applications.
### Better scoreboarding support
Scoreboarding support has been improved, reducing stalls by up to 42%.
## Accuracy and coverage stats added to prefetcher caches
Accuracy and coverage stats have been added for prefetcher caches.
Accuracy is defined as the ratio of the number of prefetch requests counted as useful over the total number of prefetch requests issued.
Coverage is defined as the ratio of the number of prefetch requests counted as useful over the number of useful prefetch requests plus the remaining demand misses.
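Concretely, the two ratios reduce to the following (a small illustrative sketch, not the gem5 implementation):

```python
def prefetch_accuracy(useful, issued):
    """Accuracy: useful prefetches / total prefetch requests issued."""
    return useful / issued if issued else 0.0

def prefetch_coverage(useful, demand_misses):
    """Coverage: useful prefetches / (useful prefetches + remaining demand misses)."""
    total = useful + demand_misses
    return useful / total if total else 0.0

# e.g. 60 useful prefetches out of 100 issued, with 140 demand misses remaining:
print(prefetch_accuracy(60, 100))   # 0.6
print(prefetch_coverage(60, 140))   # 0.3
```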
## POWER 64-bit SE mode
The POWER 64-bit ISA is now supported in syscall emulation (SE) mode.
## RISC-V PMP now supported
gem5 now supports simulation of RISC-V Physical Memory Protection (PMP).
Simulations can boot and run Keystone and Eyrie.
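PMP entries commonly use the NAPOT (naturally aligned power-of-two) address mode, whose decoding can be sketched standalone from the RISC-V privileged specification (this is an illustration of the encoding, not gem5's code):

```python
def decode_napot(pmpaddr):
    """Decode a RISC-V PMP NAPOT pmpaddr value into (base, size) in bytes.

    With t trailing 1 bits in pmpaddr, the entry covers a naturally
    aligned region of 2**(t+3) bytes; the remaining upper bits hold
    the region base shifted right by 2.
    """
    t = 0
    while (pmpaddr >> t) & 1:
        t += 1
    size = 1 << (t + 3)
    base = (pmpaddr >> (t + 1)) << (t + 3)
    return base, size

# pmpaddr 0x20000007 has 3 trailing ones: a 64-byte region at 0x80000000.
base, size = decode_napot(0x20000007)
print(hex(base), size)  # 0x80000000 64
```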
## Improvements to the replacement policies
The gem5 replacement policies framework now supports more complex algorithms.
It now allows using addresses, PCs, and other information within a policy.
**Note:**
Assuming this information is promptly available at the cache may be unrealistic.
### Set Dueling
Classes that handle set dueling have been created ([Dueler and DuelingMonitor](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.1.0.0/src/mem/cache/tags/dueling.hh)).
They can be used in conjunction with different cache policies.
A [replacement policy that uses them](https://gem5.googlesource.com/public/gem5/+/refs/tags/v21.1.0.0/src/mem/cache/replacement_policies/dueling_rp.hh) has been added for guidance.
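The mechanism behind these classes can be sketched in a few lines (a minimal illustration in the style of Qureshi et al.'s dynamic insertion policy, not gem5's `Dueler`/`DuelingMonitor` implementation): a saturating selector counter tracks which of two competing policies misses less in its dedicated "leader" sets, and the winner is applied to the remaining "follower" sets.

```python
# Minimal set-dueling sketch: a saturating PSEL counter votes between two
# competing policies based on misses observed in their leader sets.
class DuelingMonitor:
    def __init__(self, bits=10):
        self.max = (1 << bits) - 1
        self.psel = self.max // 2      # start undecided

    def miss_in_leader_a(self):        # policy A missed: vote toward B
        self.psel = min(self.psel + 1, self.max)

    def miss_in_leader_b(self):        # policy B missed: vote toward A
        self.psel = max(self.psel - 1, 0)

    def winner(self):                  # policy used by the follower sets
        return "B" if self.psel > self.max // 2 else "A"

mon = DuelingMonitor()
for _ in range(5):
    mon.miss_in_leader_a()             # policy A keeps missing...
print(mon.winner())                    # ...so the followers adopt B
```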
## RISC-V is now supported as a host machine
gem5 is now compilable and runnable on a RISC-V host system.
## New deprecation macros added
Macros have been added for deprecating namespaces (`GEM5_DEPRECATED_NAMESPACE`) and for deprecating other macros (`GEM5_DEPRECATED_MACRO`).
**Note:**
For technical reasons, using the deprecated macros will not produce any deprecation warnings.
## Refactoring of the gem5 Namespaces
Snake case has been adopted as the new convention for namespaces.
As a consequence, multiple namespaces have been renamed:
* `Minor` -> `minor`
* `Loader` -> `loader`
* `Stats` -> `statistics`
* `Enums` -> `enums`
* `Net` -> `networking`
* `ProbePoints` -> `probing`
* `ContextSwitchTaskId` -> `context_switch_task_id`
* `Prefetcher` -> `prefetch`
* `Encoder` -> `encoder`
* `Compressor` -> `compression`
* `QoS` -> `qos`
* `ReplacementPolicy` -> `replacement_policy`
* `Mouse` -> `mouse`
* `Keyboard` -> `keyboard`
* `Int` -> `as_int`
* `Float` -> `as_float`
* `FastModel` -> `fastmodel`
* `GuestABI` -> `guest_abi`
* `LockedMem` -> `locked_mem`
* `DeliveryMode` -> `delivery_mode`
* `PseudoInst` -> `pseudo_inst`
* `DecodeCache` -> `decode_cache`
* `BitfieldBackend` -> `bitfield_backend`
* `FreeBSD` -> `free_bsd`
* `Linux` -> `linux`
* `Units` -> `units`
* `SimClock` -> `sim_clock`
* `BloomFilter` -> `bloom_filter`
* `X86Macroop` -> `x86_macroop`
* `ConditionTests` -> `condition_tests`
* `IntelMP` -> `intelmp`
* `SMBios` -> `smbios`
* `RomLables` -> `rom_labels`
* `SCMI` -> `scmi`
* `iGbReg` -> `igbreg`
* `Ps2` -> `ps2`
* `CopyEngineReg` -> `copy_engine_reg`
* `TxdOp` -> `txd_op`
* `Sinic` -> `sinic`
* `Debug` -> `debug`
In addition, some new namespaces were added:
* `gem5::ruby`, for Ruby-related files
* `gem5::ruby::garnet`, for garnet-related files
* `gem5::o3`, for the O3-cpu's related files
* `gem5::memory`, for files related to memories
Finally, the `m5` namespace has been renamed `gem5`.
## Macros in `base/compiler.hh`
The macros in `base/compiler.hh` of the form `M5_*` have been deprecated and replaced with macros of the form `GEM5_*`, with some other minor name adjustments.
## MemObject Removed
The `MemObject` SimObject had been marked for deprecation and has now been officially removed from the gem5 codebase.
## Minimum GCC version increased to 7; minimum Clang version increased to 6; Clang 10 and 11 supported; C++17 supported
GCC version 5 and 6 are no longer supported.
GCC 7 is now the minimum GCC compiler version supported.
This change has allowed us to move to the C++17 standard for development.
In addition, the minimum Clang version has increased to 6, and Clang 10 and 11 are now officially supported.
# Version 21.0.1.0
Version 21.0.1 is a minor gem5 release consisting of bug fixes. The 21.0.1 release:

View File

@@ -330,12 +330,6 @@ if main['GCC'] or main['CLANG']:
if GetOption('gold_linker'):
main.Append(LINKFLAGS='-fuse-ld=gold')
# Treat warnings as errors but white list some warnings that we
# want to allow (e.g., deprecation warnings).
main.Append(CCFLAGS=['-Werror',
'-Wno-error=deprecated-declarations',
'-Wno-error=deprecated',
])
else:
error('\n'.join((
"Don't know what compiler options to use for your compiler.",

View File

@@ -213,7 +213,7 @@ def define_defaults(defaults):
os.pardir,
os.pardir))
defaults.result_path = os.path.join(os.getcwd(), 'testing-results')
defaults.resource_url = 'http://dist.gem5.org/dist/develop'
defaults.resource_url = 'http://dist.gem5.org/dist/v21-1'
defaults.resource_path = os.path.abspath(os.path.join(defaults.base_dir,
'tests',
'gem5',

View File

@@ -31,7 +31,7 @@ PROJECT_NAME = gem5
# This could be handy for archiving the generated documentation or
# if some version control system is used.
PROJECT_NUMBER = DEVELOP-FOR-v21.1
PROJECT_NUMBER = v21.1.0.0
# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
# base path where the generated documentation will be put.

View File

@@ -36314,7 +36314,7 @@ namespace Gcn3ISA
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -36363,7 +36363,7 @@ namespace Gcn3ISA
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
void
@@ -39384,8 +39384,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
} // execute
@@ -39448,8 +39451,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -39511,8 +39517,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -39603,8 +39612,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -39667,8 +39679,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -39731,8 +39746,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -39804,8 +39822,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -39889,8 +39910,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
} // execute
@@ -39952,8 +39976,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -40015,8 +40042,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -40079,8 +40109,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -40151,8 +40184,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -40227,8 +40263,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe
.issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -40294,8 +40333,11 @@ namespace Gcn3ISA
"Flats to private aperture not tested yet\n");
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
ConstVecOperandU32 data(gpuDynInst, extData.DATA);
@@ -40408,8 +40450,11 @@ namespace Gcn3ISA
"Flats to private aperture not tested yet\n");
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -40492,8 +40537,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -40576,8 +40624,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
void
@@ -40834,8 +40885,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -40918,8 +40972,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -41044,8 +41101,11 @@ namespace Gcn3ISA
"Flats to private aperture not tested yet\n");
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -41129,8 +41189,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -41215,8 +41278,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -41483,8 +41549,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}
@@ -41570,8 +41639,11 @@ namespace Gcn3ISA
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
gpuDynInst->computeUnit()->globalMemoryPipe.
issueRequest(gpuDynInst);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
gpuDynInst->computeUnit()->localMemoryPipe
.issueRequest(gpuDynInst);
} else {
fatal("Non global flat instructions not implemented yet.\n");
fatal("Unsupported scope for flat instruction.\n");
}
}

View File

@@ -1277,12 +1277,12 @@ namespace Gcn3ISA
reg = extData.SRSRC;
srcOps.emplace_back(reg, getOperandSize(opNum), true,
true, false, false);
isScalarReg(reg), false, false);
opNum++;
reg = extData.SOFFSET;
srcOps.emplace_back(reg, getOperandSize(opNum), true,
true, false, false);
isScalarReg(reg), false, false);
opNum++;
}
@@ -1368,12 +1368,12 @@ namespace Gcn3ISA
reg = extData.SRSRC;
srcOps.emplace_back(reg, getOperandSize(opNum), true,
true, false, false);
isScalarReg(reg), false, false);
opNum++;
reg = extData.SOFFSET;
srcOps.emplace_back(reg, getOperandSize(opNum), true,
true, false, false);
isScalarReg(reg), false, false);
opNum++;
// extData.VDATA moves in the reg list depending on the instruction
@@ -1441,13 +1441,13 @@ namespace Gcn3ISA
reg = extData.SRSRC;
srcOps.emplace_back(reg, getOperandSize(opNum), true,
true, false, false);
isScalarReg(reg), false, false);
opNum++;
if (getNumOperands() == 4) {
reg = extData.SSAMP;
srcOps.emplace_back(reg, getOperandSize(opNum), true,
true, false, false);
isScalarReg(reg), false, false);
opNum++;
}

View File

@@ -799,35 +799,107 @@ namespace Gcn3ISA
void
initMemRead(GPUDynInstPtr gpuDynInst)
{
initMemReqHelper<T, 1>(gpuDynInst, MemCmd::ReadReq);
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
initMemReqHelper<T, 1>(gpuDynInst, MemCmd::ReadReq);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
Wavefront *wf = gpuDynInst->wavefront();
for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
if (gpuDynInst->exec_mask[lane]) {
Addr vaddr = gpuDynInst->addr[lane];
(reinterpret_cast<T*>(gpuDynInst->d_data))[lane]
= wf->ldsChunk->read<T>(vaddr);
}
}
}
}
template<int N>
void
initMemRead(GPUDynInstPtr gpuDynInst)
{
initMemReqHelper<VecElemU32, N>(gpuDynInst, MemCmd::ReadReq);
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
initMemReqHelper<VecElemU32, N>(gpuDynInst, MemCmd::ReadReq);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
Wavefront *wf = gpuDynInst->wavefront();
for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
if (gpuDynInst->exec_mask[lane]) {
Addr vaddr = gpuDynInst->addr[lane];
for (int i = 0; i < N; ++i) {
(reinterpret_cast<VecElemU32*>(
gpuDynInst->d_data))[lane * N + i]
= wf->ldsChunk->read<VecElemU32>(
vaddr + i*sizeof(VecElemU32));
}
}
}
}
}
template<typename T>
void
initMemWrite(GPUDynInstPtr gpuDynInst)
{
initMemReqHelper<T, 1>(gpuDynInst, MemCmd::WriteReq);
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
initMemReqHelper<T, 1>(gpuDynInst, MemCmd::WriteReq);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
Wavefront *wf = gpuDynInst->wavefront();
for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
if (gpuDynInst->exec_mask[lane]) {
Addr vaddr = gpuDynInst->addr[lane];
wf->ldsChunk->write<T>(vaddr,
(reinterpret_cast<T*>(gpuDynInst->d_data))[lane]);
}
}
}
}
template<int N>
void
initMemWrite(GPUDynInstPtr gpuDynInst)
{
initMemReqHelper<VecElemU32, N>(gpuDynInst, MemCmd::WriteReq);
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
initMemReqHelper<VecElemU32, N>(gpuDynInst, MemCmd::WriteReq);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
Wavefront *wf = gpuDynInst->wavefront();
for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
if (gpuDynInst->exec_mask[lane]) {
Addr vaddr = gpuDynInst->addr[lane];
for (int i = 0; i < N; ++i) {
wf->ldsChunk->write<VecElemU32>(
vaddr + i*sizeof(VecElemU32),
(reinterpret_cast<VecElemU32*>(
gpuDynInst->d_data))[lane * N + i]);
}
}
}
}
}
template<typename T>
void
initAtomicAccess(GPUDynInstPtr gpuDynInst)
{
initMemReqHelper<T, 1>(gpuDynInst, MemCmd::SwapReq, true);
if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
initMemReqHelper<T, 1>(gpuDynInst, MemCmd::SwapReq, true);
} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
Wavefront *wf = gpuDynInst->wavefront();
for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
if (gpuDynInst->exec_mask[lane]) {
Addr vaddr = gpuDynInst->addr[lane];
AtomicOpFunctor* amo_op =
gpuDynInst->makeAtomicOpFunctor<T>(
&(reinterpret_cast<T*>(
gpuDynInst->a_data))[lane],
&(reinterpret_cast<T*>(
gpuDynInst->x_data))[lane]).get();
T tmp = wf->ldsChunk->read<T>(vaddr);
(*amo_op)(reinterpret_cast<uint8_t *>(&tmp));
wf->ldsChunk->write<T>(vaddr, tmp);
(reinterpret_cast<T*>(gpuDynInst->d_data))[lane] = tmp;
}
}
}
}
void

View File

@@ -32,6 +32,6 @@ namespace gem5
/**
* @ingroup api_base_utils
*/
const char *gem5Version = "[DEVELOP-FOR-V21.01]";
const char *gem5Version = "21.1.0.0";
} // namespace gem5

View File

@@ -834,7 +834,10 @@ GPUDynInst::resolveFlatSegment(const VectorMask &mask)
if (mask[lane]) {
// flat address calculation goes here.
// addr[lane] = segmented address
panic("Flat group memory operation is unimplemented!\n");
addr[lane] = addr[lane] -
wavefront()->computeUnit->shader->ldsApe().base;
assert(addr[lane] <
wavefront()->computeUnit->getLds().getAddrRange().size());
}
}
wavefront()->execUnitId = wavefront()->flatLmUnitId;

View File

@@ -76,6 +76,11 @@ LocalMemPipeline::exec()
lmReturnedRequests.pop();
w = m->wavefront();
if (m->isFlat() && !m->isMemSync() && !m->isEndOfKernel()
&& m->allLanesZero()) {
computeUnit.getTokenManager()->recvTokens(1);
}
DPRINTF(GPUMem, "CU%d: WF[%d][%d]: Completing local mem instr %s\n",
m->cu_id, m->simdId, m->wfSlotId, m->disassemble());
m->completeAcc(m);

View File

@@ -174,6 +174,9 @@ Process::clone(ThreadContext *otc, ThreadContext *ntc,
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD 0
#endif
#ifndef CLONE_VFORK
#define CLONE_VFORK 0
#endif
if (CLONE_VM & flags) {
/**
@@ -249,6 +252,10 @@ Process::clone(ThreadContext *otc, ThreadContext *ntc,
np->exitGroup = exitGroup;
}
if (CLONE_VFORK & flags) {
np->vforkContexts.push_back(otc->contextId());
}
np->argv.insert(np->argv.end(), argv.begin(), argv.end());
np->envp.insert(np->envp.end(), envp.begin(), envp.end());
}

View File

@@ -284,6 +284,9 @@ class Process : public SimObject
// Process was forked with SIGCHLD set.
bool *sigchld;
// Contexts to wake up when this thread exits or calls execve
std::vector<ContextID> vforkContexts;
// Track how many system calls are executed
statistics::Scalar numSyscalls;
};

View File

@@ -194,6 +194,16 @@ exitImpl(SyscallDesc *desc, ThreadContext *tc, bool group, int status)
}
}
/**
* If we were a thread created by a clone with vfork set, wake up
* the thread that created us
*/
if (!p->vforkContexts.empty()) {
ThreadContext *vtc = sys->threads[p->vforkContexts.front()];
assert(vtc->status() == ThreadContext::Suspended);
vtc->activate();
}
tc->halt();
/**

View File

@@ -1453,6 +1453,7 @@ cloneFunc(SyscallDesc *desc, ThreadContext *tc, RegVal flags, RegVal newStack,
pp->euid = p->euid();
pp->gid = p->gid();
pp->egid = p->egid();
pp->release = p->release;
/* Find the first free PID that's less than the maximum */
std::set<int> const& pids = p->system->PIDs;
@@ -1521,6 +1522,10 @@ cloneFunc(SyscallDesc *desc, ThreadContext *tc, RegVal flags, RegVal newStack,
ctc->pcState(cpc);
ctc->activate();
if (flags & OS::TGT_CLONE_VFORK) {
tc->suspend();
}
return cp->pid();
}
@@ -1997,6 +2002,16 @@ execveFunc(SyscallDesc *desc, ThreadContext *tc,
}
};
/**
* If we were a thread created by a clone with vfork set, wake up
* the thread that created us
*/
if (!p->vforkContexts.empty()) {
ThreadContext *vtc = p->system->threads[p->vforkContexts.front()];
assert(vtc->status() == ThreadContext::Suspended);
vtc->activate();
}
/**
* Note that ProcessParams is generated by swig and there are no other
* examples of how to create anything but this default constructor. The
@@ -2018,6 +2033,7 @@ execveFunc(SyscallDesc *desc, ThreadContext *tc,
pp->errout.assign("cerr");
pp->cwd.assign(p->tgtCwd);
pp->system = p->system;
pp->release = p->release;
/**
* Prevent process object creation with identical PIDs (which will trip
* a fatal check in Process constructor). The execve call is supposed to
@@ -2028,7 +2044,9 @@ execveFunc(SyscallDesc *desc, ThreadContext *tc,
*/
p->system->PIDs.erase(p->pid());
Process *new_p = pp->create();
delete pp;
// TODO: there is no way to know when the Process SimObject is done with
// the params pointer. Both the params pointer (pp) and the process
// pointer (p) are normally managed in python and are never cleaned up.
/**
* Work through the file descriptor array and close any files marked
@@ -2043,10 +2061,10 @@ execveFunc(SyscallDesc *desc, ThreadContext *tc,
*new_p->sigchld = true;
delete p;
tc->clearArchRegs();
tc->setProcessPtr(new_p);
new_p->assignThreadContext(tc->contextId());
new_p->init();
new_p->initState();
tc->activate();
TheISA::PCState pcState = tc->pcState();

View File

@@ -70,7 +70,7 @@ RUN git clone -b rocm-4.0.0 \
WORKDIR /ROCclr
# The patch allows us to avoid building blit kernels on-the-fly in gem5
RUN wget -q -O - dist.gem5.org/dist/develop/rocm_patches/ROCclr.patch | git apply -v
RUN wget -q -O - dist.gem5.org/dist/v21-1/rocm_patches/ROCclr.patch | git apply -v
WORKDIR /ROCclr/build
RUN cmake -DOPENCL_DIR="/ROCm-OpenCL-Runtime" \