Commit Graph

14535 Commits

Author SHA1 Message Date
Giacomo Travaglini
2e85c95f4b arch-arm: Remove Jazelle state + ThumbEE support (#364)
This PR removes Jazelle state (while still keeping a "Trivial Jazelle
implementation",
see Arm Architecture Reference Manual) and ThumbEE support
2023-10-16 09:41:44 +01:00
Jason Lowe-Power
20f5555f30 python: Enable -m switch on gem5 binary (#453)
With -m, you can now run a module from the command line that is embedded
in the gem5 binary.
This will allow us to put some common "scripts" in the stdlib instead of
in the "configs" directory.
2023-10-14 20:08:06 -07:00
Daniel Kouchekinia
4931fb0010 mem-ruby: Always pass on GPU atomics to dir in write-through TCC (#367)
Added checks to ensure that atomics are not performed in the TCC when it
is configured as a write-through cache. Also added SLC bit overwrite to
ensure directory preforms atomics when there is a write-through TCC.

Change-Id: I4514e6c8022aeb7785f2c59871cd9acec8161ed8
2023-10-14 06:39:50 -07:00
Yu-Cheng Chang
a3c51ca38c arch-riscv: Fix write back register issue of vmask_mv_micro (#443)
After removing the setRegOperand in VecRegOperand
https://github.com/gem5/gem5/pull/341. The vmask_vm_micro will not write
back to register because tmp_d0 is not the reference type. The PR will
make tmp_d0 as reference of regFile.

Change-Id: I2a934ad28045ac63950d4e2ed3eecc4a7d137919
2023-10-13 15:20:42 -07:00
Matthew Poremba
7706e958e5 mem-ruby: Update cache recorder to use RubyPort and remove BUILD_GPU guards (#448)
This PR updates cache recorder to use a vector of RubyPorts for cache
cooldown and warmup instead of Sequencer or GPUCoalescer vectors (refer
to issue #403 for more details). It also removes the extra guards that
were added in #377 to prevent compile-time failures in non-GPU builds.
2023-10-13 14:36:45 -07:00
Andreas Sandberg
59f96deb0f cpu: Refactor indirect predictor (#429) 2023-10-13 11:35:02 +01:00
Giacomo Travaglini
1c45cdcc41 arch-arm: Remove legacy ThumbEE references
ThumbEE had already been removed but there were still some
references to it dangling around. We were also signaling
ThumbEE as being available through HWCAPS in SE which
was not correct. This patch is fixing it

Change-Id: I8b196f5bd27822cd4dd8b3ab3ad9f12a6f54b047
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-13 09:25:48 +01:00
Giacomo Travaglini
a33f3d3967 arch-arm: Remove Jazelle state support
Jazelle state has been officially removed in Armv8.
Every AArch32 implementation must still support the
"Trivial Jazelle implementation", which means that while
the instruction set has been removed, it is still possible
for privileged software to access some Jazelle registers
like JIDR,JMCR, and JOSCR which are just treated as RAZ

Change-Id: Ie403c4f004968eb4cb45fa51067178a550726c87
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-13 09:25:48 +01:00
Vishnu Ramadas
8d54a5cbab mem-ruby: Remove BUILD_GPU guards from ruby coalescer models
A previous commit added BUILD_GPU guards to gpu coalescer models since
a related cache recorder commit added GPU support. This is no longer
needed since the cache recorder moved to using a vector of RubyPorts
instead of Sequencer/GPUCoalescer pointers. This commit removes
BUILD_GPU guards from the Ruby coalescer models

Change-Id: I23a7957d82524d6cd3483d22edfb35ac51796eca
2023-10-12 14:53:29 -05:00
Vishnu Ramadas
08c1af1b16 mem-ruby: Use RubyPort vector to access Ruby in cache recorder
Previously, the cache recorder used a vector of sequencer pointers to
access Ruby objects. A recent commit updated the cache recorder to also
maintain a vector of GPUCoalescer pointers in order for GPUs to support
flushin. This added redundant code to the cache recorder. This commit
replaces the sequencer and GPUCoalescer vectors with a vector of
RubyPort pointers so that the code does not contain redundant lines

Change-Id: Id5da33fb870f17bb9daef816cc43c0bcd70a8706
2023-10-12 14:49:06 -05:00
Matthew Poremba
4d336c0636 arch-vega: Implement buffer_atomic_cmpswap (#439)
This is a standard compare and swap but implemented on vector memory
buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with
MUBUF's special address calculation).

This was tested using a Tensile kernel, a backend for rocBLAS, which is
used by PyTorch and Tensorflow. Prior to this patch both ML frameworks
crashed. With this patch they both make forward progress.

Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06
2023-10-12 07:33:40 -07:00
Matthew Poremba
4b7f25fcb6 arch-vega: Ignore s_setprio instruction instead of panic
This instruction is used by ML frameworks to prioritize certain
wavefronts. Since gem5 does not have any support for wavefront
scheduling based on priority (besides wavefront age), we ignore this
instruction and warn_once rather than calling panic. Since hardware can
override this priority anyways, we can be sure that ignoring the value
will not inhibit forward progress resulting in application hangs.

Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc
2023-10-11 15:55:16 -05:00
Matthew Poremba
4b85a1710e arch-vega: Implement buffer_atomic_cmpswap
This is a standard compare and swap but implemented on vector memory
buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with
MUBUF's special address calculation).

This was tested using a Tensile kernel, a backend for rocBLAS, which is
used by PyTorch and Tensorflow. Prior to this patch both ML frameworks
crashed. With this patch they both make forward progress.

Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06
2023-10-11 15:42:50 -05:00
Bobby R. Bruce
70b6b53e54 misc,python: Add pyupgrade to pre-commit (#424)
This adds the [pyupgrade](https://github.com/asottile/pyupgrade) hook to
pre-commit.

This hook automatically upgrades the syntax to the recommended standards
for the newer version of the language.
2023-10-11 09:07:09 -07:00
Matthew Poremba
da11427ba6 gpu-compute: Update tokens for flat global/scratch (#408)
Memory instructions acquire coalescer tokens in the schedule stage.
Currently this is only done for buffer and flat instructions, but not
flat global or flat scratch. This change now acquires tokens for flat
global and flat scratch instructions. This provides back-pressure to the
CUs and helps to avoid deadlocks in Ruby.

The change also handles returning tokens for buffer, flat global, and
flat scratch instructions. This was previously only being done for
normal flat instructions leading to deadlocks in some applications when
the tokens were exhausted.

To simplify the logic, added a needsToken() method to GPUDynInst which
return if the instruction is buffer or any flat segment.

The waitcnts were also incorrect for flat global and flat scratch. We
should always decrement vmem and exp count for stores and only normal
flat instructions should decrement lgkm. Currently vmem/exp are not
decremented for flat global and flat scratch which can lead to deadlock.
This change set fixes this by always decrementing vmem/exp and lgkm only
for normal flat instructions.

Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5
2023-10-11 09:00:10 -07:00
Andreas Sandberg
891250192d arch-arm: Implement FEAT_TCR2 and FEAT_SCTLR2 (#416)
This is simply adding the new Armv8.9 registers defined in the related
features:

- FEAT_TCR2
- FEAT_SCTLR2
2023-10-11 10:14:31 +01:00
David Schall
f65df9b959 cpu: Refactor indirect predictor
Simplify indirect predictor interface. Several of the existing
functions where merged together into four clear once. Those
four are similar to the main direction predictor interface.
'lookup', 'update', 'squash' and 'commit'. This makes the
interface much more clear, allows better functionality isolation
and makes it simpler to develop new predictor models.

A new parameter is added to allow additional buffer space for
speculative path history.

Change-Id: I6d6b43965b2986ef959953a64c428e50bc68d38e
Signed-off-by: David Schall <david.schall@ed.ac.uk>
2023-10-11 07:50:32 +00:00
Bobby R. Bruce
c4156b06fb python: Fix base logic in MetaSimObject
This ensures `class Foo` is considered equivalent to `class
Foo(object)`.

Change-Id: I65a8aec27280a0806308bbc9d32281dfa6a8f84e
2023-10-10 21:47:08 -07:00
Bobby R. Bruce
298119e402 misc,python: Run pre-commit run --all-files
Applies the `pyupgrade` hook to all files in the repo.

Change-Id: I9879c634a65c5fcaa9567c63bc5977ff97d5d3bf
2023-10-10 21:47:07 -07:00
Bobby R. Bruce
3f5d7d647a misc: Run pre-commit autoupdate (#419)
1. Runs `pre-commit autoupdate`.
2. Runs `pre-commit run --all-files`.
3. Adds (2.) to ".git-blame-ignore-rev".
2023-10-10 21:41:33 -07:00
Bobby R. Bruce
d559c24ac2 stdlib: Improve handing of errors in Atlas request failures (#404)
Now:

* The Atlas Client will attempt a connection 4 times, using an
exponential backoff approach between attempts.
* When a failure does arise a rich output is given so problems can be
easily diagnosed.

Addresses: #340
2023-10-10 21:34:24 -07:00
Harshil Patel
bbc301f2f0 stdlib, tests: Fixed bugs and tests
- Fixed bugs rekated to retrying on request faliure.
- Updated the pyunit tests.

Change-Id: Ia484690267bf27018488324f3408f7e47c59bef3
2023-10-10 15:54:20 -07:00
Bobby R. Bruce
ddf6cb88e4 misc: Run pre-commit run --all-files
This is reflect the updates made to black when running `pre-commit
autoupdate`.

Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919
2023-10-10 14:01:58 -07:00
Bobby R. Bruce
0ec1fb167b stdlib: Fix use internal _hashlib in md5_utils.py (#427)
Removes the use of the internal _hashlib, which is an internal Python
API
This is a fix for issue #383
2023-10-10 08:32:45 -07:00
Yu-Cheng Chang
141b06d335 arch,arch-riscv: Remove setRegOperand in VecRegOperand (#341)
The RISC-V vector instructions still work without setRegOperand.
We should fix the register statistic issue by
https://github.com/gem5/gem5/pull/360 to avoid duplicate statistic
register write count



Change-Id: Ib6a52935e00c3e557b366abfcf60450dca05614d
2023-10-10 08:00:10 -07:00
Matthew Poremba
9f4d334644 gpu-compute: Update tokens for flat global/scratch
Memory instructions acquire coalescer tokens in the schedule stage.
Currently this is only done for buffer and flat instructions, but not
flat global or flat scratch. This change now acquires tokens for flat
global and flat scratch instructions. This provides back-pressure to the
CUs and helps to avoid deadlocks in Ruby.

The change also handles returning tokens for buffer, flat global, and
flat scratch instructions. This was previously only being done for
normal flat instructions leading to deadlocks in some applications when
the tokens were exhausted.

To simplify the logic, added a needsToken() method to GPUDynInst which
return if the instruction is buffer or any flat segment.

The waitcnts were also incorrect for flat global and flat scratch. We
should always decrement vmem and exp count for stores and only normal
flat instructions should decrement lgkm. Currently vmem/exp are not
decremented for flat global and flat scratch which can lead to deadlock.
This change set fixes this by always decrementing vmem/exp and lgkm only
for normal flat instructions.

Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5
2023-10-10 09:48:16 -05:00
Matt Sinclair
ec633b3d68 dev-amdgpu,mem-ruby: Add support to checkpoint and restore between kernels in GPUFS (#377)
Earlier, GPU checkpointing was working only if a checkpoint was created
before the first kernel execution. This pull request adds support to
checkpoint in-between any two kernel calls. It does so by doing the
following.

- Adds flush support in the GPU_VIPER protocol
- Adds flush support in the GPUCoalescer
- Updates cache recorder to use the GPUCoalescer during simulation
cooldown and cache warmup times.
2023-10-10 09:41:21 -05:00
Giacomo Travaglini
8acf49b6fa arch-arm: Revamp takeInt to take VHE/SEL2 into account
The new implementation matches the table in the ARM Architecture
Reference Manual (version DDI 0487J.a, section D1.3.6, table R_SXLWJ)

It takes into consideration features like FEAT_SEL2 (scr.eel2 bit) and
FEAT_VHE (hcr.e2h bit) which affect the masking of interrupts under
certain circumstances

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I07ebd8d859651475bd32fd201eea0f4e64a7dd5f
2023-10-10 09:46:47 +01:00
Giacomo Travaglini
e412ddddbd arch-arm: Split takeInt into AArch64/32 versions
We pay a small duplication cost but we make the code
more readable and we enable further modifications to the
AArch64 code without forcing the same code on the AArch32
method

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1efa33cf19f91094fd33bd48b6a0a57d8df8f89f
2023-10-10 09:45:59 +01:00
root
05ebbd2184 stdlib: Fix use internal _hashlib in md5_utils.py
Removes the use of the internal _hashlib, which is an
internal Python API

Change-Id: Id4541a143adb767ca7d942c0fd8a1cf1a08a04ab
2023-10-10 06:18:59 +00:00
Bobby R. Bruce
51c881d0f1 stdlib: Improve handing of errors in Atlas request failures
Now:

* The Atlas Client will attempt a connection 4 times, using an
  exponential backoff approach between attempts.
* When a failure does arise a rich output is given so problems can be
  easily diagnosed.

Change-Id: I3df332277c33a040c0ed734b9f3e28f38606af44
2023-10-09 16:30:02 -07:00
Bobby R. Bruce
93704a81f1 dev-amdgpu,gpu-compute: Implement GPU and HSA timestamps (#410)
This PR adds two commit to handle timestamps in the ROCm runtime. ROCr
uses a mix of GPU timestamp reads and HSA packet timestamps to output
profiling information for a task dispatch.

The first patch added timestamps to the HSA completion signal indicating
when the task started and ended and require changing the flow of
completion signal DMAs to ensure the DMA of the timestamp values
completed before writing the completion signal value.

Second commit adds MMIOs for reading the GPU's timestamp counter. This
MMIO resides in the GFX MMIO space so a new class is added to handle
MMIOs in that address range.
2023-10-09 14:11:52 -07:00
Bobby R. Bruce
bbe05b0cba tests,misc: Fix compilation tests failures (#400)
Exposed in our failing compiler tests:
https://github.com/gem5/gem5/actions/runs/6348223508, this PR:

* Adds missing overrides to `PCState`'s `set` function.
* Removes `std::binary_function` from DramPower (it was deprecated in
CPP-11 and officially removed in CPP-17).
2023-10-09 11:20:52 -07:00
Harshil Patel
452a600c49 New function to kernel_disk_workload to allow new disk device location (#151)
Added a parameter (_disk_device) to kernel_disk_workload which allows
users to change the disk device location. get_disk_device() now chooses
between the parameter and, if no parameter was passed, it calls a new
function _get_default_disk_device() which is implemented by each board
and has a default disk device according to each board, eg /dev/hda in
the x86_board. The previous way of setting a disk device still exists as
a default, however, with the new function users can now override this
default
2023-10-09 10:33:45 -07:00
Harshil Patel
79f40ffdab stdlib: Del comment stating SE mode limited to single thread (#402)
This comment was left in the codebase in error. The
`set_se_binary_workload` function works fine with multi-threaded
applications. This hasn't been a restriction for some time.
2023-10-09 10:30:32 -07:00
Harshil Patel
d8fc0180a5 cpu: Restructure BTB (#412)
This is the first PR in a series of enhancements to the BPU proposed in
#358.
However, I think putting everything into one PR is not nice to review
and prone to oversee I might did.

This PR restructures the BTB:
- A new abstract BTB class is created to enable different BTB
implementations. The new BTB class gets its own parameter and stats.
- An enum is added to differentiate branch instruction types. This enum
is used to enhance statistics and BPU management.
- The existing BTB is moved into `simple_btb` as default.
- An additional function is added to store the static instruction in the
BTB. This function is used for the decoupled front-end.
- Update configs to match new BTB parameters.
2023-10-09 10:13:00 -07:00
Giacomo Travaglini
eac5a8b215 arch-arm: Implement FEAT_TCR2
Change-Id: I0396f5938c09b68fcc3303a6fdda1e4dde290869
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 17:19:57 +01:00
Giacomo Travaglini
49cbb24351 arch-arm: Implement FEAT_SCTLR2
Change-Id: Ifb8c8dc1729cc21007842b950273fe38129d9539
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 17:12:53 +01:00
Giacomo Travaglini
c4c5d2e172 arch-arm: Implement ID_AA64MMFR3_EL1 register
Change-Id: If8c37bdccf35a070870900c06dc4640348f0f063
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 17:12:53 +01:00
Andreas Sandberg
ec7921305b arch-arm: Implement FEAT_TLBIRANGE extension (#414) 2023-10-09 17:09:31 +01:00
Jason Lowe-Power
d4be9c76c5 cpu-kvm, arch-x86: flush TLB after syscalls (#411)
Modified the x86 KVM-in-SE syscall handler to flush the TLB following
each syscall, in case the page table has been modified. This is done by
reloading the value in %cr3. Doing this requires an intermediate GPR,
which we store in a new scratch buffer following the syscall code at
address `syscallDataBuf`.

GitHub issue: https://github.com/gem5/gem5/issues/409
2023-10-09 08:16:06 -07:00
David Schall
edf9092fee cpu: Restructure BTB
- A new abstract BTB class is created to enable different BTB
  implementations. The new BTB class gets its own parameter
  and stats.
- An enum is added to differentiate branch instruction types.
  This enum is used to enhance statistics and BPU management.
- The existing BTB is moved into `simple_btb` as default.
- An additional function is added to store the static instruction in
  the BTB. This function is used for the decoupled front-end.
- Update configs to match new BTB parameters.

Change-Id: I99b29a19a1b57e59ea2b188ed7d62a8b79426529
Signed-off-by: David Schall <david.schall@ed.ac.uk>
2023-10-09 14:37:47 +00:00
Giacomo Travaglini
39fdfaea5a arch-arm: Implement FEAT_TLBIRANGE
Change-Id: I7eb020573420e49a8a54e1fc7a89eb6e2236dacb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:59:47 +01:00
Giacomo Travaglini
6b698630a2 arch-arm: Check VMID in secure mode as well (NS=0)
This is still trying to completely remove any artifact
which implies virtualization is only supported in
non-secure mode (NS=1)

Change-Id: I83fed1c33cc745ecdf3c5ad60f4f356f3c58aad5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:56:57 +01:00
Giacomo Travaglini
a8efded644 arch-arm: Include Granule Size in a TLB entry
This info can be used during TLB invalidation

Change-Id: I81247e40b11745f0207178b52c47845ca1b92870
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:56:57 +01:00
Giacomo Travaglini
5cd70bf9bf sim-se: zero out memory allocated via brk() (#343)
The syscall emulation of brk() incorrectly did not ensure that newly
allocated memory was zero-initialized, which Linux guarantees and which
seems to be the expectation of glibc's malloc() and free()
implementation. This patch fixes the incorrect behavior by zero-
initalizing all memory allocations via brk().

GitHub issue: https://github.com/gem5/gem5/issues/342

Change-Id: I53cf29d6f3f83285c8e813e18c06c2e9a69d7cc2
2023-10-09 13:48:53 +01:00
Nicholas Mosier
7a0e84d853 cpu-kvm, arch-x86: flush TLB after syscalls
Modified the x86 KVM-in-SE syscall handler to flush the TLB following
each syscall, in case the page table has been modified. This is done
by reloading the value in %cr3. Doing this requires an intermediate
GPR, which we store in a new scratch buffer following the syscall code
at address `syscallDataBuf`.

GitHub issue: https://github.com/gem5/gem5/issues/409

Change-Id: Ibc20018c97ebb1794fa31a0c71e0857d661c7c9d
2023-10-06 20:41:59 +00:00
Nicholas Mosier
0dcf0fb829 sim-se: unmap reclaimed heap pages in brk syscall emulation
gem5::MemState::updateBrkRegion(), which is called during the syscall
emulation of brk, did not unmap deallocated heap pages when the brk
region is receding. Instead, it kept it mapped for simplicity. This
introduced a bug where subequent expansions of the brk region reused
prior heap page mappings that were not zero-filled. This violates
the assumptions of glibc malloc, resulting in heap corruption and
crashes.

This patch fixes the bug by always unmapping pages that are deallocated
during a call to brk() that reduces the heap size. This makes the
gem5::MemState::_endBrkPoint field obsolete, so this patch removes it.

GitHub issue: https://github.com/gem5/gem5/issues/342

Change-Id: Ib2244e1aa4d2a26666ad60d231fdde2c22d2df35
2023-10-06 20:39:57 +00:00
Matthew Poremba
75a7f30dfb dev-amdgpu: Implement GPU clock MMIOs
The ROCr runtime uses a combination of HSA signal timestamps and
hardware MMIOs to calculate profiling times. At the beginning of an
application a timestamp is read from the GPU using MMIOs. The clock
MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added
to handle these MMIOs.

The timestamp value is expected to be in nanoseconds, so we simply use
the gem5 tick converted to ns.

Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4
2023-10-06 13:21:40 -05:00
Matthew Poremba
6a4b2bb096 dev-hsa,gpu-compute: Add timestamps to AMD HSA signals
The AMD specific HSA signal contains start/end timestamps for dispatch
packet completion signals. These are current always zero. These
timestamp values are used for profiling in the ROCr runtime.
Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check
for zero values before dividing, causing applications that use profiling
to crash with SIGFPE. Profiling is used via hipEvents in the HACC
application, so these should be supported in gem5.

In order to handle writing the timestamp values, we need to DMA the
values to memory before writing the completion signal. This changes the
flow of the async completion signal write to be (1) read mailbox pointer
(2) if valid, write the mailbox data, other skip to 4 (3) write mailbox
data if pointer is valid (4) write timestamp values (5) write completion
signal. The application will process the timestamp data as soon as the
completion signal is received, so we need to ordering to ensure the DMA
for timestamps was completed.

HACC now runs to completion on GPUFS and has the same output was
hardware.

Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e
2023-10-06 13:21:40 -05:00