The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID. However, it
may cause wrong instruction flags set for jal because the section
"handle the 'Jalr' instruction" misses the opcode checking. The PR fix
the issue to ensure the IsReturn can be only set in Jalr.
It is possible to execute a GPU atomic instruction using a memory
address that is in the host memory space (e.g, HMM, __managed__,
hipHostMalloc'd address). Since these are in host memory they are passed
to the SystemHub DmaDevice. However, this currently executes as a write
packet without modifying data. This leads to hangs in applications that
use atomics for forward progress (e.g., HeteroSync).
It is not clear where these are handled on a real GPU, but they are
certainly not handled by the software stack nor driver, so they must be
handled in hardware and therefore implemented in gem5. Handling for
atomics in the SystemHub makes the most sense.
To make atomics work a few extra changes need to be made to the
SystemHub. (1) The atomic is implemented as a host memory read, followed
by calling the AtomicOpFunctor, followed by a write. This requires a
second event to handle read response, performing atomic, and issuing a
write. (2) Atomics must be serialized otherwise two atomics might return
the same value which is incorrect. This patch adds serialization logic
for all request types to the same address to handle this. (3) With the
added complexity of the SystemHub, a new debug flag explicitly for
SystemHub is added.
Testing done: The heterosync application with input "sleepMutex 10 16 4"
previously hung before this patch. It passes with the patch applied.
This application tests both (1) and (2) above, as it allocates locks
with hipHostMalloc and has multiple workgroups sending an atomic request
in the same Tick, verifying the serialization mechanism.
The implementation of the x86 PACK micro-op had a logical bug that
caused the `PACKSSWB` and `PACKSSDW` instructions to produce incorrect
results. Specifically, due to a signedness error, the overflow check for
negative integers being packed always evaluated to true, resulting in
all negative integers being packed as -1 in the output.
This patch fixes the signedness error that causes the bug.
GitHub issue: https://github.com/gem5/gem5/issues/331
It is possible to execute a GPU atomic instruction using a memory
address that is in the host memory space (e.g, HMM, __managed__,
hipHostMalloc'd address). Since these are in host memory they are passed
to the SystemHub DmaDevice. However, this currently executes as a write
packet without modifying data. This leads to hangs in applications that
use atomics for forward progress (e.g., HeteroSync).
It is not clear where these are handled on a real GPU, but they are
certianly not handled by the software stack nor driver, so they must be
handled in hardware and therefore implemented in gem5. Handling for
atomics in the SystemHub makes the most sense.
To make atomics work a few extra changes need to be made to the
SystemHub. (1) The atomic is implemented as a host memory read, followed
by calling the AtomicOpFunctor, followed by a write. This requires a
second event to handle read response, performing atomic, and issuing a
write. (2) Atomics must be serialized otherwise two atomics might return
the same value which is incorrect. This patch adds serialization logic
for all request types to the same address to handle this. (3) With the
added complexity of the SystemHub, a new debug flag explicitly for
SystemHub is added.
Testing done: The heterosync application with input "sleepMutex 10 16 4"
previously hung before this patch. It passes with the patch applied.
This application tests both (1) and (2) above, as it allocates locks
with hipHostMalloc and has multiple workgroups sending an atomic request
in the same Tick, verifying the serialization mechanism.
Change-Id: Ife84b30037d1447dd384340cfeb06fdfd472fff9
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID.
However, it may cause wrong instruction flags set for jal because
the section "handle the 'Jalr' instruction" misses the opcode
checking. The PR fix the issue to ensure the IsReturn can be only
set in Jalr.
Change-Id: I9ad867a389256f9253988552e6567d2b505a6901
The implementation of the x86 PACK micro-op had a logical bug that
caused the `PACKSSWB` and `PACKSSDW` instructions to produce
incorrect results. Specifically, due to a signedness error, the
overflow check for negative integers being packed always evaluated
to true, resulting in all negative integers being packed as -1 in
the output.
This patch fixes the signedness error that causes the bug.
GitHub issue: https://github.com/gem5/gem5/issues/331
Change-Id: I44b7328a8ce31742a3c0dfaebd747f81751e8851
This splits the CI Tests to one job per sub-directory in "tests/gem5"
via a matrix.
Advantages:
* We can utilize more runners to run the quick tests. This should mean
tests run quicker.
* This approach does not require editing of the workflow as more tests
are added or taken away.
* There is now an output artifact for each directory in "tests/gem5"
instead of one for the entriety of every quick test in "tests".
In addition:
* The artifact retention for the test outputs has been increased to 30 days.
* The output test artifacts have been renamed to be more descriptive of
the job, run, attempt, directory run, and the status.
* The 'tar' step has been removed. GitHub's 'action/artifact' can handle
directories.
Change-Id: I5b3132b424e3769d81d9cd75db2a8c59dbe4a7e5
While there was code present in "serialize.cc" to create the checkpoint
directory, it did not do recursively. This patch ensures all the
directories are created in a path to the checkpoint directory.
Change-Id: Ibcf7f800358fd89946f550b8cfb0cef8b51fceac
This allows for build target information (i.e., the gem5 binary to be
built for the tests) to be returned.
Change-Id: I6638b54cbb1822555f58e74938d36043c11108ba
Now the "tests/main.py" script will accept the `--fixtures` flag when
using the `list` command. This will only list the fixtures needed.
To have this implemented `__str__` for the `Fixture` class has been
implemented.
Change-Id: I4bba26e923c8b0001163726637f2e48c801e92b1
Using just a print was causing this warning to print even with the `-q`
flag was passed. The `-q` flag sets the output to machine readable,
which the warning statement is not.
Change-Id: I139e2565dbc53aaee9027c0e003d34ba800a7ef4
While it makes sense to define the cache_line_size as a 32-bit unsigned int,
the use of cache_line_size is way out of its original scope.
cache_line_size has been used to produce an address mask, which masking out
the offset bits from an address. For example, [1], [2], [3], and [4].
However, since the cache_line_size is an "unsigned int", the type of the
value is not guaranteed to be 64-bit long. Subsequently, the
bit twiddling hacks in [1], [2], [3], and [4] produce 32-bit mask,
i.e., 0x00000000FFFFFFC0.
This behavior at least caused a problem in LLSC in RISC-V [5], where the
load reservation (LR) relies on the mask to produce the cache block address.
Two distinct 64-bit addresses can be mapped to the same cache block using
the above mask.
This patch explicitly defines cache_line_size as a 64-bit unsigned int so
the cache block mask can be produced correctly for 64-bit addresses.
[1] 3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)
[2] 3bdcfd6f7a/src/cpu/simple/timing.hh (L224)
[3] 3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)
[4] 3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)
[5] 3bdcfd6f7a/src/arch/riscv/isa.cc (L787)
Change-Id: I29abc7aaab266a37326846bbf7a82219071c4ffe
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
When there is race between FwdGetX
and PUTX on owner. Owner in this case hands off
ownership to GetX requestor and PUTX still goes
through. But since owner has changed, state should go back to M and PUTX
is essentially trashed.
An Unblock to the Directory in this case will give an undefined
transition. I have added transitions which indicate that when an Unblock
is served to the Directory, it means that some kind of ownership
transfer has happened while a PUTX/PUTO was in progress.
- Implemented a __str__ for AbstractResource
__str__ prints resource category, id and version.
link to resources website is also printed.
Change-Id: Iad5825ff7d8d505ceb236e00dc49bb56055fc8f0
This allows users to obtain resources via the CLI instead of having to
write a python script to do so. It is essentially a nice CLI wrapper for
"gem5.resources.resource.obtain_resource"
## Usage
```sh
> scons build/ALL/gem5.opt -j `nproc`
> ./build/ALL/gem5.opt util/obtain-resource.py --help
usage: obtain-resource.py [-h] [-p PATH] [-q] id
positional arguments:
id The resource id to download.
options:
-h, --help show this help message and exit
-p PATH, --path PATH The path the resource is to be downloaded to. If not specified, the resource will be downloaded to the default
location in the gem5 local cache of resources
-q, --quiet Suppress output.
```
E.g.:
```sh
./build/ALL/gem5.opt util/obtain-resource.py arm-hello64-static -p arm-hello
```
Will download the resource with ID `arm-hello64-static` to `arm-hello`
in the CWD.
This allows for a user to specify the exact path they want a resource to
be downloaded to. This differs from 'resource_direcctory' in that a user
may specify the file/directory name of the resource (using just the
'resource_directory' will have the resource as its ID in that directory.
Change-Id: I887be6216c7607c22e49cf38226a5e4600f39057
Via this workflow we now can build and push our docker images to the
GitHub Docker container registry:
26a1ee4e61/.github/workflows/docker-build.yaml
GitHub does not charge for downloads to runners (hosted or self-hosted).
This can therefore save the project money if we download from GitHub's
Docker reigstry over Google Cloud's.
This is a test to ensure this works as intended.
The long/daily tests in "tests/gem5/gem5_library_tests" were running in
both the "testlib-long-tests" and the
"testlib-long-gem5_library_example_tests" job in the Daily tests
Workflow. The running in "testlib-long-tests" is removed in this PR.
This change, https://github.com/gem5/gem5/pull/205, mistakenly allocates
write buffer for clflush instruction when there's a cache miss. However,
clflush in gem5 is not a write instruction. Thus, the cache should
allocate miss buffer in this case.
Via this workflow we now can build and push our docker images to
the GitHub Docker container registry:
26a1ee4e61/.github/workflows/docker-build.yaml
GitHub does not charge for downloads to runners (hosted or self-hosted).
This can therefore save the project money if we download from GitHub's
Docker reigstry over Google Cloud's.
This is a test to ensure this works as intended.
Change-Id: Iccdb1b7a912f1e0a0d82b7f888694958099315b3
The long/daily tests in "tests/gem5/gem5_library_tests" were running in
both the "testlib-long-tests" and the
"testlib-long-gem5_library_example_tests" job in the Daily tests
Workflow. The running in "testlib-long-tests" is removed in this patch.
Change-Id: I1c665529e3dcb594ffb7f6e2224077ae366772d6
When there is race between FwdGetX
and PUTX on owner. Owner in this case hands off
ownership to GetX requestor and PUTX still goes
through. But since owner has changed, state should
go back to M and PUTX is essentially trashed.
An Unblock to the Directory in this case will give an undefined
transition. I have added transitions which indicate that when
an Unblock is served to the Directory, it means that some kind
of ownership transfer has happened while a PUTX/PUTO was in
progress.
Change-Id: I37439b5a363417096030a0875a51c605bd34c127
https://github.com/gem5/gem5/actions/runs/6114221855 failure was due to
to running the actions inside our 22.04-all-dependencies container. This
container does not contain docker. We must therefore run this action
outside of the container. However, due to our policy of checking out the
code within this container, we must split this into two jobs and use the
artifact upload and download to get the resources we want.
Links to #293
After calling m5_dump_reset_stats(0,0) in a test program, some
statistics like
l1_controllers.L1Dcache.m_demand_hits,
l1_controllers.L1Dcache.m_demand_misses,
l1_controllers.L1Dcache.m_demand_accesses
were not getting reset in the newer stat dumps.
This one line patch fixes that. Changes were tested with calling two
m5_dump_reset_stats(0,0) in a row for a system with 1 core, tested on
both SE and FS.
Credits: @MeatBoy106
The x87 FPU tag word (FTW) was not explicitly initialized in
{X86_64,i386}Process::initState(), resulting in holding an initial value
of zero, resulting in an invalid x87 FPU state. This commit initializes
FTW to 0xFFFF, indicating the FPU is empty at program start during
syscall emulation.
The 16-bit FTW register was also incorrectly masked down to 8-bits in
X86ISA::ISA::setMiscRegNoEffect(), leading to an invalid X87 FPU state
that later caused crashes in the X86KvmCPU. This commit corrects the
bitwidth of the mask to 16.
GitHub issue: https://github.com/gem5/gem5/issues/303
A recent PR [1] moved the TraceCPU away from the BaseCPU hierarchy.
While the common etrace_replayer.py has been amended, I missed these
hybrid TLM + TraceCPU example scripts.
[1]: https://github.com/gem5/gem5/pull/302