Since returned data is not needed for AtomicNoReturn and Store memory
requests, the coalescer need not spend time writing in dummy data for
packets of these types.
Change-Id: Ie669e8c2a3bf44b5b0c290f62c49c5d4876a9a6a
I believe the weekly test failures (example:
https://github.com/gem5/gem5/actions/runs/6832805510/job/18592876184)
are due to a container running out of memory when running the very-long
x86 boot tests. I found that the `-t $(nproc)` flag meant, on our
runners, 4 x86 full system gem5 simulations were being spawned. Locally I
found these gem5 x86 boot sims can reach 4GB in size so I suspect they
eventually grew big enough to exceed the 16GB memory of the VM.
I have removed `-t $(nproc)`, so each execution now runs single-threaded,
to see if this fixes the issue (we may want to use `-t 2` later if the
Weeklies take too long running single-threaded).
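The arithmetic behind this suspicion can be sketched as follows. The 4GB
per-simulation peak and the 16GB VM size come from the description above;
the function name and the assumption that all simulations peak at the same
time are illustrative only.

```python
VM_MEMORY_GB = 16.0  # memory of the runner VM, per the description above


def peak_memory_gb(parallel_sims: int, gb_per_sim: float = 4.0) -> float:
    """Worst-case memory use if every simulation peaks at the same time."""
    return parallel_sims * gb_per_sim


for threads in (4, 2, 1):
    needed = peak_memory_gb(threads)
    # When needed == VM size, nothing is left for the OS or the test harness.
    status = "at risk" if needed >= VM_MEMORY_GB else "fits"
    print(f"-t {threads}: ~{needed:.0f}GB peak -> {status}")
```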
Recent breaking changes in the DRAMSys API require user code to be
updated. These updates have been applied to the gem5 integration.
Furthermore, as DRAMSys started to use CMake dependency management,
it is no longer sensible to maintain two separate build systems for
DRAMSys. The use of the DRAMSys integration in gem5 will therefore
from now on require that CMake is installed on the target machine.
Additionally, support for snapshots has been implemented in DRAMSys
and coupled with gem5's checkpointing API.
This commit fixes a violation of the TLM2.0 protocol as well as a
bug regarding back-pressure:
- In the BEGIN_REQ phase, transaction objects are required to set
their response status to TLM_INCOMPLETE_RESPONSE. This was not
the case in the packet2payload function that converts gem5 packets
to TLM2.0 generic payloads.
- When the target applies back-pressure to the initiator, an assert
condition was triggered as soon as the response was retried. The
cause of this was an unintentional nullptr-access into a map.
Augment Datablock and WriteMask to support an optional argument that
distinguishes between return and no-return atomics. In the case of atomic
no-return requests, log entries should not be created when performing the
atomic.
Change-Id: Ic3112834742f4058a7aa155d25ccc4c014b60199a
Added a resource constraint, AtomicALUOperation, to GLC atomics
performed in the TCC.
The resource constraint uses a new class, ALUFreeList array. The class
assumes the following:
- There are a fixed number of atomic ALU pipelines
- While a new cache line can be processed in each pipeline each cycle,
if a cache line is currently going through a pipeline, it can't be
processed again until it's finished
Two configuration parameters have been used to tune this behavior:
- tcc-num-atomic-alus corresponds to the number of atomic ALU pipelines
- atomic-alu-latency corresponds to the latency of atomic ALU pipelines
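The assumptions listed above can be illustrated with a toy model. This is
only a sketch: the class and method names are hypothetical and are not
taken from the gem5 sources, and the two constructor arguments merely
stand in for tcc-num-atomic-alus and atomic-alu-latency.

```python
class AluFreeListSketch:
    """Toy model of a fixed pool of pipelined atomic ALUs (hypothetical)."""

    def __init__(self, num_alus: int, latency: int) -> None:
        self.num_alus = num_alus  # stands in for tcc-num-atomic-alus
        self.latency = latency    # stands in for atomic-alu-latency
        self.issued_at = {}       # cycle -> lines accepted in that cycle
        self.busy_until = {}      # line address -> cycle it leaves the pipe

    def try_reserve(self, line_addr: int, cycle: int) -> bool:
        # A line that is still in flight cannot enter a pipeline again.
        if self.busy_until.get(line_addr, 0) > cycle:
            return False
        # Each pipeline accepts at most one new line per cycle.
        if self.issued_at.get(cycle, 0) >= self.num_alus:
            return False
        self.issued_at[cycle] = self.issued_at.get(cycle, 0) + 1
        self.busy_until[line_addr] = cycle + self.latency
        return True


alus = AluFreeListSketch(num_alus=2, latency=3)
first = alus.try_reserve(0x100, cycle=0)   # accepted
second = alus.try_reserve(0x140, cycle=0)  # accepted, second pipeline
third = alus.try_reserve(0x180, cycle=0)   # rejected, no pipeline left
again = alus.try_reserve(0x100, cycle=1)   # rejected, line still in flight
```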
Change-Id: I25bdde7dafc3877590bb6536efdf57b8c540a939
While we do run compiler tests weekly, 9 times out of 10 the issue is a
strict clang check that we did not run before incorporating code into the
codebase. Therefore, running a clang compilation as part of our CI would
help us catch errors more quickly.
This reverts gem5#133, the temporary work-around for gem5#131, allowing
both SLC and GLC atomic requests to be made in the GPU tester.
The underlying issues behind gem5#131 have been resolved by gem5#367 and
gem5#397.
The decode_inst_dep_trace.py script opens the trace file in read mode, and
subsequently reads the magic number from the trace file. Once the number
is read, it is compared against the string 'gem5' without decoding it
first. This causes the comparison to fail.
The fix addresses this by calling the decode() routine on the output of
the read() call. Please find the details in the associated issue #543.
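A minimal sketch of the failure mode, assuming the magic number is read as
bytes (the in-memory buffer below is a stand-in for the real trace file,
and the variable names are illustrative):

```python
import io

# Stand-in for a trace file read as bytes; the real script reads from disk.
trace = io.BytesIO(b"gem5" + b"\x00\x01")

magic = trace.read(4)  # a bytes object, not a str

# In Python 3, bytes and str never compare equal, so this is always False.
matches_raw = (magic == "gem5")

# Decoding first turns the bytes into a str that can be compared.
matches_decoded = (magic.decode() == "gem5")

print(matches_raw, matches_decoded)
```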
When compiling gem5 with GCC-9, the object files are nearly double the
size of those produced by other GCC versions. This increase in size means
we need >16GB of memory available when linking. As we do not want to
mandate >16GB systems for building gem5, we are going to drop GCC-9. The
exact cause of this bug is unknown. This is highlighted in Issue #555.
mem-ruby, gpu-compute: fix formatting of TCC
Fix several improperly indented prints and remove extraneous lines in
the SLICC code for the GPU TCC (L2 cache).
mem-ruby, gpu-compute: fix typo in GPU coalescer deadlock print
The GPU Coalescer's deadlock print did not previously print a newline at
the end of each deadlock, which caused confusion when there were
multiple deadlocks as each deadlock print would appear to go with the
address after it. This patch fixes this issue.
gpu-compute: Fix typo with GPUTLB print
Print was not properly ending in a newline, which caused confusion when
looking at a trace with GPUTLB enabled. This fixes that.
mem-ruby, gpu-compute: fix GPU SQC/TCP Ruby formatting
Fix several improperly indented prints and remove extraneous lines in
the SLICC code for the GPU SQC (L1I$) and TCP (L1D$).
Symbol type is part of the info provided by an ELF object's symtab.
It indicates whether a symbol is a file symbol, or a function symbol,
etc.
This chain of commits introduces a way to load only function symbols
into gem5's symbol table. The RISC-V BootloaderKernelWorkload now
loads only function symbols from the bootloader and the kernel binaries
by default.
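The idea of filtering by symbol type can be sketched as below. The
`SymbolType` enum, the `Symbol` record and `functions_only` are
hypothetical names used for illustration; they loosely mirror the ELF
STT_* categories and are not gem5's actual classes.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SymbolType(Enum):  # hypothetical; loosely mirrors ELF STT_* values
    FUNCTION = auto()
    OBJECT = auto()
    FILE = auto()
    OTHER = auto()


@dataclass(frozen=True)
class Symbol:
    name: str
    address: int
    type: SymbolType


def functions_only(symbols):
    """Keep only the symbols that the symtab marks as functions."""
    return [s for s in symbols if s.type is SymbolType.FUNCTION]


symtab = [
    Symbol("main.c", 0x0, SymbolType.FILE),
    Symbol("start_kernel", 0x80200000, SymbolType.FUNCTION),
    Symbol("jiffies", 0x80400000, SymbolType.OBJECT),
]
loaded = functions_only(symtab)
```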
arch-riscv: Fix implementation of CMO extension instructions
This change introduces a template for store instruction's mem access.
The new template is called CacheBlockBasedStore.
The reasons for not reusing the current Store's mem access template
are as follows,
- The CMO extension instructions operate on cache block size granularity,
while regular load/store instructions operate on data of size 64 bits or
fewer.
- The writeMemAtomicLE/writeMemTimingLE interfaces do not allow passing
nullptr as data. However, CPUs in gem5 rely on (data == NULL) to detect
CACHE_BLOCK_ZERO instructions. Setting `Mem = 0;` on a `uint64_t Mem;`
does not solve the problem as the reference is allocated and thus,
it's always true that `&Mem != NULL`. This change uses the
writeMemAtomic/writeMemTiming interfaces instead.
- Per CMO v1.0.1, the instructions in the spec do not generate
address misaligned faults.
- The CMO extension instructions do not use IMM.
---
arch-riscv: Fix generateDisassembly for Store with 1 source reg
Currently, store instructions are assumed to have two source registers.
However, the RISC-V CMO instructions that we now support are
Store instructions in gem5 but have only one source register.
This change allows printing disassembly of Store instructions with
one source register.
---
arch-riscv: Make Zicbom/Zicboz extensions optional in FS mode
Currently, we enable Zicbom/Zicboz by default. Since those
extensions might be buggy as they are not well-tested, making
those extensions optional allows running simulations where
the performance implications of the instructions do not matter.
Effectively, by turning off the extensions, we simply remove
those extensions from the device tree, so the OS would not
use them. It doesn't prevent userspace applications from
using those instructions, however.
---
arch-riscv: Add all supporting Z extensions to RISC-V isa string
When compiling gem5 with GCC-9, the object files are nearly double the
size of those produced by other GCC versions. This increase in size means
we need >16GB of memory available when linking. As we do not want to
mandate >16GB systems for building gem5, we are going to drop GCC-9. The
exact cause of this bug is unknown.
Change-Id: I43744d421b88b79ccb21a76badd6b525e894e973
mem-ruby: update RubyRequest print to include GPU fields
The print function used for RubyRequests did not include the GPU
specific fields (for the GLC and SLC bits, which are cache modifiers
that specify what level of the memory hierarchy a request should be
performed at). This causes confusion when the GPU Ruby SLICC code prints
out RubyRequest messages, since important fields are missing.
Thus this commit adds that support. Since these fields are already part
of the RubyRequest class, and are always 0 for non-GPU requests, it
should not affect other components beyond slightly longer prints.
Change-Id: I31c9122b82dfa2c6415ce25d225ea82cb35c7333
Made the following changes to fix the behavior of GLC atomics in a WB
L2:
- Stored atomic write mask in TBE for GLC atomics on an invalid line
that bypass to the directory, but have their atomics performed on the
return path.
- Replaced !presentOrAvail() check for bypassing atomics to directory
(which will then be performed on return path), with check for invalid
line state.
- Replaced wdb_writeDirtyBytes action used when performing atomics with
owm_orWriteMask action that doesn't write from invalid atomic request
data block
- Fixed atomic return path actions
Change-Id: I6a406c313d2f9c88cd75bfe39187ef94ce84098f
This change updates the gem5 SST bridge to call m5.instantiate() in the
gem5 config script instead of in the SST component. This allows more
flexibility for the gem5-SST setup, as we can now write traffic
generators using the bridge.
Previously the GPU L1 I$ (SQC) was not updating the MRU information on
hits in the SQC. This commit resolves that by adding support to the
appropriate Ruby transition.
Previously, opening a config file (such as
`configs/example/hmc_hello.py`) containing non-ASCII characters caused a
UnicodeDecodeError.
Also switch to a more idiomatic context manager for handling
files.
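A sketch of the pattern, under the assumption that the file is read as
UTF-8 (the message above does not name the encoding, and the temporary
file below is only a stand-in for a real config script):

```python
import os
import tempfile

# Create a small config-like file that contains a non-ASCII character.
with tempfile.NamedTemporaryFile(
    "w", suffix=".py", encoding="utf-8", delete=False
) as tmp:
    tmp.write("# Author: Zoë\nx = 1\n")
    path = tmp.name

# An explicit encoding avoids depending on the platform's default codec,
# and the context manager closes the file even if reading raises.
with open(path, encoding="utf-8") as config_file:
    source = config_file.read()

os.remove(path)  # clean up the stand-in file
```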
Change-Id: Ia39cbe2c420e9c94f3a84af459b7e5f4d9718d14
compression::Base::getSize() performed an integer division,
which rounded down instead of up.
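The difference between the two roundings can be sketched as follows. The
bits-to-bytes conversion and the function names are assumptions made for
illustration; the message above only states that an integer division
rounded down.

```python
def size_in_bytes_floor(size_bits: int) -> int:
    """Integer division truncates, so partial bytes are lost."""
    return size_bits // 8


def size_in_bytes_ceil(size_bits: int) -> int:
    """Adding (divisor - 1) before dividing rounds up instead."""
    return (size_bits + 7) // 8


for bits in (8, 9, 15, 16):
    print(bits, size_in_bytes_floor(bits), size_in_bytes_ceil(bits))
```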
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
The print function used for RubyRequests did not include the GPU
specific fields (for the GLC and SLC bits, which are cache modifiers
that specify what level of the memory hierarchy a request should be
performed at). This causes confusion when the GPU Ruby SLICC code
prints out RubyRequest messages, since important fields are missing.
Thus this commit adds that support. Since these fields are already
part of the RubyRequest class, and are always 0 for non-GPU requests,
it should not affect other components beyond slightly longer prints.
Change-Id: I31c9122b82dfa2c6415ce25d225ea82cb35c7333
SST SimpleMem will be deprecated in SST 14. PR 396 updated the
bridge to use StandardMem, which is the new memory interface in
SST. This change removes all references to SimpleMem.
Change-Id: I6e4d645317d95ebb610e3dfc93a30d53b91b6b5d
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
This change updates the gem5 SST bridge to call m5.instantiate()
in the gem5 config script instead of in the SST component. This
allows more flexibility for the gem5-SST setup, as we can now write
traffic generators using the bridge.
Change-Id: I510a8c15f8fb00bdbdd60dafa2d9f5ad011e48f2
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
GPR allocation is using fields in the AMD kernel code structure which
are not backwards compatible and are not populated in more recent
compiler versions. Use the granulated fields instead, which are enforced to
be backwards compatible.
Change-Id: I718716226f5dbeb08369d5365d5e85b029027932
This fixes occasional readBlob fatals caused by the functional read of
system memory, seen often with the KVM CPU.
Change-Id: Ifccee666f62faa5b2fcf0a64a9d77c8cf95b3add
The amdgpu driver can, at *any* time, tell the device to unmap a queue
to force the queue descriptor to be written back to main memory in the
form of a memory queue descriptor (MQD). It will then immediately remap
the queue and continue writing the doorbell to the queue. It is possible
that the doorbell write occurs after the queue is unmapped but before it
is remapped. In this situation, we need to check the updated value of
the doorbell for the queue and write that to the queue after it is
mapped.
To handle this, a pending doorbell packet map is created to hold a
packet to replay when the queue is mapped. Because PCI in gem5
implements only the atomic protocol port, we cannot use the original
packet as it must respond in the same Tick. This patch fixes issues with
the doorbell maps not being cleared on unmapping, which ensures the
doorbell is not found in writeDoorbell and is instead placed in the pending
doorbell map. This includes fixing the doorbell offset value in the
doorbell to VMID map, which is now multiplied by four as it is a dword
address.
This was tested using tensorflow 2.0's MNIST example which was seeing
this issue consistently. With this patch it now makes progress and does
issue pending doorbell writes.
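The replay idea can be sketched with a toy model. All names here are
hypothetical, and the model stores plain doorbell values where the patch
described above stores packets; it is meant only to show the ordering of
unmap, doorbell write and remap.

```python
class DoorbellSketch:
    """Toy model of replaying a doorbell written while a queue is unmapped."""

    def __init__(self) -> None:
        self.mapped = {}      # doorbell offset -> queue id
        self.pending = {}     # doorbell offset -> latest value written
        self.queue_wptr = {}  # queue id -> last doorbell value delivered

    def map_queue(self, offset: int, queue_id: int) -> None:
        self.mapped[offset] = queue_id
        if offset in self.pending:
            # Replay the write that arrived while the queue was unmapped.
            self.queue_wptr[queue_id] = self.pending.pop(offset)

    def unmap_queue(self, offset: int) -> None:
        # Clearing the entry makes later writes take the pending path.
        self.mapped.pop(offset, None)

    def write_doorbell(self, offset: int, value: int) -> None:
        queue_id = self.mapped.get(offset)
        if queue_id is None:
            self.pending[offset] = value
        else:
            self.queue_wptr[queue_id] = value


dev = DoorbellSketch()
dev.map_queue(offset=0x40, queue_id=1)
dev.write_doorbell(0x40, 8)
dev.unmap_queue(0x40)
dev.write_doorbell(0x40, 16)  # arrives between unmap and remap
dev.map_queue(offset=0x40, queue_id=1)
```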
Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e
gem5 does not currently implement any vendor-specific HSA packets.
Starting in ROCm 5.5, vendor packets appear to end with a completion
signal. Not sending this completion causes gem5 to hang. Since these
packets are not documented anywhere and need to be reverse engineered, we
send the completion signal, if non-zero, and finish the packet as is the
current behavior.
Testing: HIP examples working on most recent ROCm release (5.7.1).
Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c
Currently, we enable Zicbom/Zicboz by default. Since those
extensions might be buggy as they are not well-tested, making
those extensions optional allows running simulations where
the performance implications of the instructions do not matter.
Effectively, by turning off the extensions, we simply remove
those extensions from the device tree, so the OS would not
use them. It doesn't prevent userspace applications from
using those instructions, however.
Change-Id: Ib30e98c4c39f741dec5f7d31bd7b832391686840
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, store instructions are assumed to have two source registers.
However, the RISC-V CMO instructions that we now support are
Store instructions in gem5 but have only one source register.
This change allows printing disassembly of Store instructions with
one source register.
Change-Id: I4dd7818c9ac8a89d5e10e77db72248942a25e938
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This change introduces a template for store instruction's mem access.
The new template is called CacheBlockBasedStore.
The reasons for not reusing the current Store's mem access template
are as follows,
- The CMO extension instructions operate on cache block size granularity,
while regular load/store instructions operate on data of size 64 bits or
fewer.
- The writeMemAtomicLE/writeMemTimingLE interfaces do not allow passing
nullptr as data. However, CPUs in gem5 rely on (data == NULL) to detect
CACHE_BLOCK_ZERO instructions. Setting `Mem = 0;` on a `uint64_t Mem;`
does not solve the problem as the reference is allocated and thus,
it's always true that `&Mem != NULL`. This change uses the
writeMemAtomic/writeMemTiming interfaces instead.
- Per CMO v1.0.1, the instructions in the spec do not generate
address misaligned faults.
- The CMO extension instructions do not use IMM.
Change-Id: I323615639a4ba882fe40a55ed32c7632e0251421
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Symbol type is part of the info provided by an ELF object's symtab.
It indicates whether a symbol is a file symbol, or a function symbol, etc.
Change-Id: I827e79f8439c47ac9e889734aaf354c653aff530
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Currently, we are hardcoding the ISA string in the device tree
generator. The ISA string from the device tree affects which
ISA extensions will be used by the bootloader/kernel.
This function allows generating the ISA string from gem5's
ISA object rather than using hardcoded values.
This series of changes also corrects a couple of hardcoded
RISC-V ISA strings in the standard library, and no longer
enables RVV instructions for the U74 core model.
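A rough sketch of deriving the string from a set of enabled extensions
instead of hardcoding it. The function name and its parameters are
hypothetical, and sorting the multi-letter extensions alphabetically is a
simplification of the canonical ordering rules in the RISC-V
specification.

```python
def build_isa_string(xlen, single_letter, multi_letter):
    """Join the base ISA and multi-letter extensions with underscores."""
    parts = [f"rv{xlen}{single_letter}"] + sorted(multi_letter)
    return "_".join(parts)


# Dropping an extension from the list removes it from the generated
# string, so the bootloader/kernel would no longer see it.
with_cmo = build_isa_string(64, "imafdc", ["zicboz", "zicbom"])
without_cmo = build_isa_string(64, "imafdc", [])
print(with_cmo)
print(without_cmo)
```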
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This PR is fixing https://github.com/gem5/gem5/issues/449 by applying
the following changes:
1) Setting up alloc_on_atomic=False in the stdlib
This is directly related to the error message reported by the Issue #449.
2) Disabling far atomics in stdlib with policy type = 0
There is an invalid transaction error, likely caused by the fact the
current implementation is expecting a 2 level cache hierarchy whereas
the stdlib example only allocates one level of caches (L1). This needs
further investigation.
3) Explicitly clearing the atomic log
Even by disabling far atomics, the execution of atomicPartial was
populating the atomic log queue without ever clearing it. This caused
the OOM killer in Linux to detect the leak and to kill it when the
physical resources of the machine no longer sufficed. IMHO the atomic
log interface should be revamped as atomic users should be allocating
the atomic log only if explicitly needed.
Currently, the kernel's symbols are shifted by `kernel_paddr_offset`,
which is where the kernel is located in the physical address space.
However, the symbols are mapped to virtual addresses, which stay the
same even though the physical address space is shifted.
This patch removes the offset from the kernel symbols' virtual
addresses.
Change-Id: I7c35f925777220f56bd8c69bba14c267d2048ade
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
As pointed out by [1], Arm doesn't seem to respect the cacheability
attribute when mapping uncacheable memory. This is because the request
is not tagged as uncacheable during SE translation. With this patch we
are checking for the cacheability attribute before finalizing
translation.
[1]: https://github.com/gem5/gem5/issues/509
Change-Id: I42df0e119af61763971d5766ae764a540055781b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is a temporary solution to fix daily tests. We could revert
to the default (policy_type = 1) once the problem is properly
fixed.
Change-Id: Ia80af9a7d84d5c777ddeb441110a91a1680c1030
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>