1. All VMs are deployable from a single Vagrantfile (per host machine).
2. Runners within VMs are now ephemeral. They cease to exist after a job
is complete. After the VM cleans the workspace and creates a new runner.
This will reduce old data, scripts, and images causing space issues on
our VMs
3. No more 'vm_manager.sh' script. The standard `vagrant` command to
manage the VMs will work.
4. Adds Copyright notices where missing.
This patch removed the bespoke "vm_manager.sh" script in favor of a
Multi-Machine Vagrantfile.
With this the users needs to only change the variables in Vagrantfile
then use the standard `vagrant` commands to launch the VMs/Runners.
Change-Id: Ida5d2701319fd844c6a5b6fa7baf0c48b67db975
Prior to this change we were limited to a root partition with only 60GB
of space which caused issues when running larger simulations (see:
https://github.com/gem5/gem5/issues/165).
There are two factors in this issue which this patch resolves:
1. The root partition in the VM was capped at 60GB despite the virtual
machines size being capped at 128GB. This resulted in libvirt giving the
VM free space it couldn't use. To fix this `lvextend` was added to the
"provision_root.sh" script to resize the root partition to fill the
available space.
2. The virtual machine size can be set via the `machine_virtual_size`
parameter. The minimum and default value is 128GB. This wasn't exposed
previously. Now, if we required, we can increase the size of the VM/Root
partition if we require (though I believe 128GB is more than sufficient
for now).
Fixes: https://github.com/gem5/gem5/issues/165
Change-Id: I82dd500d8807ee0164f92d91515729d5fbd598e3
The "action-run.sh" action replaces inline scripting in the Vagrantfile.
The major improvement is this script runs an infinite loop and
configures the runners to be ephemeral. This means they cease to exist
after a job is complete. The script then cleans the VM workspace and the
loop restarts by configuring and setting up another runner. This means
our VMs no longer accumulate files that eventually lead to the VM
running out of space.
Change-Id: Iba6dc9a480f5805042602f120fc84bdc47a96d55
There are two places self-hosted runners can exist on GitHub:
1. At the level of the repository: In this case the runners can only be
used by that repository and runners can only be distinguished from one
another by labels.
2. At the level of the organization: In this case the runners can be
used by any repository in the organization, thus increasing their
versatility. In addition to labels, runners in the level of the
organization can be organized into groups.
While we do not use our self-hosted runners on other repositories, there
may be future use for this, so we might as well enable it now.
Change-Id: Id5e113194314336221dcdc8c2858b352afcbaf6e
Having two types of GitHub Action Runners has not yielded much benefit
and caused confusion and inefficiencies. This change simplifies things
to having just one runner with 8-cores and 16GB of memory. It is
sufficient to build gem5 and run most simulations.
Change-Id: Ic49ae5e98b02086f153f4ae2a4eedd8a535786c8
Modified the x86 KVM-in-SE syscall handler to flush the TLB following
each syscall, in case the page table has been modified. This is done
by reloading the value in %cr3. Doing this requires an intermediate
GPR, which we store in a new scratch buffer following the syscall code
at address `syscallDataBuf`.
GitHub issue: https://github.com/gem5/gem5/issues/409
Change-Id: Ibc20018c97ebb1794fa31a0c71e0857d661c7c9d
gem5::MemState::updateBrkRegion(), which is called during the syscall
emulation of brk, did not unmap deallocated heap pages when the brk
region is receding. Instead, it kept it mapped for simplicity. This
introduced a bug where subequent expansions of the brk region reused
prior heap page mappings that were not zero-filled. This violates
the assumptions of glibc malloc, resulting in heap corruption and
crashes.
This patch fixes the bug by always unmapping pages that are deallocated
during a call to brk() that reduces the heap size. This makes the
gem5::MemState::_endBrkPoint field obsolete, so this patch removes it.
GitHub issue: https://github.com/gem5/gem5/issues/342
Change-Id: Ib2244e1aa4d2a26666ad60d231fdde2c22d2df35
The ROCr runtime uses a combination of HSA signal timestamps and
hardware MMIOs to calculate profiling times. At the beginning of an
application a timestamp is read from the GPU using MMIOs. The clock
MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added
to handle these MMIOs.
The timestamp value is expected to be in nanoseconds, so we simply use
the gem5 tick converted to ns.
Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4
The AMD specific HSA signal contains start/end timestamps for dispatch
packet completion signals. These are current always zero. These
timestamp values are used for profiling in the ROCr runtime.
Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check
for zero values before dividing, causing applications that use profiling
to crash with SIGFPE. Profiling is used via hipEvents in the HACC
application, so these should be supported in gem5.
In order to handle writing the timestamp values, we need to DMA the
values to memory before writing the completion signal. This changes the
flow of the async completion signal write to be (1) read mailbox pointer
(2) if valid, write the mailbox data, other skip to 4 (3) write mailbox
data if pointer is valid (4) write timestamp values (5) write completion
signal. The application will process the timestamp data as soon as the
completion signal is received, so we need to ordering to ensure the DMA
for timestamps was completed.
HACC now runs to completion on GPUFS and has the same output was
hardware.
Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e
Added a new feature to CHI protocol (in collaboration with @tiagormk).
Here is the Jira Ticket
[https://gem5.atlassian.net/browse/GEM5-1326](https://gem5.atlassian.net/browse/GEM5-1326
). As described in CHI specs, far atomic transactions enable remote
execution of Atomic Memory Operations. This pull request incorporates
several changes:
* Fix Arm ISA definition of Swap instructions. These instructions should
return an operand, so their ISA definition should be Return Operation.
* Enable AMOs in Ruby Mem Test to verify that AMOs work
* Enable near and far AMO in the Cache Controler of CHI
Three configuration parameters have been used to tune this behavior:
* policy_type: sets the atomic policy to one of the described in [our
paper](https://dl.acm.org/doi/10.1145/3579371.3589065)
* atomic_op_latency: simulates the AMO ALU operation latency
* comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
Ruby was recently updated to support flushes and warmup for GPUs. Since
this support uses the GPUCoalescer, non-GPU builds face a compile time
issue. This is because GPU code is not built for non-GPU builds. This
commit addes "#if BUILD_GPU" guards around the GPU-related code in
common files like AbstractController.hh, CacheRecorder.*, RubySystem.cc,
GPUCoalescer.hh, and VIPERCoalescer.hh. This support allows GPU builds
to use flushing while non-GPU builds compile without problems
Change-Id: If8ee4ff881fe154553289e8c00881ee1b6e3f113
Previously, the L1, L2 number of banks and L2 latencies were not
configurable through command line arguments. This commit adds support to
configure them through the arguments '--tcp-num-banks' for number of
banks in L1, '--tcc-num-banks' for number of banks in L2, and
'--tcc-tag-access-latency', and '--tcc-data-access-latency'
Change-Id: Ie3b713ead16865fd7120e2d809ebfa56b69bc4a1
ROCm supports dynamically allocating scratch space, which resides in
framebuffer memory, to reduce the amount of memory allocated for kernels
that have not yet launched. The size of the scratch space allocated is
located in task->amdQueue.compute_tmpring_size_wavesize. This size is in
kilobytes. The AQL task contains the number of bytes requested *per work
item*, however we currently check if there is enough tmpring space by
comparing a single work item. This should instead check the size *per
wavefront*.
This causes problems in applications where multiple kernels use dynamic
scratch allocation and a later kernel requires more space than the
earlier kernel. The only application being tested that does this is
LULESH. This was resulting in the scratch space being too small,
resulting in workgroups clobbering each other's private memory leading
to some nasty bugs. It is fixed by this patch as task->amdQueue will be
re-read from the host and will contain the updated tmpring size. After
this there is enough scratch space and LULESH makes forward progress.
This comment was left in the codebase in error. The
`set_se_binary_workload` function works fine with multi-threaded
applications. This hasn't been a restriction for some time.
Change-Id: I1b1d27c86f8d9284659f62ae27d752bf5325e31b
`std::binary_function` was deprecated in C++11 and officially
removed in CPP-17.
This caused a compilation error on some systems. Fortunately it can be
safely removed. It was unecessary. The commandItemSorter was compliant
witih the `sort` regardless.
Change-Id: I0d910e50c51cce2545dd89f618c99aef0fe8ab79
As highlighed in this failing compiler test:
https://github.com/gem5/gem5/actions/runs/6348223508/job/17389057995
Clang was failing when compiling "build/ALL/gem5.opt" due missing
overrides in `PCState`'s "set" function.
This was observed in Clang-14 and, stangely, Clang-8.
Change-Id: I240c1087e8875fd07630e467e7452c62a5d14d5b
Introduce far atomic operations in CHI protocol.
Three configuration parameters have been used to tune this behavior:
policy_type: sets the atomic policy to one of the described in our paper
atomic_op_latency: simulates the AMO ALU operation latency
comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
Change-Id: I087afad9ad9fcb9df42d72893c9e32ad5a5eb478
Swap instructions are configured as non returning AMO operations. This is wrong because they
return the previous value stored in the target memory position
Change-Id: I84d75a571a8eaeaee0dbfac344f7b34c72b47d53
This introduces the changes necessary for clang-15 and clang-16 to run
within gem5, and adds them to the compiler tests.
This also updates the dockerfiles for ubuntu 22.04 to include the steps
necessary to compile clang-15 and clang-16.
ROCm supports dynamically allocating scratch space, which resides in
framebuffer memory, to reduce the amount of memory allocated for kernels
that have not yet launched. The size of the scratch space allocated is
located in task->amdQueue.compute_tmpring_size_wavesize. This size is in
kilobytes. The AQL task contains the number of bytes requested *per work
item*, however we currently check if there is enough tmpring space by
comparing a single work item. This should instead check the size *per
wavefront*.
This causes problems in applications where multiple kernels use dynamic
scratch allocation and a later kernel requires more space than the
earlier kernel. The only application being tested that does this is
LULESH. This was resulting in the scratch space being too small,
resulting in workgroups clobbering each other's private memory leading
to some nasty bugs. It is fixed by this patch as task->amdQueue will be
re-read from the host and will contain the updated tmpring size. After
this there is enough scratch space and LULESH makes forward progress.
Change-Id: Ie9e0f92bb98fd3c3d6c2da3db9ee65352f9ae070
Add the instruction size of a static instruction. x86 and arm decoders
add now the instruction size to the macro instruction. However, microops
are still handled by the fetch stage which is not nice.
Furthermore, we add a set method to the PC state. It allows setting a PC
state to acertain address.
Both methods are required for the decoupled front-end.
Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
GPUFS+KVM simulations automatically enable AVX. This commit adds a
command line option to disable AVX if its not needed for a GPUFS
simulation.
Change-Id: Ic22592767dbdca86f3718eca9c837a8e29b6b781
Previously, the L1, L2 number of banks and L2 latencies were not
configurable through command line arguments. This commit adds support to
configure them through the arguments '--tcp-num-banks' for number of
banks in L1, '--tcc-num-banks' for number of banks in L2, and
'--tcc-tag-access-latency', and '--tcc-data-access-latency'
Change-Id: Ie3b713ead16865fd7120e2d809ebfa56b69bc4a1
During checkpoint restoration, the unserialize() function writes rptr,
wptr, and indirect buffer rptr, wptr to PM4 queue's rptr, wptr fields.
This commit updates this to write only the relevant pointers to the
queue structure. If indirect buffers are used, then it writes only the
indirect buffer pointers to the queue. If they are not used, then it
writes rptr, wptr values to the queue.
Change-Id: Iedb25a726112e1af99cc1e7bc012de51c4ebfd45
Previously, the cache recorder used the Sequencer to issue flush
requests and cache warmup requests. The GPU however uses GPUCoalescer to
access the cache, and not the Sequencer. This commit adds a GPUCoalescer
map to the cache recorder and uses it to send flushes and cache warmup
requests to any GPU caches in the system
Change-Id: I10490cf5e561c8559a98d4eb0550c62eefe769c9
This commit adds flush support to the GPU VIPER coherence protocol. The
L1 cache will now initiate a flush request if the packet it receives
is of type RubyRequestType_FLUSH. During the flush process, the L1 cache
will a request to L2 if its in either V or I state. L2 will issue a
flush request to the directory if its cache line is in the valid
state before invalidating its copy. The directory, on receiving this
request, writes data to memory and sends an ack back to the L2. L2
forwards this ack back to the L1, which then ends the flush by calling
the write callback
Change-Id: I9dfc0c7b71a1e9f6d5e9e6ed4977c1e6a3b5ba46
The GPU Coalescer does not contain cache cooldown and warmup support.
This commit updates the coalsecer to support cache cooldown during flush
and warmup during checkpoint restore.
Change-Id: I5459471dec20ff304fd5954af1079a7486ee860a
GPUFS uses aql information from PM4 queues to initialize doorbells. This
commit adds aql information to the checkpoint so that it can be used
during restoration to correctly initialize all doorbells. Additionally,
this commit also sets the hsa queue correctly during checkpoint-restoration
Change-Id: Ief3ef6dc973f70f27255234872a12c396df05d89
I believe the point of this binary was to allow people to use the m5
objects without the entire gem5 binary. However, without adding the
importer call, this did not work. Unfortunately, with the importer call
there is a circular dependence on the original gem5py.cc file.
Therefore, this change creates a new file that has the importer call.
Now, with the `gem5py_m5` binary you can run python code that references
modules in `src/python`. Note that `_m5` is not available, so anything
that depends on the gem5 SimObjects' implementation will not work.
However, this can still be useful for things like getting Resources,
processing stats, etc.
Adds the instruction size to all static instruction. x86, arm
and RISC-V decoders add the instruction size to every decoded
macro instruction. As microops should reflect the size of the
their parent macroop the set method is overwritten to pass the
size to all microops.
Furthermore, we add a set method to the PC state. It allows
setting a PC state to a certain address.
Both methods are required for the decoupled front-end.
Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
Signed-off-by: David Schall <david.schall@ed.ac.uk>
* Removes `+` symbol accidently left in (this broke building).
* Removes `ARG` thus making the Docker exclusively for clang-16.
* Adds "llvm.sh" to the repo. This stops us being dependent on the url
download. The script is under the apache license therefore compatible.
* Merges several `apt install` commands into one.
Change-Id: Iaf411656aac83f67f5395b20efd96ecc1eabb263
The L3 cache did not work due to argument type mismatch in the call to
the constructor `DMAController`. The second argument is expecting a
`RubySystem` type but the code passes in a `cache_line_size` variable.
After I change the second argument to `self.ruby_system` everything
works.