derek/gem5

Author	SHA1	Message	Date
Giacomo Travaglini	2e85c95f4b	arch-arm: Remove Jazelle state + ThumbEE support (#364 ) This PR removes Jazelle state (while still keeping a "Trivial Jazelle implementation", see Arm Architecture Reference Manual) and ThumbEE support	2023-10-16 09:41:44 +01:00
Jason Lowe-Power	20f5555f30	python: Enable -m switch on gem5 binary (#453 ) With -m, you can now run a module from the command line that is embedded in the gem5 binary. This will allow us to put some common "scripts" in the stdlib instead of in the "configs" directory.	2023-10-14 20:08:06 -07:00
Daniel Kouchekinia	4931fb0010	mem-ruby: Always pass on GPU atomics to dir in write-through TCC (#367 ) Added checks to ensure that atomics are not performed in the TCC when it is configured as a write-through cache. Also added SLC bit overwrite to ensure the directory performs atomics when there is a write-through TCC. Change-Id: I4514e6c8022aeb7785f2c59871cd9acec8161ed8	2023-10-14 06:39:50 -07:00
Yu-Cheng Chang	a3c51ca38c	arch-riscv: Fix write back register issue of vmask_mv_micro (#443 ) After the removal of setRegOperand in VecRegOperand (https://github.com/gem5/gem5/pull/341), vmask_mv_micro no longer writes back to the register because tmp_d0 is not a reference type. This PR makes tmp_d0 a reference into regFile. Change-Id: I2a934ad28045ac63950d4e2ed3eecc4a7d137919	2023-10-13 15:20:42 -07:00
Matthew Poremba	7706e958e5	mem-ruby: Update cache recorder to use RubyPort and remove BUILD_GPU guards (#448 ) This PR updates cache recorder to use a vector of RubyPorts for cache cooldown and warmup instead of Sequencer or GPUCoalescer vectors (refer to issue #403 for more details). It also removes the extra guards that were added in #377 to prevent compile-time failures in non-GPU builds.	2023-10-13 14:36:45 -07:00
Andreas Sandberg	59f96deb0f	cpu: Refactor indirect predictor (#429 )	2023-10-13 11:35:02 +01:00
Giacomo Travaglini	1c45cdcc41	arch-arm: Remove legacy ThumbEE references ThumbEE had already been removed but there were still some references to it dangling around. We were also signaling ThumbEE as being available through HWCAPS in SE which was not correct. This patch is fixing it Change-Id: I8b196f5bd27822cd4dd8b3ab3ad9f12a6f54b047 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-13 09:25:48 +01:00
Giacomo Travaglini	a33f3d3967	arch-arm: Remove Jazelle state support Jazelle state has been officially removed in Armv8. Every AArch32 implementation must still support the "Trivial Jazelle implementation", which means that while the instruction set has been removed, it is still possible for privileged software to access some Jazelle registers like JIDR,JMCR, and JOSCR which are just treated as RAZ Change-Id: Ie403c4f004968eb4cb45fa51067178a550726c87 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-13 09:25:48 +01:00
Vishnu Ramadas	8d54a5cbab	mem-ruby: Remove BUILD_GPU guards from ruby coalescer models A previous commit added BUILD_GPU guards to gpu coalescer models since a related cache recorder commit added GPU support. This is no longer needed since the cache recorder moved to using a vector of RubyPorts instead of Sequencer/GPUCoalescer pointers. This commit removes BUILD_GPU guards from the Ruby coalescer models Change-Id: I23a7957d82524d6cd3483d22edfb35ac51796eca	2023-10-12 14:53:29 -05:00
Vishnu Ramadas	08c1af1b16	mem-ruby: Use RubyPort vector to access Ruby in cache recorder Previously, the cache recorder used a vector of sequencer pointers to access Ruby objects. A recent commit updated the cache recorder to also maintain a vector of GPUCoalescer pointers in order for GPUs to support flushing. This added redundant code to the cache recorder. This commit replaces the sequencer and GPUCoalescer vectors with a vector of RubyPort pointers so that the code does not contain redundant lines Change-Id: Id5da33fb870f17bb9daef816cc43c0bcd70a8706	2023-10-12 14:49:06 -05:00
Matthew Poremba	4d336c0636	arch-vega: Implement buffer_atomic_cmpswap (#439 ) This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-12 07:33:40 -07:00
Matthew Poremba	4b7f25fcb6	arch-vega: Ignore s_setprio instruction instead of panic This instruction is used by ML frameworks to prioritize certain wavefronts. Since gem5 does not have any support for wavefront scheduling based on priority (besides wavefront age), we ignore this instruction and warn_once rather than calling panic. Since hardware can override this priority anyways, we can be sure that ignoring the value will not inhibit forward progress resulting in application hangs. Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc	2023-10-11 15:55:16 -05:00
Matthew Poremba	4b85a1710e	arch-vega: Implement buffer_atomic_cmpswap This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-11 15:42:50 -05:00
Bobby R. Bruce	70b6b53e54	misc,python: Add `pyupgrade` to pre-commit (#424 ) This adds the [pyupgrade](https://github.com/asottile/pyupgrade) hook to pre-commit. This hook automatically upgrades the syntax to the recommended standards for the newer version of the language.	2023-10-11 09:07:09 -07:00
Matthew Poremba	da11427ba6	gpu-compute: Update tokens for flat global/scratch (#408 ) Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which return if the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-11 09:00:10 -07:00
Andreas Sandberg	891250192d	arch-arm: Implement FEAT_TCR2 and FEAT_SCTLR2 (#416 ) This is simply adding the new Armv8.9 registers defined in the related features: - FEAT_TCR2 - FEAT_SCTLR2	2023-10-11 10:14:31 +01:00
David Schall	f65df9b959	cpu: Refactor indirect predictor Simplify indirect predictor interface. Several of the existing functions were merged into four clear ones, which mirror the main direction predictor interface: 'lookup', 'update', 'squash' and 'commit'. This makes the interface much clearer, allows better functionality isolation and makes it simpler to develop new predictor models. A new parameter is added to allow additional buffer space for speculative path history. Change-Id: I6d6b43965b2986ef959953a64c428e50bc68d38e Signed-off-by: David Schall <david.schall@ed.ac.uk>	2023-10-11 07:50:32 +00:00
Bobby R. Bruce	c4156b06fb	python: Fix `base` logic in `MetaSimObject` This ensures `class Foo` is considered equivalent to `class Foo(object)`. Change-Id: I65a8aec27280a0806308bbc9d32281dfa6a8f84e	2023-10-10 21:47:08 -07:00
Bobby R. Bruce	298119e402	misc,python: Run `pre-commit run --all-files` Applies the `pyupgrade` hook to all files in the repo. Change-Id: I9879c634a65c5fcaa9567c63bc5977ff97d5d3bf	2023-10-10 21:47:07 -07:00
Bobby R. Bruce	3f5d7d647a	misc: Run `pre-commit autoupdate` (#419 ) 1. Runs `pre-commit autoupdate`. 2. Runs `pre-commit run --all-files`. 3. Adds (2.) to ".git-blame-ignore-rev".	2023-10-10 21:41:33 -07:00
Bobby R. Bruce	d559c24ac2	stdlib: Improve handling of errors in Atlas request failures (#404 ) Now: * The Atlas Client will attempt a connection 4 times, using an exponential backoff approach between attempts. * When a failure does arise a rich output is given so problems can be easily diagnosed. Addresses: #340	2023-10-10 21:34:24 -07:00
Harshil Patel	bbc301f2f0	stdlib, tests: Fixed bugs and tests - Fixed bugs related to retrying on request failure. - Updated the pyunit tests. Change-Id: Ia484690267bf27018488324f3408f7e47c59bef3	2023-10-10 15:54:20 -07:00
Bobby R. Bruce	ddf6cb88e4	misc: Run `pre-commit run --all-files` This is to reflect the updates made to black when running `pre-commit autoupdate`. Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919	2023-10-10 14:01:58 -07:00
Bobby R. Bruce	0ec1fb167b	stdlib: Fix use internal _hashlib in md5_utils.py (#427 ) Removes the use of the internal _hashlib, which is an internal Python API This is a fix for issue #383	2023-10-10 08:32:45 -07:00
Yu-Cheng Chang	141b06d335	arch,arch-riscv: Remove setRegOperand in VecRegOperand (#341 ) The RISC-V vector instructions still work without setRegOperand. We should fix the register statistic issue by https://github.com/gem5/gem5/pull/360 to avoid duplicate statistic register write count Change-Id: Ib6a52935e00c3e557b366abfcf60450dca05614d	2023-10-10 08:00:10 -07:00
Matthew Poremba	9f4d334644	gpu-compute: Update tokens for flat global/scratch Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which return if the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-10 09:48:16 -05:00
Matt Sinclair	ec633b3d68	dev-amdgpu,mem-ruby: Add support to checkpoint and restore between kernels in GPUFS (#377 ) Earlier, GPU checkpointing was working only if a checkpoint was created before the first kernel execution. This pull request adds support to checkpoint in-between any two kernel calls. It does so by doing the following. - Adds flush support in the GPU_VIPER protocol - Adds flush support in the GPUCoalescer - Updates cache recorder to use the GPUCoalescer during simulation cooldown and cache warmup times.	2023-10-10 09:41:21 -05:00
Giacomo Travaglini	8acf49b6fa	arch-arm: Revamp takeInt to take VHE/SEL2 into account The new implementation matches the table in the ARM Architecture Reference Manual (version DDI 0487J.a, section D1.3.6, table R_SXLWJ) It takes into consideration features like FEAT_SEL2 (scr.eel2 bit) and FEAT_VHE (hcr.e2h bit) which affect the masking of interrupts under certain circumstances Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Change-Id: I07ebd8d859651475bd32fd201eea0f4e64a7dd5f	2023-10-10 09:46:47 +01:00
Giacomo Travaglini	e412ddddbd	arch-arm: Split takeInt into AArch64/32 versions We pay a small duplication cost but we make the code more readable and we enable further modifications to the AArch64 code without forcing the same code on the AArch32 method Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Change-Id: I1efa33cf19f91094fd33bd48b6a0a57d8df8f89f	2023-10-10 09:45:59 +01:00
root	05ebbd2184	stdlib: Fix use internal _hashlib in md5_utils.py Removes the use of the internal _hashlib, which is an internal Python API Change-Id: Id4541a143adb767ca7d942c0fd8a1cf1a08a04ab	2023-10-10 06:18:59 +00:00
Bobby R. Bruce	51c881d0f1	stdlib: Improve handling of errors in Atlas request failures Now: * The Atlas Client will attempt a connection 4 times, using an exponential backoff approach between attempts. * When a failure does arise a rich output is given so problems can be easily diagnosed. Change-Id: I3df332277c33a040c0ed734b9f3e28f38606af44	2023-10-09 16:30:02 -07:00
Bobby R. Bruce	93704a81f1	dev-amdgpu,gpu-compute: Implement GPU and HSA timestamps (#410 ) This PR adds two commits to handle timestamps in the ROCm runtime. ROCr uses a mix of GPU timestamp reads and HSA packet timestamps to output profiling information for a task dispatch. The first commit adds timestamps to the HSA completion signal indicating when the task started and ended, and requires changing the flow of completion signal DMAs to ensure the DMA of the timestamp values has completed before writing the completion signal value. The second commit adds MMIOs for reading the GPU's timestamp counter. This MMIO resides in the GFX MMIO space so a new class is added to handle MMIOs in that address range.	2023-10-09 14:11:52 -07:00
Bobby R. Bruce	bbe05b0cba	tests,misc: Fix compilation tests failures (#400 ) Exposed in our failing compiler tests: https://github.com/gem5/gem5/actions/runs/6348223508, this PR: * Adds missing overrides to `PCState`'s `set` function. * Removes `std::binary_function` from DramPower (it was deprecated in C++11 and officially removed in C++17).	2023-10-09 11:20:52 -07:00
Harshil Patel	452a600c49	New function to kernel_disk_workload to allow new disk device location (#151 ) Added a parameter (_disk_device) to kernel_disk_workload which allows users to change the disk device location. get_disk_device() now chooses between the parameter and, if no parameter was passed, it calls a new function _get_default_disk_device() which is implemented by each board and has a default disk device according to each board, eg /dev/hda in the x86_board. The previous way of setting a disk device still exists as a default, however, with the new function users can now override this default	2023-10-09 10:33:45 -07:00
Harshil Patel	79f40ffdab	stdlib: Del comment stating SE mode limited to single thread (#402 ) This comment was left in the codebase in error. The `set_se_binary_workload` function works fine with multi-threaded applications. This hasn't been a restriction for some time.	2023-10-09 10:30:32 -07:00
Harshil Patel	d8fc0180a5	cpu: Restructure BTB (#412 ) This is the first PR in a series of enhancements to the BPU proposed in #358. However, putting everything into one PR would be hard to review and prone to oversights. This PR restructures the BTB: - A new abstract BTB class is created to enable different BTB implementations. The new BTB class gets its own parameter and stats. - An enum is added to differentiate branch instruction types. This enum is used to enhance statistics and BPU management. - The existing BTB is moved into `simple_btb` as default. - An additional function is added to store the static instruction in the BTB. This function is used for the decoupled front-end. - Update configs to match new BTB parameters.	2023-10-09 10:13:00 -07:00
Giacomo Travaglini	eac5a8b215	arch-arm: Implement FEAT_TCR2 Change-Id: I0396f5938c09b68fcc3303a6fdda1e4dde290869 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:19:57 +01:00
Giacomo Travaglini	49cbb24351	arch-arm: Implement FEAT_SCTLR2 Change-Id: Ifb8c8dc1729cc21007842b950273fe38129d9539 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:12:53 +01:00
Giacomo Travaglini	c4c5d2e172	arch-arm: Implement ID_AA64MMFR3_EL1 register Change-Id: If8c37bdccf35a070870900c06dc4640348f0f063 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:12:53 +01:00
Andreas Sandberg	ec7921305b	arch-arm: Implement FEAT_TLBIRANGE extension (#414 )	2023-10-09 17:09:31 +01:00
Jason Lowe-Power	d4be9c76c5	cpu-kvm, arch-x86: flush TLB after syscalls (#411 ) Modified the x86 KVM-in-SE syscall handler to flush the TLB following each syscall, in case the page table has been modified. This is done by reloading the value in %cr3. Doing this requires an intermediate GPR, which we store in a new scratch buffer following the syscall code at address `syscallDataBuf`. GitHub issue: https://github.com/gem5/gem5/issues/409	2023-10-09 08:16:06 -07:00
David Schall	edf9092fee	cpu: Restructure BTB - A new abstract BTB class is created to enable different BTB implementations. The new BTB class gets its own parameter and stats. - An enum is added to differentiate branch instruction types. This enum is used to enhance statistics and BPU management. - The existing BTB is moved into `simple_btb` as default. - An additional function is added to store the static instruction in the BTB. This function is used for the decoupled front-end. - Update configs to match new BTB parameters. Change-Id: I99b29a19a1b57e59ea2b188ed7d62a8b79426529 Signed-off-by: David Schall <david.schall@ed.ac.uk>	2023-10-09 14:37:47 +00:00
Giacomo Travaglini	39fdfaea5a	arch-arm: Implement FEAT_TLBIRANGE Change-Id: I7eb020573420e49a8a54e1fc7a89eb6e2236dacb Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 13:59:47 +01:00
Giacomo Travaglini	6b698630a2	arch-arm: Check VMID in secure mode as well (NS=0) This is still trying to completely remove any artifact which implies virtualization is only supported in non-secure mode (NS=1) Change-Id: I83fed1c33cc745ecdf3c5ad60f4f356f3c58aad5 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 13:56:57 +01:00
Giacomo Travaglini	a8efded644	arch-arm: Include Granule Size in a TLB entry This info can be used during TLB invalidation Change-Id: I81247e40b11745f0207178b52c47845ca1b92870 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 13:56:57 +01:00
Giacomo Travaglini	5cd70bf9bf	sim-se: zero out memory allocated via brk() (#343 ) The syscall emulation of brk() incorrectly did not ensure that newly allocated memory was zero-initialized, which Linux guarantees and which seems to be the expectation of glibc's malloc() and free() implementation. This patch fixes the incorrect behavior by zero-initializing all memory allocations via brk(). GitHub issue: https://github.com/gem5/gem5/issues/342 Change-Id: I53cf29d6f3f83285c8e813e18c06c2e9a69d7cc2	2023-10-09 13:48:53 +01:00
Nicholas Mosier	7a0e84d853	cpu-kvm, arch-x86: flush TLB after syscalls Modified the x86 KVM-in-SE syscall handler to flush the TLB following each syscall, in case the page table has been modified. This is done by reloading the value in %cr3. Doing this requires an intermediate GPR, which we store in a new scratch buffer following the syscall code at address `syscallDataBuf`. GitHub issue: https://github.com/gem5/gem5/issues/409 Change-Id: Ibc20018c97ebb1794fa31a0c71e0857d661c7c9d	2023-10-06 20:41:59 +00:00
Nicholas Mosier	0dcf0fb829	sim-se: unmap reclaimed heap pages in brk syscall emulation gem5::MemState::updateBrkRegion(), which is called during the syscall emulation of brk, did not unmap deallocated heap pages when the brk region is receding. Instead, it kept them mapped for simplicity. This introduced a bug where subsequent expansions of the brk region reused prior heap page mappings that were not zero-filled. This violates the assumptions of glibc malloc, resulting in heap corruption and crashes. This patch fixes the bug by always unmapping pages that are deallocated during a call to brk() that reduces the heap size. This makes the gem5::MemState::_endBrkPoint field obsolete, so this patch removes it. GitHub issue: https://github.com/gem5/gem5/issues/342 Change-Id: Ib2244e1aa4d2a26666ad60d231fdde2c22d2df35	2023-10-06 20:39:57 +00:00
Matthew Poremba	75a7f30dfb	dev-amdgpu: Implement GPU clock MMIOs The ROCr runtime uses a combination of HSA signal timestamps and hardware MMIOs to calculate profiling times. At the beginning of an application a timestamp is read from the GPU using MMIOs. The clock MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added to handle these MMIOs. The timestamp value is expected to be in nanoseconds, so we simply use the gem5 tick converted to ns. Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4	2023-10-06 13:21:40 -05:00
Matthew Poremba	6a4b2bb096	dev-hsa,gpu-compute: Add timestamps to AMD HSA signals The AMD specific HSA signal contains start/end timestamps for dispatch packet completion signals. These are currently always zero. These timestamp values are used for profiling in the ROCr runtime. Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check for zero values before dividing, causing applications that use profiling to crash with SIGFPE. Profiling is used via hipEvents in the HACC application, so these should be supported in gem5. In order to handle writing the timestamp values, we need to DMA the values to memory before writing the completion signal. This changes the flow of the async completion signal write to be (1) read mailbox pointer (2) if valid, write the mailbox data, otherwise skip to 4 (3) write mailbox data if pointer is valid (4) write timestamp values (5) write completion signal. The application will process the timestamp data as soon as the completion signal is received, so we need this ordering to ensure the DMA for timestamps has completed. HACC now runs to completion on GPUFS and has the same output as hardware. Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e	2023-10-06 13:21:40 -05:00
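
The Atlas request commits in the log (d559c24ac2 and 51c881d0f1) describe retrying a connection 4 times with exponential backoff between attempts. The pattern can be sketched roughly as follows. This is an illustrative sketch only, not the gem5 stdlib code: the function name `fetch_with_backoff`, its parameters, and the 1-second base delay are assumptions made for the example.

```python
import time


def fetch_with_backoff(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request() up to max_attempts times, doubling the wait after each failure.

    All names and defaults here are hypothetical; only the "4 attempts,
    exponential backoff" behaviour comes from the commit message.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
            # Wait only if another attempt will follow: 1s, 2s, 4s, ...
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Surface the last underlying error so the failure can be diagnosed.
    raise RuntimeError(
        f"request failed after {max_attempts} attempts: {last_error!r}"
    ) from last_error
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets a test record the delays instead of actually waiting.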

14535 Commits
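
The md5_utils.py commits in the log (0ec1fb167b and 05ebbd2184) replace the internal `_hashlib` module with the public API. A minimal sketch of hashing a file through the public `hashlib` module is shown here; the helper name `md5_file` and the chunk size are assumptions for illustration and are not taken from the gem5 source.

```python
import hashlib


def md5_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file using only the public hashlib API.

    Hypothetical helper: the name and chunk size are illustrative choices.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```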