derek/gem5

Author	SHA1	Message	Date
Bobby R. Bruce	a9464a41f5	stdlib,resources: Generalize exception for request retry (#466 ) In commit `bbc301f2f0` the generalized `Exception` was changed back to the more specific `HTTPError`. In this case we do not desire specific error handling. If the connection to the database fails I want the exception handled in the way outlined: i.e., I want the connection to be retried 4 times before giving up. With `HTTPError`, only `HTTPError`s warrant a retry. Changing this to `HTTPError` caused tests to fail due to a failure to retry downloading of a resource. Here is an example: https://github.com/gem5/gem5/actions/runs/6521543885/job/17710779784 In this case `request.urlopen` raised a `URLError`. I suspect this was some issue to do with reaching the DNS servers. It likely would've succeeded if it had just tried again.	2023-10-16 09:39:44 -07:00
Bobby R. Bruce	97f4b44dd3	arch-arm: Fix line-length error in misc.cc (#459 )	2023-10-16 08:35:54 -07:00
Giacomo Travaglini	f9cf8bf8a2	cpu, arch-arm: Add IsPseudo tag for gem5 pseudo instructions (#465 ) This only applies to pseudo instructions with their own encoding (m5 ops)... In other words memory mapped m5 operations are not supported. This makes sense as they should rather be treated as device accesses... Though it is something to take into consideration when relying on the flag	2023-10-16 16:15:05 +01:00
Bobby R. Bruce	d42eeb6b68	cpu: Explicitly define cache_line_size -> 64-bit unsigned int (#329 ) While it's plausible to define the cache_line_size as a 32-bit unsigned int, the use of cache_line_size is way out of its original scope. cache_line_size has been used to produce an address mask, which masks out the offset bits from an address. For example, [1], [2], [3], and [4]. However, since the cache_line_size is an "unsigned int", the type of the value is not guaranteed to be 64-bit long. Subsequently, the bit twiddling hacks in [1], [2], [3], and [4] produce a 32-bit mask, i.e., 0x00000000FFFFFFC0. This behavior at least caused a problem in LLSC in RISC-V [5], where the load reservation (LR) relies on the mask to produce the cache block address. Two distinct 64-bit addresses can be mapped to the same cache block using the above mask. This patch explicitly defines cache_line_size as a 64-bit unsigned int so the cache block mask can be produced correctly for 64-bit addresses. [1] `3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)` [2] `3bdcfd6f7a/src/cpu/simple/timing.hh (L224)` [3] `3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)` [4] `3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)` [5] `3bdcfd6f7a/src/arch/riscv/isa.cc (L787)`	2023-10-16 07:50:35 -07:00
Jason Lowe-Power	d702d3b90a	misc: fix clang13 overloaded-virtual warning (#454 ) Like #363 clang is also unhappy about the overloaded virtual. However, clang needs to have the diagnostic in a different place Fixes #437	2023-10-16 07:23:08 -07:00
Giacomo Travaglini	3f925c4084	arch-arm: Mark gem5 pseudo-ops with IsPseudo flag Change-Id: I9c8a146d73596597f28cdeca22ad7b7b01b381a7 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-16 13:42:23 +01:00
Giacomo Travaglini	a3b1bfdbf0	cpu: Add a IsPseudo StaticInstFlag for gem5 pseudo-ops Being able to recognise pseudo ops from the static instruction pointer is actually quite useful in several circumstances Change-Id: Ib39badf9aabba15ab3ebe7a8e9717583412731e4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-16 13:41:04 +01:00
Giacomo Travaglini	2e85c95f4b	arch-arm: Remove Jazelle state + ThumbEE support (#364 ) This PR removes Jazelle state (while still keeping a "Trivial Jazelle implementation", see Arm Architecture Reference Manual) and ThumbEE support	2023-10-16 09:41:44 +01:00
Jason Lowe-Power	20f5555f30	python: Enable -m switch on gem5 binary (#453 ) With -m, you can now run a module from the command line that is embedded in the gem5 binary. This will allow us to put some common "scripts" in the stdlib instead of in the "configs" directory.	2023-10-14 20:08:06 -07:00
Daniel Kouchekinia	4931fb0010	mem-ruby: Always pass on GPU atomics to dir in write-through TCC (#367 ) Added checks to ensure that atomics are not performed in the TCC when it is configured as a write-through cache. Also added SLC bit overwrite to ensure directory performs atomics when there is a write-through TCC. Change-Id: I4514e6c8022aeb7785f2c59871cd9acec8161ed8	2023-10-14 06:39:50 -07:00
Yu-Cheng Chang	a3c51ca38c	arch-riscv: Fix write back register issue of vmask_mv_micro (#443 ) After the setRegOperand in VecRegOperand was removed (https://github.com/gem5/gem5/pull/341), vmask_mv_micro no longer writes back to the register because tmp_d0 is not a reference type. This PR makes tmp_d0 a reference to regFile. Change-Id: I2a934ad28045ac63950d4e2ed3eecc4a7d137919	2023-10-13 15:20:42 -07:00
Matthew Poremba	7706e958e5	mem-ruby: Update cache recorder to use RubyPort and remove BUILD_GPU guards (#448 ) This PR updates cache recorder to use a vector of RubyPorts for cache cooldown and warmup instead of Sequencer or GPUCoalescer vectors (refer to issue #403 for more details). It also removes the extra guards that were added in #377 to prevent compile-time failures in non-GPU builds.	2023-10-13 14:36:45 -07:00
Andreas Sandberg	59f96deb0f	cpu: Refactor indirect predictor (#429 )	2023-10-13 11:35:02 +01:00
Giacomo Travaglini	1c45cdcc41	arch-arm: Remove legacy ThumbEE references ThumbEE had already been removed but there were still some references to it dangling around. We were also signaling ThumbEE as being available through HWCAPS in SE which was not correct. This patch is fixing it Change-Id: I8b196f5bd27822cd4dd8b3ab3ad9f12a6f54b047 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-13 09:25:48 +01:00
Giacomo Travaglini	a33f3d3967	arch-arm: Remove Jazelle state support Jazelle state has been officially removed in Armv8. Every AArch32 implementation must still support the "Trivial Jazelle implementation", which means that while the instruction set has been removed, it is still possible for privileged software to access some Jazelle registers like JIDR,JMCR, and JOSCR which are just treated as RAZ Change-Id: Ie403c4f004968eb4cb45fa51067178a550726c87 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-13 09:25:48 +01:00
Vishnu Ramadas	8d54a5cbab	mem-ruby: Remove BUILD_GPU guards from ruby coalescer models A previous commit added BUILD_GPU guards to gpu coalescer models since a related cache recorder commit added GPU support. This is no longer needed since the cache recorder moved to using a vector of RubyPorts instead of Sequencer/GPUCoalescer pointers. This commit removes BUILD_GPU guards from the Ruby coalescer models Change-Id: I23a7957d82524d6cd3483d22edfb35ac51796eca	2023-10-12 14:53:29 -05:00
Vishnu Ramadas	08c1af1b16	mem-ruby: Use RubyPort vector to access Ruby in cache recorder Previously, the cache recorder used a vector of sequencer pointers to access Ruby objects. A recent commit updated the cache recorder to also maintain a vector of GPUCoalescer pointers in order for GPUs to support flushing. This added redundant code to the cache recorder. This commit replaces the sequencer and GPUCoalescer vectors with a vector of RubyPort pointers so that the code does not contain redundant lines Change-Id: Id5da33fb870f17bb9daef816cc43c0bcd70a8706	2023-10-12 14:49:06 -05:00
Matthew Poremba	4d336c0636	arch-vega: Implement buffer_atomic_cmpswap (#439 ) This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-12 07:33:40 -07:00
Matthew Poremba	4b7f25fcb6	arch-vega: Ignore s_setprio instruction instead of panic This instruction is used by ML frameworks to prioritize certain wavefronts. Since gem5 does not have any support for wavefront scheduling based on priority (besides wavefront age), we ignore this instruction and warn_once rather than calling panic. Since hardware can override this priority anyways, we can be sure that ignoring the value will not inhibit forward progress resulting in application hangs. Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc	2023-10-11 15:55:16 -05:00
Matthew Poremba	4b85a1710e	arch-vega: Implement buffer_atomic_cmpswap This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-11 15:42:50 -05:00
Bobby R. Bruce	70b6b53e54	misc,python: Add `pyupgrade` to pre-commit (#424 ) This adds the [pyupgrade](https://github.com/asottile/pyupgrade) hook to pre-commit. This hook automatically upgrades the syntax to the recommended standards for the newer version of the language.	2023-10-11 09:07:09 -07:00
Matthew Poremba	da11427ba6	gpu-compute: Update tokens for flat global/scratch (#408 ) Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which returns whether the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-11 09:00:10 -07:00
Andreas Sandberg	891250192d	arch-arm: Implement FEAT_TCR2 and FEAT_SCTLR2 (#416 ) This is simply adding the new Armv8.9 registers defined in the related features: - FEAT_TCR2 - FEAT_SCTLR2	2023-10-11 10:14:31 +01:00
David Schall	f65df9b959	cpu: Refactor indirect predictor Simplify indirect predictor interface. Several of the existing functions were merged together into four clear ones. Those four are similar to the main direction predictor interface: 'lookup', 'update', 'squash' and 'commit'. This makes the interface much clearer, allows better functionality isolation and makes it simpler to develop new predictor models. A new parameter is added to allow additional buffer space for speculative path history. Change-Id: I6d6b43965b2986ef959953a64c428e50bc68d38e Signed-off-by: David Schall <david.schall@ed.ac.uk>	2023-10-11 07:50:32 +00:00
Bobby R. Bruce	c4156b06fb	python: Fix `base` logic in `MetaSimObject` This ensures `class Foo` is considered equivalent to `class Foo(object)`. Change-Id: I65a8aec27280a0806308bbc9d32281dfa6a8f84e	2023-10-10 21:47:08 -07:00
Bobby R. Bruce	298119e402	misc,python: Run `pre-commit run --all-files` Applies the `pyupgrade` hook to all files in the repo. Change-Id: I9879c634a65c5fcaa9567c63bc5977ff97d5d3bf	2023-10-10 21:47:07 -07:00
Bobby R. Bruce	3f5d7d647a	misc: Run `pre-commit autoupdate` (#419 ) 1. Runs `pre-commit autoupdate`. 2. Runs `pre-commit run --all-files`. 3. Adds (2.) to ".git-blame-ignore-rev".	2023-10-10 21:41:33 -07:00
Bobby R. Bruce	d559c24ac2	stdlib: Improve handling of errors in Atlas request failures (#404 ) Now: * The Atlas Client will attempt a connection 4 times, using an exponential backoff approach between attempts. * When a failure does arise a rich output is given so problems can be easily diagnosed. Addresses: #340	2023-10-10 21:34:24 -07:00
Harshil Patel	bbc301f2f0	stdlib, tests: Fixed bugs and tests - Fixed bugs related to retrying on request failure. - Updated the pyunit tests. Change-Id: Ia484690267bf27018488324f3408f7e47c59bef3	2023-10-10 15:54:20 -07:00
Bobby R. Bruce	ddf6cb88e4	misc: Run `pre-commit run --all-files` This is to reflect the updates made to black when running `pre-commit autoupdate`. Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919	2023-10-10 14:01:58 -07:00
Bobby R. Bruce	0ec1fb167b	stdlib: Fix use internal _hashlib in md5_utils.py (#427 ) Removes the use of the internal _hashlib, which is an internal Python API This is a fix for issue #383	2023-10-10 08:32:45 -07:00
Yu-Cheng Chang	141b06d335	arch,arch-riscv: Remove setRegOperand in VecRegOperand (#341 ) The RISC-V vector instructions still work without setRegOperand. We should fix the register statistic issue by https://github.com/gem5/gem5/pull/360 to avoid duplicate statistic register write count Change-Id: Ib6a52935e00c3e557b366abfcf60450dca05614d	2023-10-10 08:00:10 -07:00
Matthew Poremba	9f4d334644	gpu-compute: Update tokens for flat global/scratch Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which returns whether the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-10 09:48:16 -05:00
Matt Sinclair	ec633b3d68	dev-amdgpu,mem-ruby: Add support to checkpoint and restore between kernels in GPUFS (#377 ) Earlier, GPU checkpointing was working only if a checkpoint was created before the first kernel execution. This pull request adds support to checkpoint in-between any two kernel calls. It does so by doing the following. - Adds flush support in the GPU_VIPER protocol - Adds flush support in the GPUCoalescer - Updates cache recorder to use the GPUCoalescer during simulation cooldown and cache warmup times.	2023-10-10 09:41:21 -05:00
Giacomo Travaglini	8acf49b6fa	arch-arm: Revamp takeInt to take VHE/SEL2 into account The new implementation matches the table in the ARM Architecture Reference Manual (version DDI 0487J.a, section D1.3.6, table R_SXLWJ) It takes into consideration features like FEAT_SEL2 (scr.eel2 bit) and FEAT_VHE (hcr.e2h bit) which affect the masking of interrupts under certain circumstances Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Change-Id: I07ebd8d859651475bd32fd201eea0f4e64a7dd5f	2023-10-10 09:46:47 +01:00
Giacomo Travaglini	e412ddddbd	arch-arm: Split takeInt into AArch64/32 versions We pay a small duplication cost but we make the code more readable and we enable further modifications to the AArch64 code without forcing the same code on the AArch32 method Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Change-Id: I1efa33cf19f91094fd33bd48b6a0a57d8df8f89f	2023-10-10 09:45:59 +01:00
root	05ebbd2184	stdlib: Fix use internal _hashlib in md5_utils.py Removes the use of the internal _hashlib, which is an internal Python API Change-Id: Id4541a143adb767ca7d942c0fd8a1cf1a08a04ab	2023-10-10 06:18:59 +00:00
Bobby R. Bruce	51c881d0f1	stdlib: Improve handling of errors in Atlas request failures Now: * The Atlas Client will attempt a connection 4 times, using an exponential backoff approach between attempts. * When a failure does arise a rich output is given so problems can be easily diagnosed. Change-Id: I3df332277c33a040c0ed734b9f3e28f38606af44	2023-10-09 16:30:02 -07:00
Bobby R. Bruce	93704a81f1	dev-amdgpu,gpu-compute: Implement GPU and HSA timestamps (#410 ) This PR adds two commits to handle timestamps in the ROCm runtime. ROCr uses a mix of GPU timestamp reads and HSA packet timestamps to output profiling information for a task dispatch. The first commit adds timestamps to the HSA completion signal indicating when the task started and ended, and required changing the flow of completion signal DMAs to ensure the DMA of the timestamp values completed before writing the completion signal value. The second commit adds MMIOs for reading the GPU's timestamp counter. This MMIO resides in the GFX MMIO space so a new class is added to handle MMIOs in that address range.	2023-10-09 14:11:52 -07:00
Bobby R. Bruce	bbe05b0cba	tests,misc: Fix compilation tests failures (#400 ) Exposed in our failing compiler tests: https://github.com/gem5/gem5/actions/runs/6348223508, this PR: * Adds missing overrides to `PCState`'s `set` function. * Removes `std::binary_function` from DramPower (it was deprecated in CPP-11 and officially removed in CPP-17).	2023-10-09 11:20:52 -07:00
Harshil Patel	452a600c49	New function to kernel_disk_workload to allow new disk device location (#151 ) Added a parameter (_disk_device) to kernel_disk_workload which allows users to change the disk device location. get_disk_device() now chooses between the parameter and, if no parameter was passed, it calls a new function _get_default_disk_device() which is implemented by each board and has a default disk device according to each board, eg /dev/hda in the x86_board. The previous way of setting a disk device still exists as a default, however, with the new function users can now override this default	2023-10-09 10:33:45 -07:00
Harshil Patel	79f40ffdab	stdlib: Del comment stating SE mode limited to single thread (#402 ) This comment was left in the codebase in error. The `set_se_binary_workload` function works fine with multi-threaded applications. This hasn't been a restriction for some time.	2023-10-09 10:30:32 -07:00
Harshil Patel	d8fc0180a5	cpu: Restructure BTB (#412 ) This is the first PR in a series of enhancements to the BPU proposed in #358. However, I think putting everything into one PR is not nice to review and is prone to oversights on my part. This PR restructures the BTB: - A new abstract BTB class is created to enable different BTB implementations. The new BTB class gets its own parameter and stats. - An enum is added to differentiate branch instruction types. This enum is used to enhance statistics and BPU management. - The existing BTB is moved into `simple_btb` as default. - An additional function is added to store the static instruction in the BTB. This function is used for the decoupled front-end. - Update configs to match new BTB parameters.	2023-10-09 10:13:00 -07:00
Giacomo Travaglini	eac5a8b215	arch-arm: Implement FEAT_TCR2 Change-Id: I0396f5938c09b68fcc3303a6fdda1e4dde290869 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:19:57 +01:00
Giacomo Travaglini	49cbb24351	arch-arm: Implement FEAT_SCTLR2 Change-Id: Ifb8c8dc1729cc21007842b950273fe38129d9539 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:12:53 +01:00
Giacomo Travaglini	c4c5d2e172	arch-arm: Implement ID_AA64MMFR3_EL1 register Change-Id: If8c37bdccf35a070870900c06dc4640348f0f063 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:12:53 +01:00
Andreas Sandberg	ec7921305b	arch-arm: Implement FEAT_TLBIRANGE extension (#414 )	2023-10-09 17:09:31 +01:00
Jason Lowe-Power	d4be9c76c5	cpu-kvm, arch-x86: flush TLB after syscalls (#411 ) Modified the x86 KVM-in-SE syscall handler to flush the TLB following each syscall, in case the page table has been modified. This is done by reloading the value in %cr3. Doing this requires an intermediate GPR, which we store in a new scratch buffer following the syscall code at address `syscallDataBuf`. GitHub issue: https://github.com/gem5/gem5/issues/409	2023-10-09 08:16:06 -07:00
David Schall	edf9092fee	cpu: Restructure BTB - A new abstract BTB class is created to enable different BTB implementations. The new BTB class gets its own parameter and stats. - An enum is added to differentiate branch instruction types. This enum is used to enhance statistics and BPU management. - The existing BTB is moved into `simple_btb` as default. - An additional function is added to store the static instruction in the BTB. This function is used for the decoupled front-end. - Update configs to match new BTB parameters. Change-Id: I99b29a19a1b57e59ea2b188ed7d62a8b79426529 Signed-off-by: David Schall <david.schall@ed.ac.uk>	2023-10-09 14:37:47 +00:00
Giacomo Travaglini	39fdfaea5a	arch-arm: Implement FEAT_TLBIRANGE Change-Id: I7eb020573420e49a8a54e1fc7a89eb6e2236dacb Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 13:59:47 +01:00
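
Commits a9464a41f5 and d559c24ac2 in the log above describe retrying a failed request 4 times with exponential backoff, and catching the general `Exception` so that a `URLError` is retried as well as an `HTTPError`. The following is a minimal sketch of that idea, not the gem5 stdlib code: the helper name `fetch_with_retry`, its parameters, and the 1-second base delay are all assumptions made for illustration.

```python
import time
from urllib import request


def fetch_with_retry(url, attempts=4, base_delay=1.0,
                     opener=request.urlopen, sleep=time.sleep):
    """Hypothetical helper: call opener(url), retrying on any exception.

    `opener` and `sleep` are injectable so the retry logic can be
    exercised without touching the network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return opener(url)
        except Exception as err:  # deliberately broad: URLError, HTTPError, ...
            last_error = err
            # Back off 1x, 2x, 4x, ... the base delay, but do not sleep
            # after the final failed attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"giving up on {url} after {attempts} attempts"
    ) from last_error
```

Catching only `HTTPError` here would let a transient DNS failure, which surfaces as a `URLError`, escape on the first attempt; that is the behavior commit a9464a41f5 reports as breaking the tests.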

14545 Commits
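
Commit d42eeb6b68 in the log above explains that a cache block mask derived from a 32-bit `cache_line_size` comes out as 0x00000000FFFFFFC0, so two distinct 64-bit addresses can map to the same cache block. Python integers are unbounded, so the sketch below simulates the two integer widths with explicit masks; the function name `block_mask` and the two sample addresses are assumptions chosen for illustration, not taken from gem5.

```python
MASK32 = 0xFFFF_FFFF            # simulates a 32-bit unsigned int
MASK64 = 0xFFFF_FFFF_FFFF_FFFF  # simulates a 64-bit unsigned int


def block_mask(cache_line_size, width_mask):
    """Compute ~(cache_line_size - 1), truncated to the given width."""
    return ~(cache_line_size - 1) & width_mask


narrow = block_mask(64, MASK32)  # upper 32 bits are zero
wide = block_mask(64, MASK64)    # upper 32 bits are preserved

# Two hypothetical addresses that differ only above bit 31.
addr_a = 0x1_0000_0040
addr_b = 0x2_0000_0040

# With the 32-bit mask both addresses collapse to the same block address.
same_block_narrow = (addr_a & narrow) == (addr_b & narrow)
# With the 64-bit mask they stay distinct.
same_block_wide = (addr_a & wide) == (addr_b & wide)

print(hex(narrow), hex(wide))
print(same_block_narrow, same_block_wide)
```

With a 64-byte line, the narrow mask drops every address bit above bit 31, which is how a load reservation on one address could be matched against an unrelated one.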