
derek/gem5

Author	SHA1	Message	Date
atrah22	fab458daa2	util: Update & fix bug in m5stats2streamline.py 1) The writeBinary function's binary_list can contain either strings or ints, and it needs to be properly converted to bytes 2) The packed32(x) function can receive x as an int or a float. In case of a float, it needs to be converted to an int Change-Id: I6a52aa59e1582dd6bb06b2d1c49ddaf8fe61c997	2023-08-27 19:07:29 -07:00
Matthew Poremba	82ffc16e6e	gpu-compute: Flat scratch implementation and bug fixes (#231 ) Add commits fixing private segment counters, flat scratch address calculation, and implementation of flat scratch instructions. These commits were tested using a modified version of 'square': template <typename T> __global__ void scratch_square(T C_d, T A_d, size_t N) { size_t offset = (blockIdx.x * blockDim.x + threadIdx.x); size_t stride = blockDim.x * gridDim.x ; volatile int foo; // Volatile ensures scratch / unoptimized code for (size_t i=offset; i<N; i+=stride) { foo = A_d[i]; C_d[i] = foo * foo; } }	2023-08-27 07:40:24 -07:00
Matthew Poremba	60f071d09a	gpu-compute,arch-vega: Implement flat scratch insts Flat scratch instructions (aka private) are the 3rd and final segment of flat instructions in gfx9 (Vega) and beyond. These are used for things like spills/fills and thread local storage. This commit enables two forms of flat scratch instructions: (1) flat_load/flat_store instructions where the memory address resolves to private memory and (2) the new scratch_load/scratch_store instructions in Vega. The first are similar to older generation ISAs where the aperture is unknown until address translation. The second are instructions guaranteed to go to private memory. Since these are very similar to flat global instructions there are minimal changes needed: - Ensure a flat instruction is either regular flat, global, XOR scratch - Rename the global op_encoding methods to GlobalScratch to indicate they are for both and are intentionally used. - Flat instructions in segment 1 output scratch_ in the disassembly - Flat instruction executed as private use similar mem helpers as global - Flat scratch cannot be an atomic This was tested using a modified version of the 'square' application: template <typename T> __global__ void scratch_square(T C_d, T A_d, size_t N) { size_t offset = (blockIdx.x * blockDim.x + threadIdx.x); size_t stride = blockDim.x * gridDim.x ; volatile int foo; // Volatile ensures scratch / unoptimized code for (size_t i=offset; i<N; i+=stride) { foo = A_d[i]; C_d[i] = foo * foo; } } Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d	2023-08-26 13:40:12 -05:00
Matthew Poremba	4506188e00	gpu-compute: Fix private offset/size register indexes According to the ABI documentation from LLVM, the low register of flat scratch (maxSGPR - 4) is the offset and the high register (maxSGPR - 3) is size. These are currently backwards, resulting in some gnarly addresses being generated leading to page fault and/or incorrect data. This commit fixes this by setting the order correctly. Change-Id: I0b1d077c49c0ee2a4e59b0f6d85cdb8f17f9be61	2023-08-26 13:40:12 -05:00
Matthew Poremba	e0379f4526	gpu-compute: Fix flat scratch resource counters Flat instructions may access memory locations in LDS (scratchpad) and global (VRAM/framebuffer) and therefore increment both counters when dispatched. Once the aperture is known, we decrement the counters of the aperture that was not used. This is done incorrectly for scratch / private flat instruction. Private memory is global and therefore local memory counters should be decremented. This commit fixes the counters by changing the global decrements to local decrements. Change-Id: I25890446908df72e5469e9dbaba6c984955196cf	2023-08-26 13:40:12 -05:00
Matthew Poremba	a9b32cdb3a	gpu-compute: Use timing DMAs for GPUFS HSA signals (#230 ) The functional HSA signal read was a hack left in the gpu-compute code. In full system, this functional read is causing problems occasionally with the translation not yet being in the page table. The error message output by gem5 was a fatal message on the readBlob method in port proxy. Changing this to a timing DMA fixes this problem. This commit adds the various timing DMA functions to send and receive response and clean up. A helper method "sendCompletionSignal" is added to the GPUCommandProcessor because the indentation level was getting too deep. This change applies only to FS mode. Code for SE mode is equivalent to what it was before this commit. Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48	2023-08-26 11:38:37 -07:00
Matthew Poremba	57b3d2897c	gpu-compute: Use timing DMAs for GPUFS HSA signals The functional HSA signal read was a hack left in the gpu-compute code. In full system, this functional read is causing problems occasionally with the translation not yet being in the page table. The error message output by gem5 was a fatal message on the readBlob method in port proxy. Changing this to a timing DMA fixes this problem. This commit adds the various timing DMA functions to send and receive response and clean up. A helper method "sendCompletionSignal" is added to the GPUCommandProcessor because the indentation level was getting too deep. This change applies only to FS mode. Code for SE mode is equivalent to what it was before this commit. Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48	2023-08-25 13:10:51 -05:00
Bobby R. Bruce	5cb604559a	misc: Move compiler tests to run on 'build' runners (#222 ) This is an experiment. The runners were sometimes running out of memory building gem5. The builders have more memory so should be able to handle this. The runners have 4-cores so compilation should be faster (note the inclusion of the `-j$(nproc)`).	2023-08-25 03:24:03 -07:00
Matthew Poremba	fcbed2bd8a	dev-amdgpu: Tell OS about PCIe atomic support (#224 ) configs,dev-amdgpu: Add PCI express capability info The ROCm stack requires PCI express atomics. Currently the first PCI CapabilityPtr does not point to anything, which signals to the OS (Linux) that this is an early generation PCI device. As PCI express atomics were introduced later, the CapabilityPtr needs to point to at least a PCI express capability structure. This capability is defined as 0x10 in Linux. We additionally set the PCI atomic based bits and implement device specific PCI configuration space reads and writes to the amdgpu device. With the second commit, the simulation output when loading the amdgpu driver no longer contains "PCIE atomics not supported". Further, an application which uses PCIe atomics (PyTorch with a reduce_sum kernel) now makes further progress. The first commit is a minor typo fix changing the PCI capability struct to a union.	2023-08-24 11:19:30 -07:00
Bobby R. Bruce	cf997c93a5	tests, gpu-compute: Updating weekly.sh to use mmapped version of FW (#186 )	2023-08-24 10:16:25 -07:00
Bobby R. Bruce	7aa896fe8f	cpu-minor: Separate the reg_index of VecClassReg and VecElemReg (#225 ) In the RISC-V system, we need VecClassReg to run RISC-V vector instructions, and VecElemReg is not applicable because the element length of a vector can be resized via the vsetvl instruction. The change separates the reg_index for VecReg and VecElemReg to ensure that there is space for VecReg when VecElemReg is not applicable.	2023-08-24 10:13:21 -07:00
Giacomo Travaglini	56a8ab3f3c	sim: provide a signal constructor with an init_state (#210 ) The current SignalSinkPort and SignalSourcePort have no ways to assign the init value of the state. Add a new constructor for them with the param init_state Bug: 293410800 Test: boot to linux Change-Id: Idde0a12aa0ddd0c9c599ef47059674fb12aa5d68 Reviewed-on: https://soc-sim-external-review.googlesource.com/c/gem5/gem5/+/13159 Gem5-Virtual-Platform-Presubmit-Ready: Johnny Ko <johnnyko@google.com> Reviewed-by: Yu-hsin Wang <yuhsingw@google.com> Perf-Presubmit-Ready: Johnny Ko <johnnyko@google.com> Gem5-Virtual-Platform-Verified: kokoro <noreply+kokoro@google.com> Perf-Verified: kokoro <noreply+kokoro@google.com>	2023-08-24 18:06:21 +01:00
Bobby R. Bruce	e77666d9e8	mem-ruby: fix CHI Evict race condition (#217 ) When an Evict request is received from upstream for a shared line and the line is no longer cached locally (or on any other upstream cache), we need to also send an Evict downstream. In this case we need to wait until our outgoing Evict completes before completing the Evict from upstream in order to be able to resolve race conditions with incoming snoops. E.g.: while our outgoing Evict is pending we may receive a snoop requesting data, but we won't be able to complete this snoop if we have already completed all upstream Evicts and we no longer have the line.	2023-08-24 10:04:28 -07:00
Matthew Poremba	9fd846f48d	gpu-compute,arch-vega: Fix ALU-only LDS counters (#223 ) There are a few LDS instructions that perform local ALU operations and writeback which are marked as loads. These are marked as loads because they fit in the pipeline logic better, according to a several year old comment. In the VEGA ISA these instructions (swizzle, permute, bpermute) are not decrementing the LDS load counter. As a result, the counter will gradually increase over time. Since wavefront slots are persistent, this can cause applications with a few thousand kernels to eventually hang thinking there are not enough resources. This changeset fixes this by decrementing the LDS load counter for these instructions. This fix was already integrated in the GCN3 ISA in the exact same way. This changeset moves it near a similar comment about scheduling register file writes. Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c	2023-08-24 07:12:56 -07:00
Matthew Poremba	addba01d29	configs,dev-amdgpu: Add PCI express capability info The ROCm stack requires PCI express atomics. Currently the first PCI CapabilityPtr does not point to anything, which signals to the OS (Linux) that this is an early generation PCI device. As PCI express atomics were introduced later, the CapabilityPtr needs to point to at least a PCI express capability structure. This capability is defined as 0x10 in Linux. We additionally set the PCI atomic based bits and implement device specific PCI configuration space reads and writes to the amdgpu device. With this commit, the output of simulation when loading the amdgpu driver no longer outputs "PCIE atomics not supported". Further, an application which uses PCIe atomics (PyTorch with a reduce_sum kernel) now makes further progress. Change-Id: I5e3866979659a2657f558941106ef65c2f4d9988	2023-08-24 09:10:35 -05:00
Bobby R. Bruce	2d9ad02ae7	ext: Specialize GDBSignal MACRO to gem5 (#209 ) The goal is to fix this issue which appears to affect some Apple users: https://github.com/gem5/gem5/issues/94. By specializing the `EXC_*` to gem5 we avoid the name conflicts plaguing some users.	2023-08-24 02:44:56 -07:00
Roger Chang	5c28113a06	cpu-minor: Separate the reg_index of VecClassReg and VecElemReg In the RISC-V system, we need VecClassReg to run RISC-V vector instructions, and VecElemReg is not applicable because the element length of a vector can be resized via the vsetvl instruction. The change separates the reg_index for VecReg and VecElemReg to ensure that there is space for VecReg when VecElemReg is not applicable. Change-Id: I99a82dec273baeee31df89a0ee0f5e87f3ff187c	2023-08-24 13:27:27 +08:00
Matthew Poremba	8b4c38302f	dev: PCI: Fix PCI express capability union The capability structure for PCI express is a struct instead of a union, unlike the other capability unions. A union is used here to provide access to the ordinal data values when reading/writing an offset while simultaneously providing human readable field values that can be set when writing the code. This commit changes it to a union, which it likely should be. Nothing appears to be using this union yet so it is likely an oversight. Change-Id: I85fe7cc62914525c70fd7a5946d725ed308f8775	2023-08-23 19:32:38 -05:00
Matthew Poremba	90a518e885	gpu-compute,arch-vega: Fix ALU-only LDS counters There are a few LDS instructions that perform local ALU operations and writeback which are marked as loads. These are marked as loads because they fit in the pipeline logic better, according to a several year old comment. In the VEGA ISA these instructions (swizzle, permute, bpermute) are not decrementing the LDS load counter. As a result, the counter will gradually increase over time. Since wavefront slots are persistent, this can cause applications with a few thousand kernels to eventually hang thinking there are not enough resources. This changeset fixes this by decrementing the LDS load counter for these instructions. This fix was already integrated in the GCN3 ISA in the exact same way. This changeset moves it near a similar comment about scheduling register file writes. Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c	2023-08-23 19:30:24 -05:00
Bobby R. Bruce	b2d40edc62	misc: Move compiler tests to run on 'build' runners This is an experiment. The runners were sometimes running out of memory building gem5. The builders have more memory to handle this. The runners have 4-cores so compilation should be faster (note the inclusion of the `-j$(nproc)`). Change-Id: I964c5a778938b449502d92dec3431f8b788397e4	2023-08-23 17:17:28 -07:00
Reiley Jeyapaul	c9ff54677f	mem-ruby: fix CHI Evict race condition When an Evict request is received from upstream for a shared line and the line is no longer cached locally (or on any other upstream cache), we need to also send an Evict downstream. In this case we need to wait until our outgoing Evict completes before completing the Evict from upstream in order to be able to resolve race conditions with incoming snoops. E.g.: while our outgoing Evict is pending we may receive a snoop requesting data, but we won't be able to complete this snoop if we have already completed all upstream Evicts and we no longer have the line. Change-Id: I23ac4f0a9c4ddd81e2425376c8d1e1c7fb66d107 Signed-off-by: Tiago Mück <tiago.muck@arm.com>	2023-08-23 15:49:51 -05:00
Johnny	76fe71ebd0	sim: provide a signal constructor with an init_state Add more description to the code Change-Id: Iff8fb20762baa0c9d0b7e5f24fb8769d7e198b5c	2023-08-23 10:49:15 +08:00
Johnny	6acb687975	sim: provide a signal constructor with an init_state 1. The current SignalSinkPort and SignalSourcePort have no way to assign the init value of the state. Add a new constructor for them with the param init_state 2. After the source and sink are bound, the state on both sides should be the same. Set the state of the sink to the state of the source in the bind() function. Change-Id: Idde0a12aa0ddd0c9c599ef47059674fb12aa5d68	2023-08-23 10:12:41 +08:00
Bobby R. Bruce	c218104f52	tests: Update asmtest script and add more test binaries (#206 ) Update the config script to make it only for riscv asmtest and replace Resource with obtain_resource. Also adds more test binaries.	2023-08-22 13:59:56 -07:00
Jason Lowe-Power	e3414c7098	base: Make 'findLsbSetFallback' constexpr to fix gcc-8 comp (#203 ) Compilation bug found on: https://github.com/gem5/gem5/actions/runs/5899831222/job/16002984553 In gcc Version 8 and below the following error is received: ``` src/base/bitfield.hh: In function ‘constexpr int gem5::findLsbSet(uint64_t)’: src/base/bitfield.hh:365:34: error: call to non-‘constexpr’ function ‘int gem5::{anonymous}::findLsbSetFallback(uint64_t)’ return findLsbSetFallback(val); ~~~~~~~~~~~~~~~~~~^~~~~ scons: *** [build/ALL/kern/linux/events.o] Error 1 ``` `findLsbSet` cannot be `constexpr` as it calls the non-constexpr function `findLsbSetFallback`. The problematic function is the `count` on the std::bitset. This patch changes this to a constexpr.	2023-08-22 11:23:59 -07:00
Roger Chang	f41172f9e4	tests: Add RV32 test binaries	2023-08-22 16:00:16 +08:00
Roger Chang	61488e1e17	tests: Add more tests for RV64	2023-08-22 16:00:16 +08:00
Roger Chang	fee1c3fc7a	tests: Update asmtest script Update the config script to make it only for riscv asmtest and replace Resource with obtain_resource Change-Id: I0bab96ea352b7ce1c6838203bfa13eee795f41f9	2023-08-22 16:00:16 +08:00
Bobby R. Bruce	f9a4a794b7	misc: Add DRAMSys tests to our weekly tests (#198 ) This adds the DRAMSys tests to our weekly-tests.yaml file	2023-08-21 17:31:36 -07:00
Bobby R. Bruce	6f7fc51a18	ext: Specialize GDBSignal MACRO to gem5 The goal is to fix this issue which appears to affect some Apple users: https://github.com/gem5/gem5/issues/94. By specializing the `EXC_*` to gem5 we avoid the name conflicts plaguing some users. Change-Id: I031f7110b4b4ae82677b6586903cd57b22ca2137	2023-08-21 17:23:09 -07:00
Bobby R. Bruce	709f632730	base: Make 'findLsbSetFallback' constexpr to fix gcc-8 comp Compilation bug found on: https://github.com/gem5/gem5/actions/runs/5899831222/job/16002984553 In gcc Version 8 and below the following error is received: ``` src/base/bitfield.hh: In function ‘constexpr int gem5::findLsbSet(uint64_t)’: src/base/bitfield.hh:365:34: error: call to non-‘constexpr’ function ‘int gem5::{anonymous}::findLsbSetFallback(uint64_t)’ return findLsbSetFallback(val); ~~~~~~~~~~~~~~~~~~^~~~~ scons: *** [build/ALL/kern/linux/events.o] Error 1 ``` `findLsbSet` cannot be `constexpr` as it calls the non-constexpr function `findLsbSetFallback`. The problematic function is the `count` on the std::bitset. This patch changes this to a constexpr. Change-Id: I48bd15d03e4615148be6c4d926a3c9c2f777dc3c	2023-08-21 14:04:36 -07:00
Melissa Jost	e611cc66b1	misc: ADD DRAMSys tests to our weekly tests This adds the DRAMSys tests to our weekly-tests.yaml file Change-Id: Ieb7903a3a7ffae6359b3de5f66e1dd65eb51fc80	2023-08-21 11:53:08 -07:00
Bobby R. Bruce	63b91b51a2	mem-cache: Allow clflush's uncacheable requests on classic cache (#205 ) When a Linux kernel changes a page property, it flushes the related cache lines. The kernel might change the page property before flushing the cache lines. As a result, the clflush might occur in an uncacheable region. Currently, an uncacheable request must be a read or a write. However, a clflush request is neither of them. This change aims to allow clflush requests to work on uncacheable regions. Since there is no straightforward way to check if a packet is from a clflush instruction, this change permits all Clean Invalidate Requests, which is the type of request produced by clflush, to work on uncacheable regions.	2023-08-21 10:42:10 -07:00
Bobby R. Bruce	f98cd15ec7	arch-riscv,systemc: Update cxx_config_cc.py to use port.is_source (#196 ) Fix for issue #181. Update the port description generation to use the port.is_source attribute.	2023-08-20 21:00:33 -07:00
Bobby R. Bruce	e5fcc116ec	ext: Update DRAMSys README (#202 ) This fixes: 1. Most importantly: The submodule recursive update was incorrect. This adds the recursive obtaining of submodules as a separate, explicit step. 2. Changes the `git clone` to use https.	2023-08-20 20:13:17 -07:00
Thilo Vörtler	73b6e98f51	arch-riscv,systemc: Fix cxx_config_cc.py to use is_source Update the cxx_config_cc.py port description generation to use the port.is_source attribute. Github Issue: https://github.com/gem5/gem5/issues/181 Change-Id: I3fa12c2fbb06083379118e57aedb8be414c0d929	2023-08-20 14:06:37 +00:00
Hoa Nguyen	9e007e5bd7	mem-cache: fix wrong function call Change-Id: I924ede89f373ec21557faf25c96b36f4bc8430dd Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-08-19 22:56:55 +00:00
Hoa Nguyen	f442846d9d	mem-cache: Fix another typo Change-Id: Ib2051f9bda6e6d9002d3be1dbf0b890299098201 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-08-19 22:50:53 +00:00
Hoa Nguyen	7b897a30fa	mem-cache: Fix syntax error Change-Id: I1360879c13d377661e9eeeddf345b785c01efeb6 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-08-19 21:27:53 +00:00
Hoa Nguyen	98daec7d99	mem-cache: Allow clflush's uncacheable requests on classic cache When a Linux kernel changes a page property, it flushes the related cache lines. The kernel might change the page property before flushing the cache lines. As a result, the clflush might occur in an uncacheable region. Currently, an uncacheable request must be a read or a write. However, a clflush request is neither of them. This change aims to allow clflush requests to work on uncacheable regions. Since there is no straightforward way to check if a packet is from a clflush instruction, this change permits all Clean Invalidate Requests, which is the type of request produced by clflush, to work on uncacheable regions. Change-Id: Ib3ec01d9281d3dfe565a0ced773ed912edb32b8f Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-08-19 18:20:16 +00:00
Bobby R. Bruce	16752b7ca2	ext: Update DRAMSys README This fixes: 1. Most importantly: The submodule recursive update was incorrect. This adds the recursive obtaining of submodules as a separate, explicit step. 2. Changes the `git clone` to use https. Change-Id: Iad69e44b927a5aa982b49dffa6929c52fcc7ee72	2023-08-18 15:43:14 -07:00
Bobby R. Bruce	d7d441becb	tests: Add checkpoint tests for all ISAs (#167 ) Added save and restore checkpoint tests for arm-hello, x86-hello, x86-fs, power-hello Added mips and sparc test but mips does not support checkpoint and there is a bug in sparc. Added test file to run the tests.	2023-08-18 15:01:39 -07:00
Bobby R. Bruce	ac88871017	misc: Update matrix runs in scheduled tests (#194 ) This changes continue-on-error to be fail-fast instead, as continue-on-error will mark failed matrix runs as successful, whereas fail-fast makes sure everything in the matrix runs, but gets marked as failed if part of it fails.	2023-08-18 10:56:26 -07:00
Bobby R. Bruce	30ab2c19b1	stdlib: Allow passing of func as Exit Event generator (#195 ) In this case the function is turned into a generator, with the generator's "yield" being the return value of the function's execution. Translation of this stale Gerrit Change: https://gem5-review.googlesource.com/c/public/gem5/+/62872	2023-08-18 10:55:50 -07:00
Harshil Patel	9d86a559ed	tests: removed mips tests and added issue link. - Removed MIPS tests. - Added link to github issue sparc test bug. Change-Id: Ib3c69dca578371ecf0ac2d7694f46f24834a7e5f	2023-08-18 09:51:40 -07:00
Bobby R. Bruce	c0216dbe48	stdlib: Allow passing of func as Exit Event generator In this case the function is turned into a generator, with the generator's "yield" being the return value of the function's execution. Change-Id: I4b06d64c5479638712a11e3c1a2f7bd30f60d188	2023-08-17 16:48:33 -07:00
Jason Lowe-Power	22c52f4fba	Fix reporting traps (faults) to GDB in SE mode (#166 ) This addresses #123	2023-08-17 16:08:49 -07:00
Melissa Jost	fa49de5b98	misc: Update matrix runs in scheduled tests This changes continue-on-error to be fail-fast instead, as continue-on-error will mark failed matrix runs as successful, whereas fail-fast makes sure everything in the matrix runs, but gets marked as failed if part of it fails. Change-Id: Ie20652c229b6cce9f1c0a45958b088391e7aae97	2023-08-17 15:56:02 -07:00
Bobby R. Bruce	fe43e4a3e3	arch-riscv: Check CSR before executing VMem instructions (#187 ) Any instruction that requires vector registers should check whether vector is enabled. Any instruction that needs the vtype CSR to execute should check the vill bit beforehand.	2023-08-17 11:20:21 -07:00
Jan Vrany	3564348eec	arch-riscv: Report traps to GDB in SE mode This commit adds code to report illegal instruction and breakpoint traps to GDB (if connected). This merely follows what POWER does.	2023-08-17 15:55:04 +01:00
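
The behaviour described in commits 30ab2c19b1 and c0216dbe48 (a plain function being turned into an Exit Event generator whose "yield" is the function's return value) can be illustrated with a minimal sketch. This is not the gem5 stdlib code: the helper name `as_exit_generator`, the handler `stop_after_first_exit`, and the idea that the handler returns a boolean are all assumptions made for illustration only.

```python
import inspect


def as_exit_generator(handler):
    """Return a generator for `handler` (hypothetical helper, not gem5 API)."""
    if inspect.isgenerator(handler):
        # Already a generator object: use it unchanged.
        return handler
    if callable(handler):
        def _wrapped():
            # Each step of the generator yields the function's return value.
            while True:
                yield handler()
        return _wrapped()
    raise TypeError("expected a generator or a callable")


def stop_after_first_exit():
    # Hypothetical handler; the meaning of the returned value is an assumption.
    return True


gen = as_exit_generator(stop_after_first_exit)
print(next(gen))  # calls stop_after_first_exit() once and prints its result
```

Wrapping the call in an endless loop means the function is invoked again on every exit event, which is one plausible reading of the commit message; the actual implementation may differ.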

20438 Commits
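
The two conversions named in commit fab458daa2 (a list of mixed strings and ints turned into bytes, and a 32-bit pack that accepts either an int or a float) can be sketched as follows. This is not the code from m5stats2streamline.py: the function names `packed32_sketch` and `binary_list_to_bytes`, the little-endian byte order, and the latin-1 encoding of strings are assumptions chosen for illustration.

```python
import struct


def packed32_sketch(x):
    """Pack x as an unsigned 32-bit value (hypothetical stand-in for packed32).

    A float is truncated to an int first; little-endian order is an assumption.
    """
    return struct.pack("<I", int(x) & 0xFFFFFFFF)


def binary_list_to_bytes(binary_list):
    """Convert a list holding strings and ints into a single bytes object."""
    out = bytearray()
    for item in binary_list:
        if isinstance(item, str):
            # Assumes every character fits in one byte (latin-1).
            out += item.encode("latin-1")
        else:
            # Treat anything else as a number and keep its low 8 bits.
            out.append(int(item) & 0xFF)
    return bytes(out)


print(packed32_sketch(1.9))
print(binary_list_to_bytes(["ab", 1, 255]))
```

Truncating with `int()` before packing avoids the `struct.error` that `struct.pack` raises when handed a float for an integer format.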