derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Ivana Mitrovic	235f6bd43f	misc: Update .mailmap file (#739 ) The .mailmap file is designed to maintain a record of unique contributors, aiming for a single identifier for each person. What is included in this file does not impact or alter commits; rather, it just merges the counts for all commits by one person under a single name.	2024-01-25 12:00:13 -08:00
Ivana Mitrovic	1c0127ae7c	base: Fix Integer overflow in AddrRange (#786 ) This PR fixes the bug mentioned in #240.	2024-01-25 10:18:29 -08:00
Ivana Mitrovic	24e0d71034	arch-gcn3: Remove gcn3 (#781 ) Related to issue #703 , this PR removes GCN3 related files and updates source code, documentation, and tests to switch over to Vega is that was not done already. Highlights are: - Remove all src/arch/amdgpu/gcn3 files and update Kconfigs. - Remove references to GCN3 and replace with Vega where applicable. - Update the build targets in the gcn-gpu Docker. This will need to be rebuilt but not urgently. - Remove the GCN3 tag in testlib. Most tests seem to be using Vega already, so that commit is small.	2024-01-25 10:14:46 -08:00
QQeg	7a96709b11	arch-riscv: Fix vsadd_vi and vsaddu_vi to match v-spec (#805 ) This commit fixes the implementation of two instructions, vsadd_vi and vsaddu_vi, in the OPIVI category to match the RISC-V vector specification. According to [riscv-v-spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#101-vector-arithmetic-instruction-encoding), the immediate field of these two instructions should be sign extended. > For integer operations, the scalar can be a 5-bit immediate, imm[4:0], encoded in the rs1 field. The value is sign-extended to SEW bits, unless otherwise specified. There is an example in both [vsadd](https://github.com/QQeg/rvv_intrinsic_testcases/tree/master/vsadd_vi) and [vsaddu](https://github.com/QQeg/rvv_intrinsic_testcases/tree/master/vsaddu_vi). Change-Id: Ib877627ba01c0868b2103d41613651df488fca13	2024-01-24 17:21:26 -08:00
Yu-Cheng Chang	6dd936e5b5	arch-riscv: Simply implementation of vector multiply and divide instructions (#793 ) Align the implementation of scalar multiply and divide instructions Change-Id: I53297d4c841c41593baaae0ea140bfbbd874a1d9	2024-01-24 13:20:15 -08:00
Matthew Poremba	44c78d843c	arch-vega: Implement memory aperture operands (#803 ) Vega (gfx900) introduced new memory aperture registers to get the base address and limit for LDS and private (scratch) memory. These have not commonly been used by the compiler until ROCm 6. Now that the compiler is generating reads from these special registers, implement the support for them. Tested with LULESH which is using the SHARED_BASE register (LDS) with ROCm 6.0. This assembly seems to replace S_GETREG_B32 emitted by the ROCm 5 compiler. Change-Id: Id2bd26ce8ef687c84a647fa2ac2da54d657913e5	2024-01-24 11:19:43 -08:00
Matthew Poremba	0ac110ac95	dev-amdgpu: Check privledge bit for SDMA RLC queues (#792 ) By default all SDMA queues are privileged queues, meaning the addresses in SDMA packets use the privileged translation tables. RLC queues (sometimes called user queues) are not necessarily privileged and might use user translation tables. RLC queues are used more often in ROCm 6.0 exposing an issue with invalid translations with RLC queues. This changeset checks the priv bit in the SDMA MQD when an RLC queue is mapped. Each packet type which uses an address then checks the bit before performing translation. Tested with daily/weekly tests with a ROCm 6.0 disk image and tests are passing. Change-Id: I6122fbc194e8d6f5d38e81f1b0e11646d90e0ea0	2024-01-24 07:25:43 -08:00
Matthew Poremba	dfafc5792a	arch-vega: Remove deleted instruction.cc from build (#801 ) Change-Id: I03073d35a0d36788dfe8309e6ed466d0a496e31e	2024-01-23 18:47:01 -08:00
Harshil Patel	78613e2307	base: Add a check for edge case - Now check for the condition where the bigger address range wraps but smaller does not. Change-Id: Icc7a549afaf82a277dc2845255aa1702a1d662e0	2024-01-23 11:35:54 -08:00
Harshil Patel	fea4106414	util: updated resource manager dependencies (#737 ) Change-Id: Ia07eed6c2f2e55f1a2cb8da30e75f0b3a2fb3bc3 Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>	2024-01-23 11:09:15 -08:00
Matthew Poremba	4fe6489038	arch-vega: Reorganize inst and misc files (#789 ) This PR reorganizes the instructions.cc into multiple files and renames some files which do not match their corresponding header file names. The intention is to make iterating on development of these files faster.	2024-01-23 10:06:40 -08:00
Harshil Patel	7372097376	base: fix Integer overflow in AddrRange bug An issue raised in #240 where if an address range ends at the last byte of a 64 bit address space, it will be considered a subset of any other address range that starts at the first byte of the range. Change-Id: I517f4717052eda2504de971be0eb59ee9a623dd3	2024-01-22 15:43:11 -08:00
Ivana Mitrovic	f2916e1b2b	misc: Merge Weekly GPU tests into Weekly Tests (#647 ) This separation was only for convenience while GPU tests were under development and rapidly changing. This test merges the GPU tests into the weekly tests where they belong.	2024-01-22 10:53:28 -08:00
Matthew Poremba	a5757e7e01	arch-vega: Rename mismatched source/header files The files registers.cc, isa.cc, and decoder.cc do not match the header name. This is a minor cleanup to make development more straightforward. Change-Id: Ibab18dfe315b0ce84359939b490f8227ea43cac0	2024-01-19 13:32:24 -06:00
Matthew Poremba	cd91c6321f	arch-vega: Reorganize instructions to multiple files The Vega instructions.cc file is 47k lines long which results in both large compilation times whenever it is modified and long style check times. This makes iterating over more complex instruction implementations very time consuming. This commit moves the instruction definitions to multiple files based on the instruction encoding (SOP2, VOP2, FLAT, DS, etc.). The resulting files are much smaller (max is 8k lines) and compilation and style check times are much more reasonable. Other than moving code around, there are no functional changes in this commit. Change-Id: Id4ac8e98ef11a58de5fd328f8a0cd7ce60a11819	2024-01-19 13:32:24 -06:00
Jason Lowe-Power	a555449c12	arch-arm: Fix compile error in kvm (#784 ) The addition of std::optional in #732 caused a compile error. This change fixes the error by checking to see if the value is present and panicing otherwise. Change-Id: I46c3fb76eb0e14ba7bede7c336293fbe9add8c84 Signed-off-by: Jason Lowe-Power <jason@lowepower.com>	2024-01-19 07:59:59 -08:00
Bobby R. Bruce	5f767d7836	misc: Fixing comment indentation in weekly-tests.yaml Change-Id: I047ef921703e635b37bacb54cd5b091c2a41b1d3	2024-01-18 15:55:25 -08:00
Yu-Cheng Chang	f56459470a	arch-riscv: Refactor the RISC-V multiplication utility (#780 ) 1. Add the new double width for int64_t and uint64_t 2. Use the wider type to get the upper result of multiplication Change-Id: Id6cfa6f274c65592b2b3e2b70c00f82954b41f1a	2024-01-18 12:40:11 -08:00
Matthew Poremba	9b89149142	tests,ext: Remove GCN3 tags, update tests to Vega Change-Id: I782b6e61cd43b51cfbe80161d4dc1cee125f7f64	2024-01-17 11:13:50 -06:00
Matthew Poremba	0f45ae424c	util: Remove GCN3 references and target from gcn-gpu docker Change-Id: I622470588a7e02088a1b9bb3dcfaa677e835e87c	2024-01-17 11:12:36 -06:00
Matthew Poremba	63caa780c2	misc: Remove all references to GCN3 Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename FIJI to Vega and Carrizo to Raven. Using misc since there is not enough room to fit all the tags. Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891	2024-01-17 11:11:06 -06:00
QQeg	511729ab76	arch-riscv: Fix issue when vl=0 in VectorIntMaskMacroConstructor (#715 ) I’ve been working on a fix for the issue #759 where ‘vd’ incorrectly stores all zeros when ‘vl’ is set to 0 in VectorIntMaskMacroConstructor. My solution seems to work, but it behaves differently from other macros when ‘vl’ = 0. Instead of pushing a ‘nop’ to ‘microops’, it pushes a micro operation that remains ineffective due to ‘vl’ being 0.	2024-01-17 08:45:08 -08:00
Matthew Poremba	57fb083f43	arch-gcn3: Remove all GCN3 files Change-Id: Ib7d9e8676a31e51a330e68d81099580e2509a90a	2024-01-17 10:44:44 -06:00
Nitish Arya	c2a22b03b4	mem-ruby: fix ruby startup() to reset exit event correctly (#773 ) When restoring the simulate_limit_event pointer is not restored after running the dry simulation run which ends up in "Panic: event not found!" In this commit we fix this issue by correctly restoring the pointer value along with the event queue head Change-Id: Id5ad4d2a270a6cd34eec1dc5c9b170b2b84610d4 --------- Co-authored-by: narya <nitish.arya@bsc.es> Co-authored-by: Jason Lowe-Power <jason@lowepower.com>	2024-01-17 08:41:10 -08:00
Matthew Poremba	70376d43a3	arch-vega: Fix upsize cast error in newer compilers (#774 ) Newer compilers error on -Warray-length in the recent MI200 patches due to casting from a 32-bit data type to a 64-bit type. Change it to cast the 32-bit integer first then 64-bit integer latter to remove the warning. Rerun of validation tests on the three instructions passed. Change-Id: I0309e5f7b5b8cc8ce1651660ddddb120fa6e7666	2024-01-16 09:41:23 -08:00
Matthew Poremba	6a9e80c54c	gpu-compute: Support for MI200 GPU model (#733 )	2024-01-15 08:18:34 -08:00
Hoa Nguyen	85eb99388a	arch-riscv: Remove the check of bit 63 of the physical address (#756 ) Currently, the TLB enforces that the bit 63 of a physical address to be zero. This check stems from the riscv-tests that checks for the bit 63 of a physical address [1]. This is due to the fact that the ISA implicitly says that the physical address must be zero-extended on the most significant bits that are not translated [2]. More details on this issue is here [3]. The check for bit 63 of a physical address in the TLB is rather too specific, and I believe the check of invalid physical address is alread implemented in PMA. Thus, this change proposes to remove this check from RISC-V TLB. [1] `bd0a19c136/isa/rv64mi/access.S (L18)` [2] https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/8kO7X0y4ubo [3] https://github.com/gem5/gem5/issues/238 Change-Id: I247e4d4c75c1ef49a16882c431095f6e83f30383 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-01-12 15:17:49 -08:00
Arteen Abrishami	e5bdc760e3	mem-ruby: allow comparison of int and Addr in SLICC (#701 ) allow easy isolation of specific addresses in coherence protocols. useful for debugging. Change-Id: I93e07956b8e29837219d328dacfbd5c6067c1a62	2024-01-12 10:02:29 -08:00
Giacomo Travaglini	7487c13181	configs: Add o3 --cpu choice to the starter_se.py script (#764 ) This is matching what we are already doing in the starter_fs.py script Change-Id: I50239050be9bd151a607ec892f8dd9322b24040b Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-01-12 07:47:51 -08:00
Yu-Cheng Chang	2f24ee570e	arch-riscv: Move PMAChecker and PMP to RiscvISA namespace (#691 ) The PMAChecker and PMP are only used in the RisvISA and it should be in the RiscvISA to simply the implementation Change-Id: I4968e2de4c028cb2dceed977f2173fc8b1efd175	2024-01-10 16:58:13 -08:00
Yu-Cheng Chang	74dd0bb9bb	fastmodel: Fix the Fastmodel RemoteGDB initial (#735 ) Change-Id: Iec9ef145ccac353b8a41f501dd76bf53288dd478	2024-01-10 16:55:54 -08:00
Matt Sinclair	ab9e61ea03	gpu-compute: WAX dependency detection (#731 ) WAX Dependencies would be missed if a RAW Dependency also existed.	2024-01-05 12:57:24 -06:00
Matt Sinclair	dc85d1492c	gpu-compute: Added register file cache support (#730 ) The RFC is defaulted to a size of 0 which removes it completely. To use the RFC set the --register-file-cache-size to a non-zero multiple of two. In addition, rfc_pipe_length may be altered to increase or decrease RFC latency benefit.	2024-01-05 12:57:06 -06:00
KaiBatley	359ac63280	gpu-compute: Added register file cache support The RFC is defaulted to a size of 0 which removes it completely. To use the RFC set the --register-file-cache-size to a non-zero multiple of two. In addition, rfc_pipe_length may be altrered to increase or decrease RFC latency benefit. Change-Id: I6f5bf5b750eb64155fbc8c8343e9feadce5c9f79	2024-01-04 22:43:05 -06:00
Tiago Mück	b652ab8558	mem-ruby: fix missing txnId for prefetch requests (#734 ) Internal prefetch message generation at AllocateTBE_PfRequest was missing the expected txnId value. Change-Id: I7d1ead24db947a15133f6ec45b27a47c70096682 Signed-off-by: Tiago Mück <tiago.muck@arm.com>	2024-01-04 07:55:11 -08:00
Giacomo Travaglini	5e2e748f3a	arch-arm: Handle invalid case for encodeAArch64SysReg (#732 ) This patch is amending encodeAArch64SysReg so that it covers the case where there are no arch numbers available for the misc index passed as an argument. This could happen if the register ID is a gem5 pseudo register which is not associated with any architected op1/op2/crn/crm tuple. Rather than panicking we return a nullopt. Change-Id: I7ab70467105ef93c0c78ac4e999c7dc8e5e09925 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-01-04 10:04:40 +00:00
KaiBatley	55fce58c19	gpu-compute: WAX dependency detection WAX Dependencies would be missed if a RAW Dependency also existed. Change-Id: I2a9e50b9d0540a30de9c1bf6bb544c7b9654cb29	2024-01-03 22:02:02 -06:00
Matthew Poremba	31e63b01ad	arch-vega: Add vop3p DOT instructions Implemented according to the ISA spec. Validated with silion. In particular the sign extend is important for the signed variants and the unsigned variants seem to overflow lanes (hence why there is no mask() in the unsigned varints. FP16 -> FP32 continues using ARM's fplib. Tested vs. an MI210. Clamp has not been verified. Change-Id: Ifc09aecbc1ef2c92a5524a43ca529983018a6d59	2024-01-03 15:41:06 -06:00
Matthew Poremba	a40f8f0efa	configs: Add MI200 script This is the MI200 equivalent of configs/example/gpufs/vega10.py. Change-Id: Ib9761caa4326abe6b90099e6a77111b2acce0f76	2024-01-03 15:41:06 -06:00
Matthew Poremba	420cda1bef	arch-vega: Implement FP32 packed math Starting with MI200, packed math can operate on double dword inputs. In this case, 64-bits of inputs (two VGPRs per lane) contain two FP32 values. Add instructions to perform add, multiply, and FMA on packed FP32 types. Change-Id: Ib838bff91a10e02e013cc7c33ec3d91ff08647b0	2024-01-03 15:41:06 -06:00
Matthew Poremba	7b0c47d52f	arch-vega: Implement all global atomics up to gfx90a This change adds all of the missing flat/global atomics up to including the new atomics in gfx90a (MI200). Adds all decodings and instruction implementations with the exception of __half2 which does not have a corresponding data type in gem5. This refactors the execute() and completeAcc() methods by creating helper functions similar to what initiateAcc() uses. This reduces redundant code for global atomic instruction implementations. Validated all except PK_ADD_F16, ADD_F32, and ADD_F64 which will be done shortly. Verified the source/dest register sizes in the header are correct and the template parameters for the new execute()/completeAcc() methods are correct. Change-Id: I4b3351229af401a1a4cbfb97166801aac67b74e4	2024-01-03 15:41:06 -06:00
Matthew Poremba	472c697d88	arch-vega: Implement v_mfma_i32_16x16x16i8 Tested using AMD labs notes examples located on github: https://github.com/amd/amd-lab-notes/blob/release/matrix-cores/ src/mfma_i32_16x16x16i8.cpp Change-Id: Ib0e50162288528012b6d3395e1f629ebf12e8e54	2024-01-03 15:41:06 -06:00
Matthew Poremba	cc75281802	gpu-compute: Update code object to latest LLVM The AMDKernelCode struct is very outdated. Most of the fields are no longer used and have been replaced with new fields that are used. Therefore in order to support the new fields the code object needs to be updated. The new structure is based on the table located at https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-kernel-descriptor Most notably this adds the new compute_pgm_rsrc3 and kernarg preload fields which are new features in gfx90a (MI200). The accum_offset in compute_pgm_rsrc3 and kergarg preload values are necessary to run application which enable those features and therefore a way to check their values is needed. Also noteable is the removal of enable_sgpr_workgroup_id_{X,Y,Z}. These seem to be unused in all versions of ROCm that gem5 supports and therefore these fields can be removed. They are replaced with a reserved field in the new code object. Change-Id: I5542442e1e5961b05e17affad0adb5186d6d9d1a	2024-01-03 15:41:06 -06:00
Matthew Poremba	7e1b27969f	arch-vega: Improve FLAT disassembly Use the opSelectorToRegSym which will print the full range of VGPRs (e.g., will now print v[2:3] instead of v2 when the source / dest is 64-bits). Fixes atomic disassembly prints. Now shows "glc" if GLC bit is enabled. Fixes some VGPR fields being printed as an SGPR in places where the 9-bit register index bit is implied (e.g., VDST). This makes it easier to use a GPUExec trace to match with LLVM disassembly when debugging. Change-Id: Ia163774850f0054243907aca8fc8d0361e37fdd5	2024-01-03 10:40:34 -06:00
Matthew Poremba	bc69ab0a1f	arch-vega: Add VOP3P encodings and packed 16b insts This adds the VOP3P and VOP3P_MAI encodings from the MI200 spec. These instructions are used for packed math and miSIMD instructions. The first 19 VOP3P opcodes are implemented and validated against hardware. This includes all instructions which operate on one dword containing two packed 16-bit values of fp16, int16_t, or uint16_t. Implement one MFMA instruction for now which was also validated against hardware.	2024-01-03 10:40:34 -06:00
Matthew Poremba	4903fe2db1	arch-arm: Allow fplib to be used outside of ARM build This is useful in other ISAs to implement FP16 computation. For example, it can be used in the GPU model. The ARM specific misc register is ignored in that case. Change-Id: I339ac0ccd9be4371b0f220ad99068e5e12b3d263	2024-01-03 10:40:34 -06:00
Matthew Poremba	8c016ebbbc	gpu-compute: Implement packed workitem ABI init This initialization method is used in gfx90a (MI200). Rather than using three VGPRs for X,Y,Z dimensions of the kernel, pack them into one register with 10-bits for each dimensions. Change-Id: I8e5b681c8287779ff9f80451d6028e862322294a	2024-01-03 10:40:34 -06:00
Matthew Poremba	5e45233484	gpu-compute: Add gfx version to HSA task entry The version is necessary for determining the correct ABI init process. Add it to the task queue so it is accessible when doing ABI init. Change-Id: If77434b0f93614057b5c40fcf612d59b54e05dbb	2024-01-03 10:40:34 -06:00
Alexander Richardson	e7d7199ea4	scons: Add option to use libc++ (#680 ) this adds an option --with-libcxx, that adds the -stdlib=libc++ flag to link against libc++ instead of libstdc++ on Linux. Currently this is only possible with clang and may not work with all build configurations (e.g. protobuf linked against libstdc++), so this needs to be opt-in rather than being on by default for clang whenever libc++ is detected. Change-Id: Ib4022a58bb2dbd32417c58f01c7443a02ff710fe	2023-12-28 12:49:44 -08:00
Bobby R. Bruce	88ea70886b	misc: Merge v23.1 staging branch into develop (#716 ) This is just to triple mark sure everything on staging is in the develop branch.	2023-12-27 20:16:30 -08:00

1 2 3 4 5 ...

21235 Commits