derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Ali Saidi	7d0344704a	arch, cpu: Add support for flattening misc register indexes. With ARMv8 support the same misc register id results in accessing different registers depending on the current mode of the processor. This patch adds the same orthogonality to the misc register file as the others (int, float, cc). For all the othre ISAs this is currently a null-implementation. Additionally, a system variable is added to all the ISA objects.	2014-01-24 15:29:30 -06:00
Giacomo Gabrielli	3436de0c2a	cpu: Add support for Memory+Barrier instruction types in O3 cpu.	2014-01-24 15:29:30 -06:00
Ali Saidi	90b1775a8f	cpu: Add support for instructions that zero cache lines.	2014-01-24 15:29:30 -06:00
Ali Saidi	6bed6e0352	cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped. This patch add support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support.	2014-01-24 15:29:30 -06:00
Dam Sunwoo	85e8779de7	mem: per-thread cache occupancy and per-block ages This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump.	2014-01-24 15:29:30 -06:00
Matt Horsnell	739c6df94e	base: add support for probe points and common probes The probe patch is motivated by the desire to move analytical and trace code away from functional code. This is achieved by the probe interface which is essentially a glorified observer model. What this means to users: * add a probe point and a "notify" call at the source of an "event" * add an isolated module, that is being used to carry out your analysis (e.g. generate a trace) * register that module as a probe listener Note: an example is given for reference in src/cpu/o3/simple_trace.[hh\|cc] and src/cpu/SimpleTrace.py What is happening under the hood: * every SimObject maintains has a ProbeManager. * during initialization (src/python/m5/simulate.py) first regProbePoints and the regProbeListeners is called on each SimObject. this hooks up the probe point notify calls with the listeners. FAQs: Why did you develop probe points: * to remove trace, stats gathering, analytical code out of the functional code. * the belief that probes could be generically useful. What is a probe point: * a probe point is used to notify upon a given event (e.g. cpu commits an instruction) What is a probe listener: * a class that handles whatever the user wishes to do when they are notified about an event. What can be passed on notify: * probe points are templates, and so the user can generate probes that pass any type of argument (by const reference) to a listener. What relationships can be generated (1:1, 1:N, N:M etc): * there isn't a restriction. You can hook probe points and listeners up in a 1:1, 1:N, N:M relationship. They become useful when a number of modules listen to the same probe points. The idea being that you can add a small number of probes into the source code and develop a larger number of useful analysis modules that use information passed by the probes. Can you give examples: * adding a probe point to the cpu's commit method allows you to build a trace module (outputting assembler), you could re-use this to gather instruction distribution (arithmetic, load/store, conditional, control flow) stats. Why is the probe interface currently restricted to passing a const reference: * the desire, initially at least, is to allow an interface to observe functionality, but not to change functionality. * of course this can be subverted by const-casting. What is the performance impact of adding probes: * when nothing is actively listening to the probes they should have a relatively minor impact. Profiling has suggested even with a large number of probes (60) the impact of them (when not active) is very minimal (<1%).	2014-01-24 15:29:30 -06:00
Matt Horsnell	ca89eba79e	mem: track per-request latencies and access depths in the cache hierarchy Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request.	2014-01-24 15:29:30 -06:00
Andreas Hansson	7db542c0dd	cpu: Relax check on squashed non-speculative instructions This patch relaxes the check performed when squashing non-speculative instructions, as it caused problems with loads that were marked ready, and then stalled on a blocked cache. The assertion is now allowing memory references to be non-faulting.	2014-01-24 15:29:29 -06:00
Dam Sunwoo	f1cd6b1ba8	cpu: remove faulty simpoint basic block inst count assertion This patch removes an assertion in the simpoint profiling code that asserts that a previously-seen basic block has the exact same number of instructions executed as before. This can be false if the basic block generates aborts or takes interrupts at different locations within the basic block. The basic block profiling are not affected significantly as these events are rare in general.	2014-01-24 15:29:29 -06:00
Nilay Vaish	5800e83223	cpu: call BaseCPU startup() function in o3 cpu	2013-12-03 10:36:04 -06:00
Andreas Sandberg	4b8be6a90b	kvm: Set the perf exclude_host attribute if available The performance counting framework in Linux 3.2 and onwards supports an attribute to exclude events generated by the host when running KVM. Setting this attribute allows us to get more reliable measurements of the guest machine. For example, on a highly loaded system, the instruction counts from the guest can be severely distorted by the host kernel (e.g., by page fault handlers). This changeset introduces a check for the attribute and enables it in the KVM CPU if present.	2013-10-15 10:09:23 +02:00
Andreas Sandberg	e5d63d0535	kvm: Remove the unused hostFreq member from BaseKvmCPU	2013-11-26 17:40:58 +01:00
Steve Reinhardt ext:(%2C%20Nilay%20Vaish%20%3Cnilay%40cs.wisc.edu%3E%2C%20Ali%20Saidi%20%3CAli.Saidi%40ARM.com%3E)	de366a16f1	sim: simulate with multiple threads and event queues This patch adds support for simulating with multiple threads, each of which operates on an event queue. Each sim object specifies which eventq is would like to be on. A custom barrier implementation is being added using which eventqs synchronize. The patch was tested in two different configurations: 1. ruby_network_test.py: in this simulation L1 cache controllers receive requests from the cpu. The requests are replied to immediately without any communication taking place with any other level. 2. twosys-tsunami-simple-atomic: this configuration simulates a client-server system which are connected by an ethernet link. We still lack the ability to communicate using message buffers or ports. But other things like simulation start and end, synchronizing after every quantum are working. Committed by: Nilay Vaish	2013-11-25 11:21:00 -06:00
Anthony Gutierrez	8a53da22c2	cpu: allow the fetch buffer to be smaller than a cache line the current implementation of the fetch buffer in the o3 cpu is only allowed to be the size of a cache line. some architectures, e.g., ARM, have fetch buffers smaller than a cache line, see slide 22 at: http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf this patch allows the fetch buffer to be set to values smaller than a cache line.	2013-11-15 13:21:15 -05:00
Andreas Hansson	f028da7af7	cpu: Fix Checker register index use This patch fixes an issue in the checker CPU register indexing. The code will not even compile using LTO as deep inlining causes the used index to be outside the array bounds.	2013-11-15 03:47:10 -05:00
Faissal Sleiman	397dc784fd	cpu: Construct ROB with cpu params struct instead of each variable Most other structures/stages get passed the cpu params struct.	2013-10-31 13:41:13 -05:00
Ali Saidi	79f81e2641	cpu: Fix O3 issuse with load+barrier instructions. Fix a problem in the O3 CPU for instructions that are both memory loads and memory barriers (e.g. load acquire) and to uncacheable memory. This combination can confuse the commit stage into commitng an instruction that hasn't executed and got it's value yet. At the same time refactor the code slightly to remove duplication between two of the cases.	2013-10-31 13:41:13 -05:00
Matt Horsnell	6decd70bfb	cpu: add consistent guarding to *_impl.hh files.	2013-10-17 10:20:45 -05:00
Faissal Sleiman	1746eb4a11	cpu: Removing an unused variable in rename	2013-10-17 10:20:45 -05:00
Faissal Sleiman	9195f1fbfd	cpu: Change IEW DPRINTF to use IEW debug flag IEW DPRINTF uses Decode debug flag, which appears to be a copying error. This patch changes this to the IEW Debug flag.	2013-10-17 10:20:45 -05:00
Faissal Sleiman	e516531bd0	cpu: Put in assertions to check for maximum supported LQ/SQ size LSQSenderState represents the LQ/SQ index using uint8_t, which supports up to 256 entries (including the sentinel entry). Sending packets to memory with a higher index than 255 truncates the index, such that the response matches the wrong entry. For instance, this can result in a deadlock if a store completion does not clear the head entry.	2013-10-17 10:20:45 -05:00
Ali Saidi	cf266f05a9	cpu: Fix O3 uncacheable load that is replayed but misses the TLB This change fixes an issue in the O3 CPU where an uncachable instruction is attempted to be executed before it reaches the head of the ROB. It is determined to be uncacheable, and is replayed, but a PanicFault is attached to the instruction to make sure that it is properly executed before committing. If the TLB entry it was using is replaced in the interveaning time, the TLB returns a delayed translation when the load is replayed at the head of the ROB, however the LSQ code can't differntiate between the old fault and the new one. If the translation isn't complete it can't be faulting, so clear the fault.	2013-10-17 10:20:45 -05:00
Andreas Sandberg	cc42e87b85	kvm: Fix latency calculation of IPR accesses When handling IPR accesses in doMMIOAccess, the KVM CPU used clockEdge() to convert between cycles and ticks. This is incorrect since doMMIOAccess is supposed to return a latency in ticks rather than when the access is done. This changeset fixes this issue by returning clockPeriod() * ipr_delay instead.	2013-10-16 18:12:15 +02:00
Yasuko Eckert	1bb293d1e7	arch/x86: add support for explicit CC register file Convert condition code registers from being specialized ("pseudo") integer registers to using the recently added CC register class. Nilay Vaish also contributed to this patch.	2013-10-15 14:22:44 -04:00
Yasuko Eckert	2c293823aa	cpu: add a condition-code register class Add a third register class for condition codes, in parallel with the integer and FP classes. No ISAs use the CC class at this point though.	2013-10-15 14:22:44 -04:00
Steve Reinhardt	5526221847	cpu/o3: clean up rename map and free list Restructured rename map and free list to clean up some extraneous code and separate out common code that can be reused across different register classes (int and fp at this point). Both components now consist of a set of Simple* objects that are stand-alone rename map & free list for each class, plus a Unified* object that presents a unified interface across all register classes and then redirects accesses to the appropriate Simple* object as needed. Moved free list initialization to PhysRegFile to better isolate knowledge of physical register index mappings to that class (and remove the need to pass a number of parameters to the free list constructor). Causes a small change to these stats: cpu.rename.int_rename_lookups cpu.rename.fp_rename_lookups because they are now categorized on a per-operand basis rather than a per-instruction basis. That is, an instruction with mixed fp/int/misc operand types will have each operand categorized independently, where previously the lookup was categorized based on the instruction type.	2013-10-15 14:22:44 -04:00
Steve Reinhardt	219c423f1f	cpu: rename _DepTag constants to _Reg_Base Make these names more meaningful. Specifically, made these substitutions: s/FP_Base_DepTag/FP_Reg_Base/g; s/Ctrl_Base_DepTag/Misc_Reg_Base/g; s/Max_DepTag/Max_Reg_Index/g;	2013-10-15 14:22:43 -04:00
Steve Reinhardt	9bd017b8ae	cpu/o3: clean up scoreboard object It had a bunch of fields (and associated constructor parameters) thet it didn't really use, and the array initialization was needlessly verbose. Also just hardwired the getReg() method to aleays return true for misc regs, rather than having an array of bits that we always kept marked as ready.	2013-10-15 14:22:43 -04:00
Steve Reinhardt	c009d0eb2a	cpu/o3: clean up physical register file No need for PhysRegFile to be a template class, or have a pointer back to the CPU. Also made some methods for checking the physical register type (int vs. float) based on the phys reg index, which will come in handy later.	2013-10-15 14:22:43 -04:00
Steve Reinhardt	06d246ab4a	cpu/inorder: merge register class enums The previous patch introduced a RegClass enum to clean up register classification. The inorder model already had an equivalent enum (RegType) that was used internally. This patch replaces RegType with RegClass to get rid of the now-redundant code.	2013-10-15 14:22:43 -04:00
Steve Reinhardt	7aa423acad	cpu: clean up architectural register classification Move from a poorly documented scheme where the mapping of unified architectural register indices to register classes is hardcoded all over to one where there's an enum for the register classes and a function that encapsulates the mapping.	2013-10-15 14:22:42 -04:00
Andreas Sandberg	0dd6f87e63	kvm: Service events in the instruction event queues This changset adds calls to the service the instruction event queues that accidentally went missing from commit [0063c7dd18ec]. The original commit only included the code needed to schedule instruction stops from KVM and missed the functionality to actually service the events.	2013-10-03 11:00:18 +02:00
Andreas Sandberg	469f2e31cf	kvm: Add support for thread-specific instruction events Instruction events are currently ignored when executing in KVM. This changeset adds support for triggering KVM exits based on instruction counts using hardware performance counters. Depending on the underlying performance counter implementation, there might be some inaccuracies due to instructions being counted in the host kernel when entering/exiting KVM. Due to limitations/bugs in Linux's performance counter interface, we can't reliably change the period of an overflow counter. We work around this issue by detaching and reattaching the counter if we need to reconfigure it.	2013-09-30 09:53:52 +02:00
Andreas Sandberg	86bade714e	kvm: FPU synchronization support on x86 This changeset adds support for synchronizing the FPU and SIMD state of a virtual x86 CPU with gem5. It supports both the XSave API and the KVM_(GET\|SET)_FPU kernel API. The XSave interface can be disabled using the useXSave parameter (in case of kernel issues). Unfortunately, KVM_(GET\|SET)_FPU interface seems to be buggy in some kernels (specifically, the MXCSR register isn't always synchronized), which means that it might not be possible to synchronize MXCSR on old kernels without the XSave interface. This changeset depends on the __float80 type in gcc and might not build using llvm.	2013-09-30 09:43:43 +02:00
Andreas Sandberg	30841926a3	kvm: x86: Fix segment registers to make them VMX compatible There are cases when the segment registers in gem5 are not compatible with VMX. This changeset works around all known such issues. Specifically: * The accessed bits in CS, SS, DD, ES, FS, GS are forced to 1. * The busy bit in TR is forced to 1. * The protection level of SS is forced to the same protection level as CS. The difference /seems/ to be caused by a bug in gem5's x86 implementation.	2013-09-30 09:36:54 +02:00
Andreas Sandberg	e5c319db43	kvm: Add x86 segment register verification to help debugging	2013-09-25 12:35:21 +02:00
Andreas Sandberg	599b59b387	kvm: Initial x86 support This changeset adds support for KVM on x86. Full support is split across a number of commits since some features are relatively complex. This changeset includes support for: * Integer state synchronization (including segment regs) * CPUID (gem5's CPUID values are inserted into KVM) * x86 legacy IO (remapped and handled by gem5's memory system) * Memory mapped IO * PCI * MSRs * State dumping Most of the functionality is fairly straight forward. There are some quirks to support PCI enumerations since this is done in the TLB(!) in the simulated CPUs. We currently replicate some of that code. Unlike the ARM implementation, the x86 implementation of the virtual CPU does not use the cycles hardware counter. KVM on x86 simulates the time stamp counter (TSC) in the kernel. If we just measure host cycles using perfevent, we might end up measuring a slightly different number of cycles. If we don't get the cycle accounting right, we might end up rewinding the TSC, with all kinds of chaos as a result. An additional feature of the KVM CPU on x86 is extended state dumping. This enables Python scripts controlling the simulator to request dumping of a subset of the processor state. The following methods are currenlty supported: * dumpFpuRegs * dumpIntRegs * dumpSpecRegs * dumpDebugRegs * dumpXCRs * dumpXSave * dumpVCpuEvents * dumpMSRs Known limitations: * M5 ops are currently not supported. * FPU synchronization is not supported (only affects CPU switching). Both of the limitations will be addressed in separate commits.	2013-09-25 12:24:26 +02:00
Andreas Sandberg	cd9cd85ce9	kvm: Correctly handle the return value from handleIpr(Read\|Write) The KVM base class incorrectly assumed that handleIprRead and handleIprWrite both return ticks. This is not the case, instead they return cycles. This changeset converts the returned cycles to ticks when handling IPR accesses.	2013-09-19 17:55:04 +02:00
Andreas Sandberg	211c10b46d	kvm: Fix a case where the run timers weren't armed properly There is a possibility that the timespec used to arm a timer becomes zero if the number of ticks used when arming a timer is close to the resolution of the timer. Due to the semantics of POSIX timers, this actually disarms the timer. This changeset fixes this issue by eliminating the rounding error (we always round away from zero now). It also reuses the minimum number of cycles, which were previously only used for cycle-based timers, to calculate a more useful resolution.	2013-09-19 17:55:03 +02:00
Joel Hestness	a1f9081bab	cpu: Dynamically instantiate O3 CPU LSQUnits Previously, the LSQ would instantiate MaxThreads LSQUnits in the body of it's object, but it would only initialize numThreads LSQUnits as specified by the user. This had the effect of leaving some LSQUnits uninitialized when the number of threads was less than MaxThreads, and when adding statistics to the LSQUnit that must be initialized, this caused the stats initialization check to fail. By dynamically instantiating LSQUnits, they are all initialized and this avoids uninitialized LSQUnits from floating around during runtime.	2013-09-11 15:34:50 -05:00
Andreas Hansson	19a5b68db7	arch: Resurrect the NOISA build target and rename it NULL This patch makes it possible to once again build gem5 without any ISA. The main purpose is to enable work around the interconnect and memory system without having to build any CPU models or device models. The regress script is updated to include the NULL ISA target. Currently no regressions make use of it, but all the testers could (and perhaps should) transition to it. --HG-- rename : build_opts/NOISA => build_opts/NULL rename : src/arch/noisa/SConsopts => src/arch/null/SConsopts rename : src/arch/noisa/cpu_dummy.hh => src/arch/null/cpu_dummy.hh rename : src/cpu/intr_control.cc => src/cpu/intr_control_noisa.cc	2013-09-04 13:22:57 -04:00
Andreas Hansson	ea40297018	cpu: Move the branch predictor out of the BaseCPU The branch predictor is guarded by having either the in-order or out-of-order CPU as one of the available CPU models and therefore should not be used in the BaseCPU. This patch moves the parameter to the relevant CPU classes.	2013-09-04 13:22:56 -04:00
Andreas Hansson	bb1d2f3957	arch: Header clean up for NOISA resurrection This patch is a first step to getting NOISA working again. A number of redundant includes make life more difficult than it has to be and this patch simply removes them. There are also some redundant forward declarations removed.	2013-09-04 13:22:55 -04:00
Andreas Hansson	c6062a3981	cpu: Fix timing CPU isDrained comment formatting This patch fixes up the comment formatting for isDrained in the timing CPU.	2013-08-20 11:21:27 -04:00
Lena Olson	646c4a23ca	cpu: Accurately count idle cycles for simple cpu Added a couple missing updates to the notIdleFraction stat. Without these, it sometimes gives a (not) idle fraction that is greater than 1 or less than 0.	2013-08-19 03:52:35 -04:00
Sascha Bischoff	e553844efc	cpu: Fix TrafficGen trace playback This patch addresses an issue with trace playback in the TrafficGen where the trace was reset but the header was not read from the trace when a captured trace was played back for a second time. This resulted in parsing errors as the expected message was not found in the trace file. The header check is moved to an init funtion which is called by the constructor and when the trace is reset. This ensures that the trace header is read each time when the trace is replayed. This patch also addresses a small formatting issue in a panic.	2013-08-19 03:52:32 -04:00
Andreas Hansson	7a61f667f0	cpu: Fix timing CPU drain check This patch modifies the SimpleTimingCPU drain check to also consider the fetch event. Previously, there was an assumption that there is never a fetch event scheduled if the CPU is not executing microcode. However, when a context is activated, a fetch even is scheduled, and microPC() is zero.	2013-08-19 03:52:30 -04:00
Andreas Hansson	9b2effd9e2	cpu: Fix a bug in the O3 CPU introduced by the cache line patch This patch fixes a bug in the O3 fetch stage that was introduced when the cache line size was moved to the system. By mistake, the initialisation and resetting of the fetch stage was merged and put in the constructor. The resetting is now re-added where it should be.	2013-08-19 03:52:24 -04:00
Andreas Sandberg	b5bb2a25aa	cpu: Remove unused getBranchPred() method from BaseCPU Remove unused virtual getBranchPred() method from BaseCPU as it is not implemented by any of the CPU models. It used to always return NULL.	2013-07-19 11:52:07 +02:00
Andreas Hansson	d4273cc9a6	mem: Set the cache line size on a system level This patch removes the notion of a peer block size and instead sets the cache line size on the system level. Previously the size was set per cache, and communicated through the interconnect. There were plenty checks to ensure that everyone had the same size specified, and these checks are now removed. Another benefit that is not yet harnessed is that the cache line size is now known at construction time, rather than after the port binding. Hence, the block size can be locally stored and does not have to be queried every time it is used. A follow-on patch updates the configuration scripts accordingly.	2013-07-18 08:31:16 -04:00

1 2 3 4 5 ...

1433 Commits