Commit Graph

5956 Commits

Author SHA1 Message Date
Andreas Sandberg
469f2e31cf kvm: Add support for thread-specific instruction events
Instruction events are currently ignored when executing in KVM. This
changeset adds support for triggering KVM exits based on instruction
counts using hardware performance counters. Depending on the
underlying performance counter implementation, there might be some
inaccuracies due to instructions being counted in the host kernel when
entering/exiting KVM.

Due to limitations/bugs in Linux's performance counter interface, we
can't reliably change the period of an overflow counter. We work
around this issue by detaching and reattaching the counter if we need
to reconfigure it.
2013-09-30 09:53:52 +02:00
Andreas Sandberg
86bade714e kvm: FPU synchronization support on x86
This changeset adds support for synchronizing the FPU and SIMD state
of a virtual x86 CPU with gem5. It supports both the XSave API and the
KVM_(GET|SET)_FPU kernel API. The XSave interface can be disabled
using the useXSave parameter (in case of kernel
issues). Unfortunately, KVM_(GET|SET)_FPU interface seems to be buggy
in some kernels (specifically, the MXCSR register isn't always
synchronized), which means that it might not be possible to
synchronize MXCSR on old kernels without the XSave interface.

This changeset depends on the __float80 type in gcc and might not
build using llvm.
2013-09-30 09:43:43 +02:00
Andreas Sandberg
cccca70149 x86: Add support routines to load and store 80-bit floats
The x87 FPU on x86 supports extended floating point. We currently
handle all floating point on x86 as double and don't support 80-bit
loads/stores. This changeset add a utility function to load and
convert 80-bit floats to doubles (loadFloat80) and another function to
store doubles as 80-bit floats (storeFloat80). Both functions use
libfputils to do the conversion in software. The functions are
currently not used, but are required to handle floating point in KVM
and to properly support all x87 loads/stores.
2013-09-30 09:42:30 +02:00
Andreas Sandberg
3af2d8eab0 x86: Add limited support for extracting function call arguments
Add support for extracting the first 6 64-bit integer argumements to a
function call in X86ISA::getArgument().
2013-09-30 09:37:17 +02:00
Andreas Sandberg
30841926a3 kvm: x86: Fix segment registers to make them VMX compatible
There are cases when the segment registers in gem5 are not compatible
with VMX. This changeset works around all known such issues. Specifically:

* The accessed bits in CS, SS, DD, ES, FS, GS are forced to 1.
* The busy bit in TR is forced to 1.
* The protection level of SS is forced to the same protection level as
  CS. The difference /seems/ to be caused by a bug in gem5's x86
  implementation.
2013-09-30 09:36:54 +02:00
Andreas Sandberg
e5c319db43 kvm: Add x86 segment register verification to help debugging 2013-09-25 12:35:21 +02:00
Andreas Sandberg
599b59b387 kvm: Initial x86 support
This changeset adds support for KVM on x86. Full support is split
across a number of commits since some features are relatively
complex. This changeset includes support for:

 * Integer state synchronization (including segment regs)
 * CPUID (gem5's CPUID values are inserted into KVM)
 * x86 legacy IO (remapped and handled by gem5's memory system)
 * Memory mapped IO
 * PCI
 * MSRs
 * State dumping

Most of the functionality is fairly straight forward. There are some
quirks to support PCI enumerations since this is done in the TLB(!) in
the simulated CPUs. We currently replicate some of that code.

Unlike the ARM implementation, the x86 implementation of the virtual
CPU does not use the cycles hardware counter. KVM on x86 simulates the
time stamp counter (TSC) in the kernel. If we just measure host cycles
using perfevent, we might end up measuring a slightly different number
of cycles. If we don't get the cycle accounting right, we might end up
rewinding the TSC, with all kinds of chaos as a result.

An additional feature of the KVM CPU on x86 is extended state
dumping. This enables Python scripts controlling the simulator to
request dumping of a subset of the processor state. The following
methods are currenlty supported:

 * dumpFpuRegs
 * dumpIntRegs
 * dumpSpecRegs
 * dumpDebugRegs
 * dumpXCRs
 * dumpXSave
 * dumpVCpuEvents
 * dumpMSRs

Known limitations:
  * M5 ops are currently not supported.
  * FPU synchronization is not supported (only affects CPU switching).

Both of the limitations will be addressed in separate commits.
2013-09-25 12:24:26 +02:00
Andreas Sandberg
cd9cd85ce9 kvm: Correctly handle the return value from handleIpr(Read|Write)
The KVM base class incorrectly assumed that handleIprRead and
handleIprWrite both return ticks. This is not the case, instead they
return cycles. This changeset converts the returned cycles to ticks
when handling IPR accesses.
2013-09-19 17:55:04 +02:00
Andreas Sandberg
211c10b46d kvm: Fix a case where the run timers weren't armed properly
There is a possibility that the timespec used to arm a timer becomes
zero if the number of ticks used when arming a timer is close to the
resolution of the timer. Due to the semantics of POSIX timers, this
actually disarms the timer. This changeset fixes this issue by
eliminating the rounding error (we always round away from zero
now). It also reuses the minimum number of cycles, which were
previously only used for cycle-based timers, to calculate a more
useful resolution.
2013-09-19 17:55:03 +02:00
Andreas Sandberg
a6e723e4d6 x86: Add support routines to convert between x87 tag formats
This changeset adds the convX87XTagsToTags() and convX87TagsToXTags()
which convert between the tag formats in the FTW register and the
format used in the xsave area. The conversion from to the x87 FTW
representation is currently loses some information since it does not
reconstruct the valid/zero/special flags which are not included in the
xsave representation.
2013-09-19 17:30:26 +02:00
Andreas Sandberg
4dbf25adc3 sim: Fix undefined behavior in the pseudo-inst interface
The order between updating and using arg_num in
PseudoInst::pseudoInst() is currently undefined. This changeset
explicitly updates arg_num after it has been used to extract an
argument.

--HG--
extra : rebase_source : 67c46dc3333d16ce56687ee8aea41ce6c6d133bb
2013-09-18 17:08:35 +02:00
Andreas Hansson
9aa939891f mem: Fix scheduling bug in SimpleMemory
This patch ensures that a dequeue event is not scheduled if the memory
controller is waiting for a retry already. Without this check it is
possible for the controller to attempt sending something whilst
already having one packet that is in retry, thus causing the bus to
have an assertion failure.
2013-09-18 08:46:33 -04:00
Andreas Hansson
fe5212f932 swig: Fix issue with circular import in 2.0.9/2.0.10
This patch fixes an issue which prevented gem5 from running when built
using swig 2.0.9 and 2.0.10. The generated event.py tried to import
m5.internal which in turn relied on importing event. This patch seems
to fix the problem, and so far has not caused any other issues.
2013-09-18 08:46:31 -04:00
Andreas Sandberg
e93e12a62b x86: Expose the raw hash map of MSRs
This patch allows the KVM CPU module to initialize it's MSRs by
enumerating the MSRs in the gem5 x86 implementation.
2013-09-18 11:28:28 +02:00
Andreas Sandberg
4b840b8322 x86: Add support for checking the raw state of an interrupt
In order to support hardware virtualization, we need to be able to
check if there are any interrupts pending irregardless of the
rflags.intf value. This changeset adds the checkInterruptsRaw() method
to the x86 interrupt control. It returns true if there are pending
interrupts that can be delivered as soon as the CPU is ready for
interrupt delivery.
2013-09-18 11:28:27 +02:00
Andreas Sandberg
15733e9b33 x86: Expose the interrupt vector in faults
This patch allows a hardware virtualized CPU to discover which interrupt
to deliver to the guest.
2013-09-18 11:28:24 +02:00
Joel Hestness
cc155ffa0d ruby: Fix Topology throttle connections
The Topology source sets up input and output buffers for each of the external
nodes of a topology by indexing on Ruby's generated controller unique IDs.
These unique IDs are found by adding the MachineType_base_number to the version
number of each controller (see any generated *_Controller.cc - init() calls
getToNetQueue and getFromNetQueue using m_version + base). However, the
Topology object used the cntrl_id - which is required to be unique across all
controllers - to index the controllers list as they are being connected to
their input and output buffers. If the cntrl_ids did not match the Ruby unique
ID, the throttles end up connected to incorrectly indexed nodes in the network,
resulting in packets traversing incorrect network paths. This patch fixes the
Topology indexing scheme by using the Ruby unique ID to match that of the
SimpleNetwork buffer vectors.
2013-09-11 15:35:18 -05:00
Joel Hestness
a1f9081bab cpu: Dynamically instantiate O3 CPU LSQUnits
Previously, the LSQ would instantiate MaxThreads LSQUnits in the body of it's
object, but it would only initialize numThreads LSQUnits as specified by the
user. This had the effect of leaving some LSQUnits uninitialized when the
number of threads was less than MaxThreads, and when adding statistics to the
LSQUnit that must be initialized, this caused the stats initialization check to
fail. By dynamically instantiating LSQUnits, they are all initialized and this
avoids uninitialized LSQUnits from floating around during runtime.
2013-09-11 15:34:50 -05:00
Joel Hestness
c1cf55c738 ruby: Statically allocate stats in SimpleNetwork, Switch, Throttle
The previous changeset (9863:9483739f83ee) used STL vector containers to
dynamically allocate stats in the Ruby SimpleNetwork, Switch and Throttle. For
gcc versions before at least 4.6.3, this causes the standard vector allocator
to call Stats copy constructors (a no-no, since stats should be allocated in
the body of each SimObject instance). Since the size of these stats arrays is
known at compile time (NOTE: after code generation), this patch changes their
allocation to be static rather than using an STL vector.
2013-09-11 15:33:27 -05:00
Nilay Vaish
e391fd151b stats: add operator= for DataWrapVec class
gcc/g++ 4.4.7 complained about the operator= being undefined.
This changeset adds the operator.
2013-09-09 18:52:23 -05:00
Nilay Vaish
90bfbd9793 ruby: network: convert to gem5 style stats 2013-09-06 16:21:35 -05:00
Nilay Vaish
24dc914d87 ruby: profiler: removes function resourceUsage() 2013-09-06 16:21:32 -05:00
Nilay Vaish
79b5ea9d19 ruby: remove undefined message size type
This message size type does not work well with one of the statistical
variables. It also seems unnecessary.
2013-09-06 16:21:30 -05:00
Nilay Vaish
0280997fbf ruby: network: removes reset functionality 2013-09-06 16:21:30 -05:00
Nilay Vaish
e7bd70e079 ruby: network: shorten variable names 2013-09-06 16:21:29 -05:00
Nilay Vaish
47d113696d stats: adds a Formula operator for division 2013-09-06 16:21:29 -05:00
Nilay Vaish
c0a8ad0a35 ruby: converts sparse memory stats to gem5 style 2013-09-06 16:21:28 -05:00
Andreas Hansson
53cf77cf18 sim: Fix clang warning for unused variable
This patch ensures the NULL ISA can build without causing issues with
an unused variable.
2013-09-05 13:53:54 -04:00
Andreas Hansson
3b90f52b61 util: Add ini string as tooltip info in dot output
This patch adds the config ini string as a tooltip that can be
displayed in most browsers rendering the resulting svg. Certain
characters are modified for HTML output.

Tested on chrome and firefox.
2013-09-04 13:23:00 -04:00
Andreas Hansson
fad36b35c6 util: Add colours to the dot output
This patch is adding a splash of colour to the dot output to make it
easier to distinguish objects of different types. As a bonus, the
pastel-colour palette also makes the output look like a something from
the 21st century.
2013-09-04 13:22:59 -04:00
Andreas Hansson
62cf785178 util: Add class name to dot graph and output to svg
This patch adds the class name to the label, creates some more space
by increasing the rank separation, and additionally outputs the graph
as an editable SVG in addition to the PDF.
2013-09-04 13:22:58 -04:00
Andreas Hansson
19a5b68db7 arch: Resurrect the NOISA build target and rename it NULL
This patch makes it possible to once again build gem5 without any
ISA. The main purpose is to enable work around the interconnect and
memory system without having to build any CPU models or device models.

The regress script is updated to include the NULL ISA target. Currently
no regressions make use of it, but all the testers could (and perhaps
should) transition to it.

--HG--
rename : build_opts/NOISA => build_opts/NULL
rename : src/arch/noisa/SConsopts => src/arch/null/SConsopts
rename : src/arch/noisa/cpu_dummy.hh => src/arch/null/cpu_dummy.hh
rename : src/cpu/intr_control.cc => src/cpu/intr_control_noisa.cc
2013-09-04 13:22:57 -04:00
Andreas Hansson
ea40297018 cpu: Move the branch predictor out of the BaseCPU
The branch predictor is guarded by having either the in-order or
out-of-order CPU as one of the available CPU models and therefore
should not be used in the BaseCPU. This patch moves the parameter to
the relevant CPU classes.
2013-09-04 13:22:56 -04:00
Andreas Hansson
bb1d2f3957 arch: Header clean up for NOISA resurrection
This patch is a first step to getting NOISA working again. A number of
redundant includes make life more difficult than it has to be and this
patch simply removes them. There are also some redundant forward
declarations removed.
2013-09-04 13:22:55 -04:00
Andreas Hansson
cead68a781 alpha: Move system virtProxy to Alpha only
This patch moves the system virtual port proxy to the Alpha system
only to make the resurrection of the NOISA slightly less
painful. Alpha is the only ISA that is actually using it.
2013-09-04 13:22:55 -04:00
Andreas Hansson
fdf6f6c4b6 scons: Enable build on OSX
This patch changes the SConscript to build gem5 with libc++ on OSX as
the conventional libstdc++ does not have the C++11 constructs that the
current code base makes use of (e.g. std::forward).

Since this was the last use of the transitional TR1, the unordered map
and set header can now be simplified as well.
2013-09-04 13:22:54 -04:00
Andreas Hansson
c6062a3981 cpu: Fix timing CPU isDrained comment formatting
This patch fixes up the comment formatting for isDrained in the timing
CPU.
2013-08-20 11:21:27 -04:00
Andreas Hansson
c57c452143 base: Fix VectorPrint initialisation
This patch changes how the initialisation of the VectorPrint struct is
done so that gcc 4.4 is happy again.
2013-08-20 11:21:26 -04:00
Andreas Hansson
b63631536d stats: Cumulative stats update
This patch updates the stats to reflect the: 1) addition of the
internal queue in SimpleMemory, 2) moving of the memory class outside
FSConfig, 3) fixing up of the 2D vector printing format, 4) specifying
burst size and interface width for the DRAM instead of relying on
cache-line size, 5) performing merging in the DRAM controller write
buffer, and 6) fixing how idle cycles are counted in the atomic and
timing CPU models.

The main reason for bundling them up is to minimise the changeset
size.
2013-08-19 03:52:36 -04:00
Lena Olson
646c4a23ca cpu: Accurately count idle cycles for simple cpu
Added a couple missing updates to the notIdleFraction stat. Without
these, it sometimes gives a (not) idle fraction that is greater than 1
or less than 0.
2013-08-19 03:52:35 -04:00
Andreas Hansson
c26911013c config: Command line support for multi-channel memory
This patch adds support for specifying multi-channel memory
configurations on the command line, e.g. 'se/fs.py
--mem-type=ddr3_1600_x64 --mem-channels=4'. To enable this, it
enhances the functionality of MemConfig and moves the existing
makeMultiChannel class method from SimpleDRAM to the support scripts.

The se/fs.py example scripts are updated to make use of the new
feature.
2013-08-19 03:52:34 -04:00
Andreas Hansson
49d88f08b0 mem: Change AbstractMemory defaults to match the common case
This patch changes the default parameter value of conf_table_reported
to match the common case. It also simplifies the regression and config
scripts to reflect this change.
2013-08-19 03:52:33 -04:00
Sascha Bischoff
e553844efc cpu: Fix TrafficGen trace playback
This patch addresses an issue with trace playback in the TrafficGen
where the trace was reset but the header was not read from the trace
when a captured trace was played back for a second time. This resulted
in parsing errors as the expected message was not found in the trace
file.

The header check is moved to an init funtion which is called by the
constructor and when the trace is reset. This ensures that the trace
header is read each time when the trace is replayed.

This patch also addresses a small formatting issue in a panic.
2013-08-19 03:52:32 -04:00
Andreas Hansson
6279eaf1f7 mem: Use STL deque in favour of list for DRAM queues
This patch changes the data structure used for the DRAM read, write
and response queues from an STL list to deque. This optimisation is
based on the observation that the size is small (and fixed), and that
the structures are frequently iterated over in a linear fashion.
2013-08-19 03:52:32 -04:00
Andreas Hansson
ac42db8134 mem: Perform write merging in the DRAM write queue
This patch implements basic write merging in the DRAM to avoid
redundant bursts. When a new access is added to the queue it is
compared against the existing entries, and if it is either
intersecting or immediately succeeding/preceeding an existing item it
is merged.

There is currently no attempt made at avoiding iterating over the
existing items in determining whether merging is possible or not.
2013-08-19 03:52:31 -04:00
Amin Farmahini
243f135e5f mem: Replacing bytesPerCacheLine with DRAM burstLength in SimpleDRAM
This patch gets rid of bytesPerCacheLine parameter and makes the DRAM
configuration separate from cache line size. Instead of
bytesPerCacheLine, we define a parameter for the DRAM called
burst_length. The burst_length parameter shows the length of a DRAM
device burst in bits. Also, lines_per_rowbuffer is replaced with
device_rowbuffer_size to improve code portablity.

This patch adds a burst length in beats for each memory type, an
interface width for each memory type, and the memory controller model
is extended to reason about "system" packets vs "dram" packets and
assemble the responses properly. It means that system packets larger
than a full burst are split into multiple dram packets.
2013-08-19 03:52:30 -04:00
Andreas Hansson
7a61f667f0 cpu: Fix timing CPU drain check
This patch modifies the SimpleTimingCPU drain check to also consider
the fetch event. Previously, there was an assumption that there is
never a fetch event scheduled if the CPU is not executing
microcode. However, when a context is activated, a fetch even is
scheduled, and microPC() is zero.
2013-08-19 03:52:30 -04:00
Andreas Hansson
f7d44590cb alpha: Check interrupts before quiesce
This patch adds a check to the quiesce operation to ensure that the
CPU does not suspend itself when there are unmasked interrupts
pending. Without this patch there are corner cases when the CPU gets
an interrupt before the quiesce is executed and then never wakes up
again.
2013-08-19 03:52:29 -04:00
Sascha Bischoff
6211c24a96 stats: Fix issue when printing 2D vectors
This patch addresses an issue with the text-based stats output which
resulted in Vector2D stats being printed without subnames in the event
that one of the dimensions was of length 1.

This patch also fixes the total printing for the 2D vector. Previously
totals were printed without explicitly stating that a total was being
printed. This has been rectified in this patch.
2013-08-19 03:52:29 -04:00
Akash Bagdia
e7e17f92db power: Add voltage domains to the clock domains
This patch adds the notion of voltage domains, and groups clock
domains that operate under the same voltage (i.e. power supply) into
domains. Each clock domain is required to be associated with a voltage
domain, and the latter requires the voltage to be explicitly set.

A voltage domain is an independently controllable voltage supply being
provided to section of the design. Thus, if you wish to perform
dynamic voltage scaling on a CPU, its clock domain should be
associated with a separate voltage domain.

The current implementation of the voltage domain does not take into
consideration cases where there are derived voltage domains running at
ratio of native voltage domains, as with the case where there can be
on-chip buck/boost (charge pumps) voltage regulation logic.

The regression and configuration scripts are updated with a generic
voltage domain for the system, and one for the CPUs.
2013-08-19 03:52:28 -04:00