Commit Graph

7336 Commits

Author SHA1 Message Date
Jason Lowe-Power
824c87634d stats: Add more information to uninitialized error
ClockedObject was changed to require its regStats() to be called from every
child class. If you forgot to do this, the error was indecipherable. This
patch makes the error clearer.
2016-10-14 09:02:03 -05:00
Omar Naji
78dd152a0d mem: add DRAM powerdown current
Change-Id: I763cffe0c69f5ebbbf6a6eb12bec5c13d5d0161d
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:11 +01:00
Wendy Elsasser
1dc16aff24 mem: Add DRAM low-power functionality
Added power-down state transitions to the DRAM controller model.

Added per rank parameter, outstandingEvents, which tracks the number
of outstanding command events and is used to determine when the
controller should transition to a low power state.
The controller will only transition when there are no outstanding events
scheduled and the number of command entries for the given rank is 0.

The outstandingEvents parameter is incremented for every RD/WR burst,
PRE, and REF event scheduled.  ACT is implicitly covered by RD/WR
since burst will always issue and complete after a required ACT.
The parameter is decremented when the event is serviced (completed).

The controller will automatically transition to ACT power down,
PRE power down, or SREF.

Transition to ACT power down state scheduled from:
1) The RespondEvent, where read data is received from the memory.
   ACT power-down entry will be scheduled when one or more banks are
   open, all commands for the rank have completed (no more commands
   scheduled), and there are no commands in queue for the rank

Transition to PRE power down scheduled from:
1) respondEvent, when all banks are closed, all commands have
   completed, and there are no commands in queue for the rank
2) prechargeEvent when all banks are closed, all commands have
   completed, and there are no commands in queue for the rank
3) refreshEvent, after the refresh is complete when the previous
   state was ACT power-down
4) refreshEvent, after the refresh is complete when the previous
   state was PRE power-down and there are commands in the queue.

Transition to SREF will be scheduled from:
1) refreshEvent, after the refresh completes when the previous
   state was PRE power-down with no commands in queue

Power-down exit commands are scheduled from:
1) The refreshEvent, prior to issuing a refresh
2) doDRAMAccess, to wake-up the rank for RD/WR command issue.

Self-refresh exit commands are scheduled from:
1) The next request event, when the queue has commands for the rank
   in the readQueue or there are commands for the rank in the
   writeQueue and the bus state is WRITE.

Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:11 +01:00
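As a rough illustration of the transition rules described in 1dc16aff24, the following Python sketch models a single rank. The names used here (`Rank`, `on_respond`, `after_refresh`, and the state labels) are hypothetical and are not taken from the gem5 sources; only the conditions come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical state labels; the real controller's names may differ.
ACT_PDN, PRE_PDN, SREF = "ACT_PDN", "PRE_PDN", "SREF"


@dataclass
class Rank:
    outstanding_events: int = 0  # scheduled RD/WR burst, PRE and REF events
    open_banks: int = 0
    queued_cmds: int = 0         # command entries waiting for this rank

    def schedule(self) -> None:
        """Count an event when it is scheduled."""
        self.outstanding_events += 1

    def service(self) -> None:
        """Uncount an event once it has completed."""
        assert self.outstanding_events > 0
        self.outstanding_events -= 1

    def idle(self) -> bool:
        """True when nothing is scheduled and nothing is queued."""
        return self.outstanding_events == 0 and self.queued_cmds == 0


def on_respond(rank: Rank) -> Optional[str]:
    """State to enter from the respond event, or None to stay awake."""
    if not rank.idle():
        return None
    return ACT_PDN if rank.open_banks > 0 else PRE_PDN


def after_refresh(prev_state: str, queued_cmds: int) -> Optional[str]:
    """State to enter once a refresh has completed."""
    if prev_state == ACT_PDN:
        return PRE_PDN
    if prev_state == PRE_PDN:
        return PRE_PDN if queued_cmds > 0 else SREF
    return None
```

The sketch leaves out the precharge-event path and the exit commands, which follow the same pattern of checking the counter and the queue.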
Wendy Elsasser
7b269f2c95 mem: Add callback to compute stats prior to dump event
The per rank statistics are periodically updated based on
state transition and refresh events.

Add a method to update these when a dump event occurs to
ensure they reflect accurate values.
Specifically, need to ensure that the low-power state
durations, power, and energy are logged correctly.

Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:11 +01:00
Wendy Elsasser
0dd0d4ee7a mem: Modify drain to ensure banks and power are idled
Add constraint that all ranks have to be in PWR_IDLE
before signaling drain complete

This will ensure that the banks are all closed and the rank
has exited any low-power states.

On suspend, update the power stats to sync the DRAM power logic

The logic maintains the location of the signalDrainDone
method, which is still triggered from either:
1) Read response event
2) Next request event

This ensures that the drain will complete in the READ bus
state and minimizes the changes required.

Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:11 +01:00
Wendy Elsasser
27665af26d mem: Sort memory commands and update DRAMPower
Add local variable to store commands to be issued.
These commands are in order within a single bank but will be out
of order across banks & ranks.

A new procedure, flushCmdList, sorts commands across banks / ranks,
and flushes the sorted list, up to curTick() to DRAMPower.
This is currently called in refresh, once all previous commands are
guaranteed to have completed.  Could be called in other events like
the powerEvent as well.

By only flushing commands up to curTick(), will not get out of sync
when flushed at a periodic stats dump (done in subsequent patch).

Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:10 +01:00
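The sort-then-flush idea behind flushCmdList in 27665af26d can be sketched as below. This is a simplified Python model, not the gem5 implementation: the `Command` tuple and the `flush_cmd_list` signature are assumptions made for illustration, and the flushed commands are simply returned rather than handed to DRAMPower.

```python
from typing import List, NamedTuple


class Command(NamedTuple):
    tick: int   # time at which the command is issued
    rank: int
    bank: int
    kind: str   # e.g. "ACT", "RD", "PRE"


def flush_cmd_list(cmd_list: List[Command], cur_tick: int) -> List[Command]:
    """Remove and return, in time order, all commands issued by cur_tick."""
    # Stable sort: commands with equal ticks keep their insertion order.
    cmd_list.sort(key=lambda cmd: cmd.tick)
    split = 0
    while split < len(cmd_list) and cmd_list[split].tick <= cur_tick:
        split += 1
    flushed = cmd_list[:split]
    del cmd_list[:split]  # later commands stay queued for the next flush
    return flushed
```

Commands stamped later than the current tick remain in the list, which is what keeps a flush at a periodic stats dump from running ahead of simulated time.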
Omar Naji
61b2b493d4 mem: update DDR3 die revision
Change-Id: I8992ddc1664c3ed4b2d36d8a34e4ce8be113b9de
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13 19:22:10 +01:00
Omar Naji
d19dc35b06 mem: add DRAM powerdown timing
2016-10-13 19:22:10 +01:00
Omar Naji
20e6bb0140 mem: make DDR4 x16
2016-10-13 19:22:10 +01:00
Mitch Hayenga
bd0c2d5b0b isa,arm: Add missing AArch32 FP instructions
This commit adds missing non-predicated, scalar floating point
instructions.  Specifically VRINT* floating point integer rounding
instructions and VSEL* floating point conditional selects.

Change-Id: I23cbd1389f151389ac8beb28a7d18d5f93d000e7
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com>
2016-10-13 19:22:10 +01:00
Andreas Sandberg
8c5df4be2e dev, arm: Make GenericTimer param handling more robust
The generic timer needs a pointer to an ArmSystem to wire itself to the
system register handler. This was previously specified as an instance
of System that was later cast to ArmSystem. Make this more robust by
specifying it as an ArmSystem in the Python interface and add a check
to make sure that it is non-NULL.

Change-Id: I989455e666f4ea324df28124edbbadfd094b0d02
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-10-07 14:14:44 +01:00
Tushar Krishna
22e6f65d72 ruby: Add M5_VAR_USED before variables used only inside assert in garnet2.0.
This removes errors when building gem5.fast
2016-10-06 21:06:00 -04:00
Tushar Krishna
dbe8892b76 ruby: garnet2.0
Revamped version of garnet with more optimized single-cycle routers,
more configurability, and cleaner code.
2016-10-06 14:35:22 -04:00
Tushar Krishna
b512f4bf71 ruby: remove the original garnet code.
Only garnet2.0 will be supported henceforth.
2016-10-06 14:35:21 -04:00
Tushar Krishna
0962d76827 config: add port directions and per-router delay in topology.
This patch adds port direction names to the links during topology
creation, which can be used for better printed names for the links
or for users to code up their own adaptive routing algorithms.
It also adds support for every router to have an independent latency
value to support heterogeneous topologies with the subsequent
garnet2.0 patch.
2016-10-06 14:35:20 -04:00
Tushar Krishna
003c08fa90 config: make internal links in network topology unidirectional.
This patch makes the internal links within the network topology
unidirectional, thus allowing any deadlock-free routing algorithms to
be specified from the topology itself using weights.
This patch also renames Mesh.py and MeshDirCorners.py to
Mesh_XY.py and MeshDirCorners_XY.py (Mesh with XY routing).
It also adds a Mesh_westfirst.py and CrossbarGarnet.py topologies.
2016-10-06 14:35:18 -04:00
Tushar Krishna
0f68b50ff1 ruby: rename networktest to garnet_synthetic_traffic.
networktest is essentially a collection of synthetic traffic patterns
for the network. The protocol name and the tester having the same name
led to multiple python configuration files with the same name, adding
confusion. This patch renames networktest to garnet_synthetic_traffic,
and also adds more synthetic traffic patterns.
2016-10-06 14:35:16 -04:00
Tushar Krishna
aca869bf2d ruby: rename ALPHA_Network_test protocol to Garnet_standalone.
Over the past 6 years, we realized that the protocol is essentially used
to run the garnet network in a standalone manner, and feed standard synthetic
traffic patterns through it.
2016-10-06 14:35:14 -04:00
Alexandru Dutu
3f0118876f kvm: Adding details to kvm page fault in x86
Adding details, e.g. rip, rsp etc. to the kvm pagefault exit when in SE mode.
2016-10-04 13:06:05 -04:00
Alexandru Dutu
526b1b7ec8 misc: Adds a warning in case gdb is attached multiple times
Instead of scheduling another event, this patch adds a warning in case gdb
is attached multiple times and the first attachment event has not been
processed yet.
2016-10-04 13:04:19 -04:00
Alexandru Dutu
c8cf71f1a0 gpu-compute: Added method to compute the actual workgroup size
This patch adds a method to the Wavefront class to compute the actual workgroup
size. This can be different from the maximum workgroup size specified when
launching the kernel through the NDRange object. Current solution is still not
optimal, as we are computing these for each wavefront and the dispatcher also
needs to have this information and can't actually call
Wavefront::computeActuallWgSz before the wavefronts are being created. A long
term solution would be to have a Workgroup class that deals with all these
details.
2016-10-04 13:03:52 -04:00
Andreas Sandberg
18135ce6ab sim: Add a checkpoint function to test for entries
When loading a checkpoint, it's sometimes desirable to be able to test
whether an entry within a section exists. This is currently done
automatically in the UNSERIALIZE_OPT_SCALAR macro, but it isn't
possible to do for arrays, containers, or enums. Instead of adding
even more macros, add a helper function (CheckpointIn::entryExists())
that tests for the presence of an entry.

Change-Id: I4b4646b03276b889fd3916efefff3bd552317dbc
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-10-04 11:22:16 +01:00
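The purpose of the helper added in 18135ce6ab can be illustrated with a small Python analogue. It assumes the checkpoint is an INI-style text file with space-separated array values; `entry_exists` and `load_optional_list` are hypothetical names that mimic the idea of CheckpointIn::entryExists() and are not gem5 APIs.

```python
import configparser

# Made-up checkpoint fragment used only for this sketch.
CHECKPOINT_TEXT = """
[system.cpu]
instCnt=42
regs=1 2 3
"""


def entry_exists(cp: configparser.ConfigParser, section: str, entry: str) -> bool:
    """True if the section exists and contains the entry."""
    return cp.has_option(section, entry)


def load_optional_list(cp, section, entry, default):
    """Read a space-separated array, falling back to a default if absent."""
    if not entry_exists(cp, section, entry):
        return list(default)
    return cp.get(section, entry).split()


cp = configparser.ConfigParser()
cp.optionxform = str  # keep entry names case-sensitive
cp.read_string(CHECKPOINT_TEXT)
```

A single presence test like this covers arrays, containers, and enums without needing one optional-load macro per type.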
Brad Beckmann
ee78758857 ruby: correct size for partial memory writes
Fixed AbstractController::queueMemoryWritePartial to specify the
correct size for partial memory writes.
2016-09-29 01:06:52 -04:00
Brad Beckmann
f0971354c4 mem: minor dprintf fix to abstract mem
print number of bytes written as a decimal number, not hex
2016-09-29 01:06:33 -04:00
Curtis Dunham
109cc2caa6 arm: disable GIC extensions
Change-Id: If19b9c593b48ded1ea848f2d3710d4369ec8a221
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-22 14:46:37 +01:00
Rekai Gonzalez-Alberquilla
ad296b068c cpu: Fix the O3 CPU Drain
The drain did not wait until stages were ready again. Therefore, as a
result of messages in the TimeBuffer being drained, the state after the
drain was not consistent and asserts fired in some places when the
draining happened after a stage got blocked, but before the notification
arrived to the previous stages.

Change-Id: Ib50b3b40b7f745b62c1eba2931dec76860824c71
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-22 10:49:10 +01:00
Tony Gutierrez
84f9747688 gpu-compute: fix typo in GPUDispatcher
2016-09-16 14:47:19 -04:00
Alexandru Dutu
68127ca3da hsail: Fix disassembly of load instruction with 3 destination operands
2016-09-16 12:36:20 -04:00
Alexandru Dutu
bd65ec0744 gpu-compute: Adding context serialization methods to Wavefront
This patch adds methods to serialize the context of a particular wavefront
to the simulated system memory. Context serialization is used when a wavefront
is preempted (i.e. context switch).
2016-09-16 12:32:36 -04:00
Alexandru Dutu
e9b14d5111 gpu-compute: Refactoring Wavefront::dynWaveId
2016-09-16 12:31:46 -04:00
Alexandru Dutu
498d0e63e5 gpu-compute: Adding vector register file debug messages
This patch introduces DPRINTFs for reading and writing to and from the vector
register file.
2016-09-16 12:30:05 -04:00
Alexandru Dutu
7918376450 gpu-compute: Changing reconvergenceStack type
std::stack has no iterators, therefore the reconvergence stack can't be
iterated without popping elements off. We will be using std::list instead to be
able to iterate for saving and restoring purposes.
2016-09-16 12:29:01 -04:00
Alexandru Dutu
d5c8c5d3db gpu-compute: Adding ioctl for HW context size
Adding runtime support for determining the memory required by a SIMD engine
when executing a particular wavefront.
2016-09-16 12:27:56 -04:00
Alexandru Dutu
589e13a23b gpu-compute: Wavefront refactoring
Renaming members of the Wavefront class in accordance with the style guide.
2016-09-16 12:26:52 -04:00
Alexandru Dutu
e9fe1b838b gpu-compute: Remove WFContext
WFContext struct is currently unused and it has been rendered not useful in
saving and restoring the context of a Wavefront. Wavefront class should be
sufficient for that purpose and the runtime can figure out the memory size
it will need to allocate for a Wavefront through an IOCTL.
2016-09-16 12:26:03 -04:00
Curtis Dunham
18461d1522 base: eliminate ipython warning
Change-Id: I3e282baeb969b6bb9534813a2f433d68246c0669
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-15 18:21:38 +01:00
Ricardo Alves
e5c1488cb6 arm: Add m5_fail support for aarch64
Change-Id: Id2acbc09772be310a0eb9e33295afab07e08a4fa
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-15 18:21:24 +01:00
Radhika Jagtap
1fe5f63137 cpu: Support exit when any one Trace CPU completes replay
This change adds a Trace CPU param to exit simulation early,
i.e. when the first (any one) trace execution is complete. With
this change the user gets a choice to configure exit as either
when the last CPU finishes (default) or first CPU finishes
replay. Configuring an early exit enables simulating and
measuring stats strictly when memory-system resources are being
stressed by all Trace CPUs.

Change-Id: I3998045fdcc5cd343e1ca92d18dd7f7ecdba8f1d
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-09-15 18:01:20 +01:00
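The two exit policies described in 1fe5f63137 amount to choosing between the earliest and the latest completion time. A minimal Python sketch, with `exit_tick` and `exit_on_first` as hypothetical names rather than the actual Trace CPU parameter:

```python
from typing import Sequence


def exit_tick(finish_ticks: Sequence[int], exit_on_first: bool = False) -> int:
    """Tick at which simulation exits, given each Trace CPU's finish tick."""
    if not finish_ticks:
        raise ValueError("need at least one Trace CPU")
    # Default: wait for the last CPU. Early exit: stop at the first one.
    return min(finish_ticks) if exit_on_first else max(finish_ticks)
```

With early exit, every tick up to the returned value has all Trace CPUs still replaying, which is the window in which the memory system is under full load.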
Radhika Jagtap
d067327fc0 cpu: Adjust for trace offset and fix stats
This change subtracts the time offset present in the trace from
all the event times when nodes and request are sent so that the
replay starts immediately when the simulation starts. This makes
the stats accurate when the time offset in traces is large, for
example when traces are generated in the middle of a workload
execution. It also solves the problem of unnecessary DRAM
refresh events that would keep occurring during the large time
offset before even a single request is replayed into the system.

Change-Id: Ie0898842615def867ffd5c219948386d952af7f7
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-09-15 18:01:16 +01:00
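The offset adjustment in d067327fc0 can be pictured as rebasing every event time. The sketch below assumes the offset equals the earliest tick in the trace; `remove_trace_offset` is a hypothetical helper, not a gem5 function.

```python
from typing import List, Sequence


def remove_trace_offset(event_ticks: Sequence[int]) -> List[int]:
    """Shift all event times so that the earliest one lands at tick 0."""
    if not event_ticks:
        return []
    offset = min(event_ticks)  # assumed to be the trace's time offset
    return [tick - offset for tick in event_ticks]
```

Relative spacing between events is unchanged, so only the idle period before the first request is removed.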
Radhika Jagtap
d7724d5f54 cpu: Add frequency scaling to the Trace CPU
This change adds a simple feature to scale the frequency of
the Trace CPU.

The compute delays in the input traces provide timing. This
change adds a frequency multiplier parameter to the Trace CPU
set to 1.0 by default. The compute delay is manipulated to
effectively achieve the frequency at which the nodes become
ready and thus scale the frequency of the Trace CPU.

Change-Id: Iaabbd57806941ad56094fcddbeb38fcee1172431
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-09-15 18:01:09 +01:00
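One plausible reading of the scaling in d7724d5f54 is that each compute delay is divided by the frequency multiplier, so a multiplier of 2.0 halves the delay. The Python sketch below works under that assumption; the function name and the rounding to whole ticks are illustrative choices, not details from the patch.

```python
def scale_compute_delay(comp_delay: int, freq_multiplier: float = 1.0) -> int:
    """Scale a trace compute delay (in ticks) by a frequency multiplier."""
    if freq_multiplier <= 0:
        raise ValueError("frequency multiplier must be positive")
    # A higher frequency means nodes become ready sooner.
    return round(comp_delay / freq_multiplier)
```

With the default multiplier of 1.0 the delays from the trace are used unchanged.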
Michael LeBeane
443da2c030 kvm: Support timing accesses for KVM cpu
This patch enables timing accesses for KVM cpu.  A new state,
RunningMMIOPending, is added to indicate that there are outstanding timing
requests generated by KVM in the system.  KVM's tick() is disabled and the
simulation does not enter into KVM until all outstanding timing requests have
completed.  The main motivation for this is to allow KVM CPU to perform MMIO
in Ruby, since Ruby does not support atomic accesses.
2016-09-13 23:20:03 -04:00
Michael LeBeane
2c43a21687 x86: Force strict ordering for memory mapped m5ops
Normal MMAPPED_IPR requests are allowed to execute speculatively under the
assumption that they have no side effects.  The special case of m5ops that are
treated like MMAPPED_IPR should not be allowed to execute speculatively, since
they can have side-effects.  Adding the STRICT_ORDER flag to these requests
blocks execution until the associated instruction hits the ROB head.
2016-09-13 23:18:34 -04:00
Michael LeBeane
458d4a3c7b sim: Refactor quiesce and remove FS asserts
The quiesce family of magic ops can be simplified by the inclusion of
quiesceTick() and quiesce() functions on ThreadContext.  This patch also
gets rid of the FS guards, since suspending a CPU is also a valid
operation for SE mode.
2016-09-13 23:17:42 -04:00
Michael LeBeane
6e4c51fa99 dev: Add a DmaCallback class to DmaDevice
This patch introduces the DmaCallback helper class, which registers a callback
to fire after a sequence of (potentially non-contiguous) DMA transfers on a
DmaPort completes.
2016-09-13 23:14:24 -04:00
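The behaviour described in 6e4c51fa99, a callback that fires once a whole sequence of transfers has completed, can be sketched with a simple counter. This Python class borrows the name DmaCallback for readability only; its methods are hypothetical and the interface of the C++ class in gem5 may differ.

```python
from typing import Callable


class DmaCallback:
    """Invoke on_done once every registered chunk has completed."""

    def __init__(self, on_done: Callable[[], None]) -> None:
        self._on_done = on_done
        self._pending = 0

    def chunk_event(self) -> Callable[[], None]:
        """Register one more transfer and return its completion handler."""
        self._pending += 1
        return self._chunk_complete

    def _chunk_complete(self) -> None:
        assert self._pending > 0
        self._pending -= 1
        if self._pending == 0:
            self._on_done()  # last outstanding chunk has finished
```

A caller would request one completion handler per DMA transfer and pass it to whatever issues that transfer; the transfers may finish in any order.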
Michael LeBeane
f17a5faf44 sim, syscall_emul: Add mmap to EmulatedDriver
Add support for calling mmap on an EmulatedDriver file descriptor.
2016-09-13 23:12:46 -04:00
Michael LeBeane
6a668d0c0c gpu-compute: Fix bug with return in cfg
Connecting basic blocks would stop too early in kernels where ret was not the
last instruction.  This patch allows basic blocks after the ret instruction
to be properly connected.
2016-09-13 23:11:20 -04:00
Michael LeBeane
febab25957 dev: Exit correctly in dist-gem5
The receiver thread in dist_iface is allowed to directly exit the simulation.
This can cause exit to be called twice if the main thread simultaneously wants
to exit the simulation.  Therefore, have the receiver thread enqueue a request
to exit on the primary event queue for the main simulation thread to handle.
2016-09-13 23:08:34 -04:00
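The fix in febab25957 follows a common pattern: a helper thread posts an exit request to a queue instead of exiting directly, and only the main thread acts on it. The Python sketch below illustrates that pattern with standard-library threads; it is not the dist-gem5 code, and `run_simulation` and the reason strings are made up.

```python
import queue
import threading
from typing import List


def run_simulation(local_exit_requested: bool) -> List[str]:
    """Return the exit reasons that the main thread actually acted on."""
    main_queue: "queue.Queue[str]" = queue.Queue()

    def receiver() -> None:
        # The receiver never exits the simulation itself; it only asks.
        main_queue.put("exit requested by peer")

    worker = threading.Thread(target=receiver)
    worker.start()
    if local_exit_requested:
        main_queue.put("exit requested locally")
    worker.join()

    handled: List[str] = []
    while not main_queue.empty():
        reason = main_queue.get()
        if not handled:
            handled.append(reason)  # exit once; later requests are ignored
    return handled
```

Because a single thread consumes the queue, two simultaneous exit requests result in one exit rather than two.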
Michael LeBeane
cc58148fe1 misc: Remove FullSystem check for networking components
Ethernet devices are currently only hooked up if running in FS mode.  Much of
the Ethernet networking code is generic and can be used to build non-Ethernet
device models.  Some of these device models do not require a complex driver
stack and can be built to use an EmulatedDriver in SE mode. This patch enables
Ethernet interfaces to properly connect regardless of whether the simulation
is in FS or SE mode.
2016-09-13 23:06:32 -04:00
Matt Poremba
4c903d0412 base: Output all AddrRange parameters to config.ini
Currently only 'start' and 'end' of AddrRange are printed in config.ini.
This causes address ranges to overlap when loading a C++-only
config with interleaved addresses using CxxConfigManager. This patch adds
prints for the interleave and XOR bits to config.ini such that address
ranges are properly set up with cxx config.
2016-09-13 23:06:18 -04:00
Andreas Sandberg
3329de1e86 dev, arm: Add a customizable NoMali GPU model
Add a customizable NoMali GPU model and an example Mali T760
configuration. Unlike the normal NoMali model (NoMaliGpu), the
NoMaliCustopmGpu model exposes all the important GPU ID registers to
Python. This makes it possible to implement custom GPU configurations
without changing the underlying NoMali library.

Change-Id: I4fdba05844c3589893aa1a4c11dc376ec33d4e9e
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-09-06 10:22:38 +01:00