This patch continues the unification of how the different CPU models
create and share their instruction and data ports. Most importantly,
it forces every CPU to have an instruction and a data port, and gives
these ports explicit getters in the BaseCPU (getDataPort and
getInstPort). The patch helps in simplifying the code, make
assumptions more explicit, andfurther ease future patches related to
the CPU ports.
The biggest changes are in the in-order model (that was not modified
in the previous unification patch), which now moves the ports from the
CacheUnit to the CPU. It also distinguishes the instruction fetch and
load-store unit from the rest of the resources, and avoids the use of
indices and casting in favour of keeping track of these two units
explicitly (since they are always there anyways). The atomic, timing
and O3 model simply return references to their already existing ports.
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
This patch performs minimal changes to move the instruction and data
ports from specialised subclasses to the base CPU (to the largest
degree possible). Ultimately it servers to make the CPU(s) have a
well-defined interface to the memory sub-system.
Port proxies are used to replace non-structural ports, and thus enable
all ports in the system to correspond to a structural entity. This has
the advantage of accessing memory through the normal memory subsystem
and thus allowing any constellation of distributed memories, address
maps, etc. Most accesses are done through the "system port" that is
used for loading binaries, debugging etc. For the entities that belong
to the CPU, e.g. threads and thread contexts, they wrap the CPU data
port in a port proxy.
The following replacements are made:
FunctionalPort > PortProxy
TranslatingPort > SETranslatingPortProxy
VirtualPort > FSTranslatingPortProxy
--HG--
rename : src/mem/vport.cc => src/mem/fs_translating_port_proxy.cc
rename : src/mem/vport.hh => src/mem/fs_translating_port_proxy.hh
rename : src/mem/translating_port.cc => src/mem/se_translating_port_proxy.cc
rename : src/mem/translating_port.hh => src/mem/se_translating_port_proxy.hh
readBytes and writeBytes had the word "bytes" in their names because they
accessed blobs of bytes. This distinguished them from the read and write
functions which handled higher level data types. Because those functions don't
exist any more, this change renames readBytes and writeBytes to more general
names, readMem and writeMem, which reflect the fact that they are how you read
and write memory. This also makes their names more consistent with the
register reading/writing functions, although those are still read and set for
some reason.
This change fixes an issue where a DTLB fault occurs and redirects fetch to
handle the fault and the ITLB requires a walk which delays translation. In this
case the status of the cpu isn't updated appropriately, and an additional
instruction fetch occurs. Eventually this hits an assert as multiple instruction
fetches are occuring in the system and when the second one returns the
processor is in the wrong state.
Some asserts below are removed because it was always true (typo) and the state
after the initiateAcc() the processor could be in any valid state when a
d-side fault occurs.
Some ISAs (like ARM) relies on hardware page table walkers. For those ISAs,
when a TLB miss occurs, initiateTranslation() can return with NoFault but with
the translation unfinished.
Instructions experiencing a delayed translation due to a hardware page table
walk are deferred until the translation completes and kept into the IQ. In
order to keep track of them, the IQ has been augmented with a queue of the
outstanding delayed memory instructions. When their translation completes,
instructions are re-executed (only their initiateAccess() was already
executed; their DTB translation is now skipped). The IEW stage has been
modified to support such a 2-pass execution.
In the case of a split transaction and a cache that is faster than a CPU we
could get two responses before next_tick expires. Add an event that is
scheduled in this case and return false rather than asserting.
This initiates a timing translation and passes the read or write on to the
processor before waiting for it to finish. Once the translation is finished,
the instruction's state is updated via the 'finish' function. A new
DataTranslation class is created to handle this.
The idea is taken from the implementation of timing translations in
TimingSimpleCPU by Gabe Black. This patch also separates out the timing
translations from this CPU and uses the new DataTranslation class.
The constructor no-longer schedules an event at construction and the implict conversion between int and bool was allowing the old code to compile without warning.
Signed-off By: Ali Saidi
A whole bunch of stuff has been converted to use the new params stuff, but
the CPU wasn't one of them. While we're at it, make some things a bit
more stylish. Most of the work was done by Gabe, I just cleaned stuff up
a bit more at the end.
The notIdleFraction statistic isn't updated when the statistics reset, probably because the cpu Status information
was pulled into the atomic and timing cpus. This changeset pulls Status back into the BaseSimpleCPU object. Anyone
care to comment on the odd naming of the Status instance? It shouldn't just be status because that is confusing
with Port::Status, but _status seems a bit strage too.
1. Make sure connectMemPorts() only gets called when the CPU's peer gets changed. This is done by making setPeer() virtual, and overriding it in the CPU's ports. When it gets called on a CPU's port (dcache specifically), it calls the normal setPeer() function, and also connectMemPorts().
2. Consolidate redundant code that handles switching in a CPU.
src/cpu/base.cc:
Move common code of switching over peers to base CPU.
src/cpu/base.hh:
Move common code of switching over peers to BaseCPU.
src/cpu/o3/cpu.cc:
Add in function that updates thread context's ports.
Also use updated function to takeOverFrom() in BaseCPU. This gets rid of some repeated code.
src/cpu/o3/cpu.hh:
Include function to update thread context's memory ports.
src/cpu/o3/lsq.hh:
Add function to dcache port that will update the memory ports upon getting a new peer.
Also include a function that will tell the CPU to update those memory ports.
src/cpu/o3/lsq_impl.hh:
Add function that will update the memory ports upon getting a new peer.
src/cpu/simple/atomic.cc:
src/cpu/simple/timing.cc:
Add function that will update thread context's memory ports upon getting a new peer.
Also use the new BaseCPU's take over from function.
src/cpu/simple/atomic.hh:
Add in function (and dcache port) that will allow the dcache to update memory ports when it gets assigned a new peer.
src/cpu/simple/timing.hh:
Add function that will update thread context's memory ports upon getting a new peer.
src/mem/port.hh:
Make setPeer virtual so that other classes can override it.
--HG--
extra : convert_revision : 2050f1241dd2e83875d281cfc5ad5c6c8705fdaf
RangeSize as a function takes a start address, and a SIZE, and will make the range (start, start+size-1) for you.
src/cpu/memtest/memtest.hh:
src/cpu/o3/fetch.hh:
src/cpu/o3/lsq.hh:
src/cpu/ozone/front_end.hh:
src/cpu/ozone/lw_lsq.hh:
src/cpu/simple/atomic.hh:
src/cpu/simple/timing.hh:
Fix RangeSize arguments
src/dev/alpha/tsunami_cchip.cc:
src/dev/alpha/tsunami_io.cc:
src/dev/alpha/tsunami_pchip.cc:
src/dev/baddev.cc:
pioSize indicates SIZE, not a mask
--HG--
extra : convert_revision : d385521fcfe58f8dffc8622260937e668a47a948
src/cpu/simple/atomic.hh:
Port now takes in the MemObject that owns it.
src/cpu/simple/timing.hh:
Port now takes in MemObject that owns it.
src/dev/io_device.cc:
src/mem/bus.hh:
Ports now take in the MemObject that owns it.
src/mem/cache/base_cache.cc:
Ports now take in the MemObject that own it.
src/mem/port.hh:
src/mem/tport.hh:
Ports now optionally take in the MemObject that owns it.
--HG--
extra : convert_revision : 890a72a871795987c2236c65937e06973412d349
into zamp.eecs.umich.edu:/z/ktlim2/clean/o3-merge/newmem
src/cpu/memtest/memtest.cc:
src/cpu/memtest/memtest.hh:
src/cpu/simple/timing.hh:
tests/configs/o3-timing-mp.py:
Hand merge.
--HG--
extra : convert_revision : a58cc439eb5e8f900d175ed8b5a85b6c8723e558
and PhysicalMemory. *No* support for caches or O3CPU.
Note that properly setting cpu_id on all CPUs is now required
for correct operation.
src/arch/SConscript:
src/base/traceflags.py:
src/cpu/base.hh:
src/cpu/simple/atomic.cc:
src/cpu/simple/timing.cc:
src/cpu/simple/timing.hh:
src/mem/physical.cc:
src/mem/physical.hh:
src/mem/request.hh:
src/python/m5/objects/BaseCPU.py:
tests/configs/simple-atomic.py:
tests/configs/simple-timing.py:
tests/configs/tsunami-simple-atomic-dual.py:
tests/configs/tsunami-simple-atomic.py:
tests/configs/tsunami-simple-timing-dual.py:
tests/configs/tsunami-simple-timing.py:
Implement Alpha LL/SC support for SimpleCPU (Atomic & Timing)
and PhysicalMemory. *No* support for caches or O3CPU.
--HG--
extra : convert_revision : 6ce982d44924cc477e049b9adf359818908e72be
src/cpu/simple/timing.cc:
Record numCycles stat properly.
src/cpu/simple/timing.hh:
Extra variable to help record numCycles stat.
--HG--
extra : convert_revision : 343311902831820264878aad41dc619999726b6b
Add a max time option in seconds and a single system root clock be 1THz
configs/test/fs.py:
Add a max time option in seconds and a single system root clock be 1THz
src/cpu/simple/timing.cc:
src/cpu/simple/timing.hh:
Enforce the timing cpu ticking at it's clock rate
--HG--
extra : convert_revision : a1b0de27abde867f9c3da5bec11639e3d82a95f5
States are now running, draining, or drained. memory state information moved into system object
system parameter is not fs only for cpus
Implement drain() support in devices
Update for drain() call that returns number of times drain_event->process() will be called
Break O3 CPU! No sense in putting in a hack change that kevin is going to remove in a few minutes i imagine
src/cpu/simple/atomic.cc:
src/cpu/simple/atomic.hh:
Since se mode has a system, allow access to it
Verify that the atomic cpu is connected to an atomic system on resume
src/cpu/simple/base.cc:
Since se mode has a system, allow access to it
src/cpu/simple/timing.cc:
src/cpu/simple/timing.hh:
Update for new drain() call that returns number of times drain_event->process() will be called and memory state being moved into the system
Since se mode has a system, allow access to it
Verify that the timing cpu is connected to an timing system on resume
src/dev/ide_disk.cc:
src/dev/io_device.cc:
src/dev/io_device.hh:
src/dev/ns_gige.cc:
src/dev/ns_gige.hh:
src/dev/pcidev.cc:
src/dev/pcidev.hh:
src/dev/sinic.cc:
src/dev/sinic.hh:
Implement drain() support in devices
src/python/m5/config.py:
Allow drain to return number of times drain_event->process() will be called. Normally 0 or 1 but things like O3 cpu or devices with multiple ports may want to call it many times
src/python/m5/objects/BaseCPU.py:
move system parameter out of fs to everyone
src/sim/sim_object.cc:
src/sim/sim_object.hh:
States are now running, draining, or drained. memory state information moved into system object
src/sim/system.cc:
src/sim/system.hh:
memory mode information now contained in system object
--HG--
extra : convert_revision : 1389c77e66ee6d9710bf77b4306fb47e107b21cf
src/cpu/o3/cpu.cc:
Fix up keeping proper state when switched out and drained.
src/cpu/simple/timing.cc:
src/cpu/simple/timing.hh:
Keep track of the event we use to schedule fetch initially and upon resume. We may have to cancel the event if the CPU is switched out.
--HG--
extra : convert_revision : 60a2a1bd2cdc67bd53ca4a67aa77166c826a4c8c
configs/test/test.py:
Update to use new cpu getPort functionality
src/cpu/base.cc:
Make cpu's a memObject to expose getPort interface
src/cpu/base.hh:
Make cpu's a memObject to export getPort interface
src/cpu/simple/atomic.cc:
src/cpu/simple/atomic.hh:
src/cpu/simple/timing.cc:
src/cpu/simple/timing.hh:
Now use the connector via getPort interface
src/mem/cache/base_cache.cc:
Make sure the cache recognizes all port names
--HG--
extra : convert_revision : dbfefa978ec755bc8aa6f962ae158acf32dafe61
src/cpu/base.cc:
src/cpu/base.hh:
src/cpu/simple/atomic.hh:
Switching out no longer takes a sampler.
src/cpu/simple/atomic.cc:
Fix up switching out. Also fix up serialization; the nameOut() was messing up the ordering.
src/cpu/simple/timing.cc:
Add in quiesce, fix up serialization.
src/cpu/simple/timing.hh:
Add in queisce, fix up serialization.
--HG--
extra : convert_revision : 9d59d53bdf269d4d82fb119e5ae7c8a5d475880b
it ends up being O(N^2). But it's probably going to have to change for the real bus anyway, so it should be rewritten then
Change recvRetry() to not accept a packet. Sendtiming should be called again (and can respond with false or true)
Removed Port Blocked/Unblocked and replaced with sendRetry().
Remove possibility of packet mangling if packet is going to be refused anyway in bridge
src/cpu/simple/atomic.cc:
src/cpu/simple/atomic.hh:
src/cpu/simple/timing.cc:
src/cpu/simple/timing.hh:
Change recvRetry() to not accept a packet. Sendtiming should be called again (and can respond with false or true)
src/dev/io_device.cc:
src/dev/io_device.hh:
Make DMA Timing requests/responses work.
Change recvRetry() to not accept a packet. Sendtiming should be called again (and can respond with false or true)
src/mem/bridge.cc:
src/mem/bridge.hh:
Change recvRetry() to not accept a packet. Sendtiming should be called again (and can respond with false or true)
Removed Port Blocked/Unblocked and replaced with sendRetry().
Remove posibility of packet mangling if packet is going to be refused anyway.
src/mem/bus.cc:
src/mem/bus.hh:
Add a very poor implementation of dealing with retries on timing requests. It is especially slow with tracing on since
it ends up being O(N^2). But it's probably going to have to change for the real bus anyway, so it should be rewritten then
src/mem/port.hh:
Change recvRetry() to not accept a packet. Sendtiming should be called again (and can respond with false or true)
Removed Blocked/Unblocked port status, their functionality is really duplicated in the recvRetry() method
--HG--
extra : convert_revision : fab613404be54bfa7a4c67572bae7b559169e573
src/cpu/simple/atomic.cc:
src/cpu/simple/base.cc:
Move traceData->finalize() into postExecute().
src/cpu/simple/timing.cc:
Fixes for full system. Now boots Alpha Linux!
- Handle ifetch faults, suspend/resume.
- Delete memory request & packet objects on response.
- Don't try to do split memory accesses on prefetch references
(ISA description doesn't support this).
src/cpu/simple/timing.hh:
Minor reorganization of internal methods.
--HG--
extra : convert_revision : 59e3ee5e4cb53c424ebdbe2e504d97e88c08a978