util-gem5art: Add gem5art packages

gem5art is a utility to help manage the artifacts used in gem5
experiments, the output from those experiments, and running the
experiments in parallel (artifacts, run, and tasks packages
respectively).

The current documentation can be found on readthedocs [1], but we are
planning on migrating this to the gem5 website very soon [2].

More information on the motivation and design was discussed at the gem5
workshop last summer. See the blog post [3] for more details.

The current version (v1.3.1) is already deployed on PyPI, and you can
install it with `pip install gem5art-artifact gem5art-run gem5art-tasks`

Once this is merged, we will update the PyPI version to match the
version in gem5 (v1.4.0). The differences are mostly documentation
based (pointers to the documentation and source), but we have also
updated the style to strictly match PEP8 with black [4].

gem5art is a *utility* to use with gem5. So, we expect that the
versioning and release schedule will not necessarily match gem5's (hence
a separate versioning structure and separate RELEASE-NOTES, etc.).

[1]: https://gem5art.readthedocs.io/en/latest/
[2]: https://www.gem5.org/documentation/gem5art
[3]: http://www.gem5.org/2020/05/26/gem5art.html
[4]: https://github.com/psf/black

Change-Id: Ic8af63edf0cb7df4693a46413f7278a3e8ac6846
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42121
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Ayaz Akram <yazakram@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Jason Lowe-Power
2021-03-03 09:33:45 -08:00
committed by Jason Lowe-Power
parent 215096d195
commit 7cdf8c00f8
28 changed files with 3237 additions and 0 deletions

util/gem5art/.gitignore

@@ -0,0 +1,8 @@
*.swp
*~
.venv
__pycache__
dist/
*.egg-info/
.vscode/
.mypy_cache/

util/gem5art/LICENSE

@@ -0,0 +1,25 @@
Copyright (c) 2019-2021 The Regents of the University of California.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met: redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer;
redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution;
neither the name of the copyright holders nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

util/gem5art/README.md

@@ -0,0 +1,104 @@
<img alt="gem5art logo" src="/gem5art.svg" width=150>
# gem5art: Artifact, reproducibility, and testing utilities for gem5
![CI Badge](https://github.com/darchr/gem5art/workflows/CI/badge.svg)
[![Documentation Status](https://readthedocs.org/projects/gem5art/badge/?version=latest)](https://gem5art.readthedocs.io/en/latest/?badge=latest)
See <http://www.gem5.org/documentation/gem5art> for detailed documentation.
## Installing gem5art
To install gem5art, simply use pip.
We suggest creating a virtual environment first.
Note that gem5art requires Python 3, so be sure to use a Python 3 interpreter when creating the virtual environment.
```sh
virtualenv -p python3 .venv
source .venv/bin/activate
pip install gem5art-artifact gem5art-run gem5art-tasks
```
It's not required to install all of the gem5art utilities (e.g., you can skip gem5art-tasks if you don't want to use the celery job server).
## Running the tests
The following describes how to run the tests locally before uploading your changes.
### mypy: Python static analysis
[mypy](http://mypy-lang.org/) is a static type checker for Python.
By annotating the code with types and using a static type checker, we can have many of the benefits of a compiled language, with the benefits of Python!
Before contributing any code, please add type annotations and run the type checker.
The type checker must be run for each package separately.
```sh
cd artifact
mypy -p gem5art.artifact
```
```sh
cd run
mypy -p gem5art.run
```
```sh
cd tasks
mypy -p gem5art.tasks
```
You should see something like the following output:
```
Success: no issues found in 3 source files
```
If you see `0 source files`, then it's most likely that mypy has been run in the wrong directory.
If there are problems with imports, you may need to add `# type: ignore` after the `import` statement if there are third party packages without type annotations.
### Running the unit tests
We currently only have a small number of unit tests, although we are working on adding more!
To run the unit tests, use the Python `unittest` module.
```sh
python -m unittest
```
You must run this in each package's subdirectory.
The output should be something like the following:
```
...
----------------------------------------------------------------------
Ran 3 tests in 0.141s
OK
```
If you instead see `Ran 0 tests`, then most likely you are in the wrong directory.
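For reference, here is a minimal sketch of a test that `python -m unittest` would pick up. By default, discovery only looks at files matching `test*.py`, so the file would need a name such as `test_example.py`. The helper `md5_of` and the test class are hypothetical and only illustrate the shape of a test; they are not part of gem5art.

```python
import hashlib
import tempfile
import unittest
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hypothetical helper: return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


class TestMd5Of(unittest.TestCase):
    def test_matches_hashlib(self) -> None:
        # Write a small file to a temporary directory and hash it.
        with tempfile.TemporaryDirectory() as tmp:
            data_file = Path(tmp) / "data.bin"
            data_file.write_bytes(b"hello")
            expected = hashlib.md5(b"hello").hexdigest()
            self.assertEqual(md5_of(data_file), expected)
```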
## Directory structure
The directory structure is a little strange so we can distribute each Python package separately.
However, they are all part of the gem5art namespace.
See the [Python namespace documentation](https://packaging.python.org/guides/packaging-namespace-packages/) for more details.
## Building for distribution
1. Run the setup.py. This must be done in each subdirectory to get the packages to build correctly.
```sh
python setup.py sdist
```
2. Upload to PyPI
```sh
twine upload dist/*
```
These two steps must be completed for each package (i.e., artifact, run, and tasks).


@@ -0,0 +1,25 @@
# Release notes for the gem5art package
## v1.4.0
- Update version now that it's in gem5
## v1.3.1
- Minor fixes
- Update documentation
- Prepare for merging with main gem5 repository
## v1.3.0
### Database now configurable
- Instead of only working with MongoDB installed at localhost, you can now specify the database connection parameter.
- You can specify it by explicitly calling `artifact.getDBConnection()` or using the `GEM5ART_DB` environment variable.
- The default is still `mongodb://localhost:27017`.
- All functions that query the database now *require* a `db` parameter (e.g., `getRuns()`).
- Reorganized some of the db functions in artifact, but this shouldn't affect end users.
### Other changes
- General documentation updates


@@ -0,0 +1,269 @@
# gem5art artifact package
This package contains the `Artifact` type and an artifact database for use with [gem5art](http://www.gem5.org/documentation/gem5art/).
Please cite the [gem5art paper](https://arch.cs.ucdavis.edu/papers/2021-3-28-gem5art) when using the gem5art packages.
This documentation can be found on the [gem5 website](https://www.gem5.org/documentation/gem5art/).
## gem5art artifacts
All unique objects used during gem5 experiments are termed "artifacts" in gem5art.
Examples of artifacts include: gem5 binary, gem5 source code repo, Linux kernel source repo, Linux binary, disk image, and packer binary (used to build the disk image).
The goal of this infrastructure is to keep a record of all the artifacts used in a particular experiment and to return the set of used artifacts when the same experiment needs to be performed in the future.
The description of an artifact serves as the documentation of how that artifact was created.
One of the goals of gem5art is for these artifacts to be self-contained.
With just the metadata stored with the artifact, a third party should be able to perfectly reproduce the artifact.
(We are still working toward this goal.
For instance, we are looking into using docker to create artifacts to separate artifact creation from the host platform it is run on.)
Each artifact is characterized by a set of attributes, described below:
- command: command used to build this artifact
- typ: type of the artifact e.g. binary, git repo etc.
- name: name of the artifact
- cwd: current working directory, where the command to build the artifact is run
- path: actual path of the location of the artifact
- inputs: a list of the artifacts used to build the current artifact
- documentation: a docstring explaining the purpose of the artifact and any other useful information that can help to reproduce the artifact
Additionally, each artifact also has the following implicit information.
- hash: an MD5 hash for a binary artifact or a git hash for a git artifact
- time: time of the creation of an artifact
- id: a UUID associated with the artifact
- git: a dictionary containing the origin, current commit and the repo name for a git artifact (will be an empty dictionary for other types of artifacts)
These attributes are not specified by the user, but are generated by gem5art automatically (when the `Artifact` object is created for the first time).
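To make the implicit information concrete, the following is a rough sketch of how such fields could be derived for a file-based artifact using only the standard library. The function name `implicit_fields` and the returned dictionary layout are assumptions made for illustration; this is not gem5art's actual implementation.

```python
import hashlib
import time
from pathlib import Path
from typing import Any, Dict
from uuid import uuid4


def implicit_fields(path: Path) -> Dict[str, Any]:
    """Illustrative only: compute hash, time, and id for a file artifact."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that large files (e.g., disk images) are not
        # loaded into memory all at once.
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    return {
        "hash": md5.hexdigest(),  # MD5 of the file contents
        "time": time.time(),      # creation time of this record
        "id": uuid4(),            # fresh UUID for the artifact
        "git": {},                # empty for non-git artifacts
    }
```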
An example of how a user would create a gem5 binary artifact using gem5art is shown below.
In this example, the type, name, and documentation are up to the user of gem5art.
You're encouraged to use names that are easy to remember when you later query the database.
The documentation attribute should be used to completely describe the artifact that you are saving.
```python
gem5_binary = Artifact.registerArtifact(
    command = 'scons build/X86/gem5.opt',
    typ = 'gem5 binary',
    name = 'gem5',
    cwd = 'gem5/',
    path = 'gem5/build/X86/gem5.opt',
    inputs = [gem5_repo,],
    documentation = '''
        Default gem5 binary compiled for the X86 ISA.
        This was built from the main gem5 repo (gem5.googlesource.com) without
        any modifications. We recently updated to the current gem5 master
        which has a fix for memory channel address striping.
    '''
)
```
Another goal of gem5art is to enable sharing of artifacts among multiple users, which is achieved through the use of the centralized database.
Basically, whenever a user tries to create a new artifact, the database is searched to find if the same artifact exists there.
If it does, the user can download the matching artifact for use.
Otherwise, the newly created artifact is uploaded to the database for later use.
Using the database also avoids running identical experiments: an error message is generated if a user tries to execute a run that already exists in the database.
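The reuse-or-upload flow described above can be sketched with a small in-memory stand-in for the database. The class `InMemoryDB` and its `register` method are hypothetical and keyed only by a content hash; the real matching logic in gem5art may differ.

```python
from typing import Dict
from uuid import UUID, uuid4


class InMemoryDB:
    """Hypothetical stand-in for the artifact database (not gem5art's API)."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, UUID] = {}

    def register(self, content_hash: str) -> UUID:
        """Return the id already stored for this hash, or store a new one."""
        if content_hash in self._by_hash:
            # The same artifact already exists: reuse it.
            return self._by_hash[content_hash]
        # Otherwise record the new artifact for later use.
        new_id = uuid4()
        self._by_hash[content_hash] = new_id
        return new_id


db = InMemoryDB()
first = db.register("abc123")
again = db.register("abc123")   # same hash, so the same id comes back
other = db.register("def456")   # different hash, so a new id is created
```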
### Creating artifacts
To create an `Artifact`, you must use [`registerArtifact`](artifacts.html#gem5art.artifact.artifact.Artifact.registerArtifact), as shown in the example above.
This is a factory method which will initially create the artifact.
When calling `registerArtifact`, the artifact will automatically be added to the database.
If it already exists, a pointer to that artifact will be returned.
The parameters to the `registerArtifact` function are meant for *documentation*, not as explicit directions to create the artifact from scratch.
In the future, this feature may be added to gem5art.
Note: While creating new artifacts, warning messages might appear showing that certain attributes (except hash and id) of two artifacts don't match (when artifact similarity is checked in the code). Users should make sure that they understand the reasons for any such warnings.
### Using artifacts from the database
You can create an artifact with just a UUID if it is already stored in the database.
The behavior will be the same as when creating an artifact that already exists.
All of the properties of the artifact will be populated from the database.
## ArtifactDB
The particular database used in this work is [MongoDB](https://www.mongodb.com/).
We use MongoDB since it can easily store large files (e.g., disk images), is tightly integrated with Python through [pymongo](https://api.mongodb.com/python/current/), and has an interface that is flexible as the needs of gem5art change.
Currently, it's required to run a database to use gem5art.
However, we are planning on changing this default to allow gem5art to be used standalone as well.
gem5art allows you to connect to any database, but by default assumes there is a MongoDB instance running on the localhost at `mongodb://localhost:27017`.
You can use the environment variable `GEM5ART_DB` to specify the default database to connect to when running simple scripts.
Additionally, you can specify the location of the database when calling `getDBConnection` in your scripts.
In case no database exists or a user wants their own database, you can create a new database by creating a new directory and running the mongodb docker image.
See the [MongoDB docker documentation](https://hub.docker.com/_/mongo) or the [MongoDB documentation](https://docs.mongodb.com/) for more information.
```sh
docker run -p 27017:27017 -v <absolute path to the created directory>:/data/db --name mongo-<some tag> -d mongo
```
This uses the official [MongoDB Docker image](https://hub.docker.com/_/mongo) to run the database at the default port on the localhost.
If the Docker container is killed, it can be restarted with the same command line and the database should be consistent.
### Connecting to an existing database
By default, gem5art will assume the database is running at `mongodb://localhost:27017`, which is MongoDB's default on the localhost.
The environment variable `GEM5ART_DB` can override this default.
Otherwise, to programmatically set a database URI when using gem5art, you can pass a URI to the `getDBConnection` function.
Currently, gem5art only supports MongoDB database backends, but extending this to other databases should be straightforward.
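The precedence described in this section (explicit URI first, then `GEM5ART_DB`, then the default) can be sketched as follows. The function `resolve_db_uri` and the `ValueError` are assumptions made for illustration and only mirror the behavior described here; in practice you would simply call `getDBConnection`.

```python
import os
from urllib.parse import urlparse

DEFAULT_URI = "mongodb://localhost:27017"


def resolve_db_uri(uri: str = "") -> str:
    """Pick the URI: explicit argument, then GEM5ART_DB, then the default."""
    if not uri:
        uri = os.environ.get("GEM5ART_DB", DEFAULT_URI)
    # Only MongoDB backends are supported, so the scheme must be "mongodb".
    if urlparse(uri).scheme != "mongodb":
        raise ValueError(f"Unsupported database URI: {uri}")
    return uri
```

Note that a URI such as `mongo://localhost` is rejected in this sketch because its scheme is `mongo`, not `mongodb`.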
### Searching the Database
gem5art provides a few convenience functions for searching and accessing the database.
These functions can be found in `artifact.common_queries`.
Specifically, we provide the following functions:
- `getByName`: Returns all objects matching `name` in the database.
- `getDiskImages`: Returns a generator of disk images (type = disk image).
- `getLinuxBinaries`: Returns a generator of Linux kernel binaries (type = kernel).
- `getgem5Binaries`: Returns a generator of gem5 binaries (type = gem5 binary).
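Conceptually, the type-specific helpers above filter artifact records by their `type` field and stop after an optional limit. The sketch below shows that idea over a plain list of dictionaries. The function `search_by_type`, the sample records, and the convention that `limit=0` means "no limit" (modeled on MongoDB's behavior) are assumptions for illustration, not the gem5art implementation.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator


def search_by_type(
    records: Iterable[Dict[str, Any]], typ: str, limit: int = 0
) -> Iterator[Dict[str, Any]]:
    """Lazily yield records whose "type" equals typ, up to limit items."""
    matches = (r for r in records if r.get("type") == typ)
    # islice(..., None) yields everything, which models "no limit".
    return islice(matches, limit or None)


# Hypothetical records, shaped like the entries printed in the examples.
records = [
    {"name": "ubuntu", "type": "disk image"},
    {"name": "vmlinux-5.2.3", "type": "kernel"},
    {"name": "npb", "type": "disk image"},
]
disk_images = list(search_by_type(records, "disk image"))
first_only = list(search_by_type(records, "disk image", limit=1))
```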
### Downloading from the Database
You can also download a file associated with an artifact using functions provided by gem5art. A good way to search and download items from the database is by using the Python interactive shell.
You can search the database with the functions provided by the `artifact` module (e.g., [`getByName`](artifacts.html#gem5art.artifact.artifact.getByName), [`getByType`](artifacts.html#gem5art.artifact.artifact.getByType), etc.).
Then, once you've found the ID of the artifact you'd like to download, you can call [`downloadFile`](artifacts.html#gem5art.artifact._artifactdb.ArtifactDB.downloadFile).
See the example below.
```sh
$ python
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gem5art.artifact import *
>>> db = getDBConnection()
>>> for i in getDiskImages(db, limit=2): print(i)
...
ubuntu
id: d4a54de8-3a1f-4d4d-9175-53c15e647afd
type: disk image
path: disk-image/ubuntu-image/ubuntu
inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, m5:69dad8b1-48d0-43dd-a538-f3196a894804
Ubuntu with m5 binary installed and root auto login
ubuntu
id: c54b8805-48d6-425d-ac81-9b1badba206e
type: disk image
path: disk-image/ubuntu-image/ubuntu
inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:5bfaab52-7d04-49f2-8fea-c5af8a7f34a8, m5:69dad8b1-48d0-43dd-a538-f3196a894804
Ubuntu with m5 binary installed and root auto login
>>> for i in getLinuxBinaries(db, limit=2): print(i)
...
vmlinux-5.2.3
id: 8cfd9fbe-24d0-40b5-897e-beca3df80dd2
type: kernel
path: linux-stable/vmlinux-5.2.3
inputs: fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
Kernel binary for 5.2.3 with simple config file
vmlinux-5.2.3
id: 9721d8c9-dc41-49ba-ab5c-3ed169e24166
type: kernel
path: linux-stable/vmlinux-5.2.3
inputs: npb:85e6dd97-c946-4596-9b52-0bb145810d68, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
Kernel binary for 5.2.3 with simple config file
>>> from uuid import UUID
>>> db.downloadFile(UUID('8cfd9fbe-24d0-40b5-897e-beca3df80dd2'), 'linux-stable/vmlinux-5.2.3')
```
For another example, assume there is a disk image named `npb` (containing [NAS Parallel](https://www.nas.nasa.gov/) Benchmarks) in your database and you want to download the disk image to your local directory. You can do the following to download the disk image:
```python
import gem5art.artifact
db = gem5art.artifact.getDBConnection()
disks = gem5art.artifact.getByName(db, 'npb')
for disk in disks:
    if disk.type == 'disk image' and disk.documentation == 'npb disk image created on Nov 20':
        db.downloadFile(disk._id, 'npb')
```
Here, we assume that there can be multiple disk images/artifacts with the name `npb` and we are only interested in downloading the npb disk image with a particular documentation ('npb disk image created on Nov 20'). Also, note that there is not a single way to download files from the database (although they will eventually use the downloadFile function).
The dual of the [downloadFile](artifacts.html#gem5art.artifact._artifactdb.ArtifactDB.downloadFile) method used above is [upload](artifacts.html#gem5art.artifact._artifactdb.ArtifactDB.upload).
#### Database schema
Alternatively, you can use the pymongo Python module or the mongodb command line interface to interact with the database.
See the [MongoDB documentation](https://docs.mongodb.com/) for more information on how to query the MongoDB database.
gem5art has two collections.
`artifact_database.artifacts` stores all of the metadata for the artifacts and `artifact_database.fs` is a [GridFS](https://docs.mongodb.com/manual/core/gridfs/) store for all of the files.
The files in the GridFS use the same UUIDs as the Artifacts as their primary keys.
You can list all of the details of all of the artifacts by running the following in Python.
```python
#!/usr/bin/env python3
from pymongo import MongoClient
db = MongoClient().artifact_database
for i in db.artifacts.find():
    print(i)
```
gem5art also provides a few methods to search the database for artifacts of a particular type or name. For example, to find all disk images in a database you can do the following:
```python
import gem5art.artifact
db = gem5art.artifact.getDBConnection('mongodb://localhost')
for i in gem5art.artifact.getDiskImages(db):
    print(i)
```
Other similar methods include `getLinuxBinaries()` and `getgem5Binaries()`.
You can use the `getByName()` method to search the database for artifacts using the name attribute. For example, to search for artifacts named `gem5`:
```python
import gem5art.artifact
db = gem5art.artifact.getDBConnection('mongodb://localhost')
for i in gem5art.artifact.getByName(db, "gem5"):
    print(i)
```
## Artifacts API Documentation
```eval_rst
Artifact Module
---------------

.. automodule:: gem5art.artifact
    :members:

Artifact
--------

.. automodule:: gem5art.artifact.artifact
    :members:
    :undoc-members:

Artifact
--------

.. automodule:: gem5art.artifact.artifact.Artifact
    :members:
    :undoc-members:

Helper Functions for Common Queries
-----------------------------------

.. automodule:: gem5art.artifact.common_queries
    :members:
    :undoc-members:

ArtifactDB
----------

This is mostly internal.

.. automodule:: gem5art.artifact._artifactdb
    :members:
    :undoc-members:
```


@@ -0,0 +1,45 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""This is the gem5 artifact package"""
from .artifact import Artifact
from .common_queries import (
    getByName,
    getDiskImages,
    getLinuxBinaries,
    getgem5Binaries,
)
from ._artifactdb import getDBConnection

__all__ = [
    "Artifact",
    "getByName",
    "getDiskImages",
    "getLinuxBinaries",
    "getgem5Binaries",
    "getDBConnection",
]


@@ -0,0 +1,256 @@
# Copyright (c) 2019-2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""This file defines the ArtifactDB type and some common implementations of
ArtifactDB.
The database interface defined here does not include any schema information.
The database "schema" is defined in the artifact.py file based on the types of
artifacts stored in the database.
Some common queries can be found in common_queries.py
"""
from abc import ABC, abstractmethod
import gridfs # type: ignore
import os
from pathlib import Path
from pymongo import MongoClient # type: ignore
from typing import Any, Dict, Iterable, Union, Type
from urllib.parse import urlparse
from uuid import UUID
class ArtifactDB(ABC):
    """
    Abstract base class for all artifact DBs.
    """

    @abstractmethod
    def __init__(self, uri: str):
        """Initialize the database with a URI"""
        pass

    @abstractmethod
    def put(self, key: UUID, artifact: Dict[str, Union[str, UUID]]) -> None:
        """Insert the artifact into the database with the key"""
        pass

    @abstractmethod
    def upload(self, key: UUID, path: Path) -> None:
        """Upload the file at path to the database with _id of key"""
        pass

    @abstractmethod
    def __contains__(self, key: Union[UUID, str]) -> bool:
        """Key can be a UUID or a string. Returns true if item in DB"""
        pass

    @abstractmethod
    def get(self, key: Union[UUID, str]) -> Dict[str, str]:
        """Key can be a UUID or a string. Returns a dictionary to construct
        an artifact.
        """
        pass

    @abstractmethod
    def downloadFile(self, key: UUID, path: Path) -> None:
        """Download the file with the _id key to the path. Will overwrite the
        file if it currently exists."""
        pass

    def searchByName(self, name: str, limit: int) -> Iterable[Dict[str, Any]]:
        """Returns an iterable of all artifacts in the database that match
        some name. Note: Not all DB implementations will implement this
        function"""
        raise NotImplementedError()

    def searchByType(self, typ: str, limit: int) -> Iterable[Dict[str, Any]]:
        """Returns an iterable of all artifacts in the database that match
        some type. Note: Not all DB implementations will implement this
        function"""
        raise NotImplementedError()

    def searchByNameType(
        self, name: str, typ: str, limit: int
    ) -> Iterable[Dict[str, Any]]:
        """Returns an iterable of all artifacts in the database that match
        some name and type. Note: Not all DB implementations will implement
        this function"""
        raise NotImplementedError()

    def searchByLikeNameType(
        self, name: str, typ: str, limit: int
    ) -> Iterable[Dict[str, Any]]:
        """Returns an iterable of all artifacts in the database that match
        some type and a regex name. Note: Not all DB implementations will
        implement this function"""
        raise NotImplementedError()


class ArtifactMongoDB(ArtifactDB):
    """
    This is a mongodb database connector for storing Artifacts (as defined in
    artifact.py).

    This database stores the data in three collections:
    - artifacts: This stores the json serialized Artifact class
    - files and chunks: These two collections store the large files required
      for some artifacts. Within the files collection, the _id is the
      UUID of the artifact.
    """

    def __init__(self, uri: str) -> None:
        """Initialize the mongodb connection and grab pointers to the databases
        uri is the location of the database in a mongodb compatible form.
        http://dochub.mongodb.org/core/connections.
        """
        # Note: Need "connect=False" so that we don't connect until the first
        # time we interact with the database. Required for the gem5 running
        # celery server
        self.db = MongoClient(host=uri, connect=False).artifact_database
        self.artifacts = self.db.artifacts
        self.fs = gridfs.GridFSBucket(self.db, disable_md5=True)

    def put(self, key: UUID, artifact: Dict[str, Union[str, UUID]]) -> None:
        """Insert the artifact into the database with the key"""
        assert artifact["_id"] == key
        self.artifacts.insert_one(artifact)

    def upload(self, key: UUID, path: Path) -> None:
        """Upload the file at path to the database with _id of key"""
        with open(path, "rb") as f:
            self.fs.upload_from_stream_with_id(key, str(path), f)

    def __contains__(self, key: Union[UUID, str]) -> bool:
        """Key can be a UUID or a string. Returns true if item in DB"""
        if isinstance(key, UUID):
            count = self.artifacts.count_documents({"_id": key}, limit=1)
        else:
            # This is a hash. Count the number of matches
            count = self.artifacts.count_documents({"hash": key}, limit=1)
        return bool(count > 0)

    def get(self, key: Union[UUID, str]) -> Dict[str, str]:
        """Key can be a UUID or a string. Returns a dictionary to construct
        an artifact.
        """
        if isinstance(key, UUID):
            return self.artifacts.find_one({"_id": key}, limit=1)
        else:
            # This is a hash.
            return self.artifacts.find_one({"hash": key}, limit=1)

    def downloadFile(self, key: UUID, path: Path) -> None:
        """Download the file with the _id key to the path. Will overwrite the
        file if it currently exists."""
        with open(path, "wb") as f:
            self.fs.download_to_stream(key, f)

    def searchByName(self, name: str, limit: int) -> Iterable[Dict[str, Any]]:
        """Returns an iterable of all artifacts in the database that match
        some name."""
        for d in self.artifacts.find({"name": name}, limit=limit):
            yield d

    def searchByType(self, typ: str, limit: int) -> Iterable[Dict[str, Any]]:
        """Returns an iterable of all artifacts in the database that match
        some type."""
        for d in self.artifacts.find({"type": typ}, limit=limit):
            yield d

    def searchByNameType(
        self, name: str, typ: str, limit: int
    ) -> Iterable[Dict[str, Any]]:
        """Returns an iterable of all artifacts in the database that match
        some name and type."""
        for d in self.artifacts.find({"type": typ, "name": name}, limit=limit):
            yield d

    def searchByLikeNameType(
        self, name: str, typ: str, limit: int
    ) -> Iterable[Dict[str, Any]]:
        """Returns an iterable of all artifacts in the database that match
        some type and a regex name."""
        data = self.artifacts.find(
            {"type": typ, "name": {"$regex": "{}".format(name)}}, limit=limit
        )
        for d in data:
            yield d


_db = None

_default_uri = "mongodb://localhost:27017"

_db_schemes: Dict[str, Type[ArtifactDB]] = {"mongodb": ArtifactMongoDB}


def _getDBType(uri: str) -> Type[ArtifactDB]:
    """Internal function to take a URI and return a class that can be
    constructed with that URI. For instance "mongodb://localhost" will return
    an ArtifactMongoDB. More types will be added in the future.

    Supported types:
        **ArtifactMongoDB**: mongodb://...
            See http://dochub.mongodb.org/core/connections for details.
    """
    result = urlparse(uri)
    if result.scheme in _db_schemes:
        return _db_schemes[result.scheme]
    else:
        raise Exception(f"Cannot find DB type for {uri}")


def getDBConnection(uri: str = "") -> ArtifactDB:
    """Returns the database connection

    uri: a string representing the URI of the database. See _getDBType for
         details. If no URI is given we use the default
         (mongodb://localhost:27017) or the value in the GEM5ART_DB environment
         variable.

    If the connection has not been established, this will create a new
    connection. If the connection has been established, this will replace the
    connection if the uri input is non-empty.
    """
    global _db

    # mypy bug: https://github.com/python/mypy/issues/5423
    if _db is not None and not uri:  # type: ignore[unreachable]
        # If we have already established a connection, use that
        return _db  # type: ignore[unreachable]

    if not uri:
        uri = os.environ.get("GEM5ART_DB", _default_uri)

    typ = _getDBType(uri)
    _db = typ(uri)

    return _db


@@ -0,0 +1,312 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""File contains the Artifact class and helper functions
"""
import hashlib
from inspect import cleandoc
import os
from pathlib import Path
import subprocess
import time
from typing import Any, Dict, Iterator, List, Union
from uuid import UUID, uuid4
from ._artifactdb import getDBConnection
def getHash(path: Path) -> str:
"""
Returns the MD5 hash of the file at `path`.
"""
BUF_SIZE = 65536
md5 = hashlib.md5()
with open(path, "rb") as f:
while True:
data = f.read(BUF_SIZE)
if not data:
break
md5.update(data)
return md5.hexdigest()
def getGit(path: Path) -> Dict[str, str]:
"""
Returns dictionary with origin, current commit, and repo name for the
base repository for `path`.
An exception is generated if the repo is dirty or doesn't exist
"""
path = path.resolve() # Make absolute
if path.is_file():
path = path.parent
command = [
"git",
"status",
"--porcelain",
"--ignore-submodules",
"--untracked-files=no",
]
res = subprocess.run(command, stdout=subprocess.PIPE, cwd=path)
if res.returncode != 0:
raise Exception("git repo doesn't exist for {}".format(path))
if res.stdout:
raise Exception("git repo dirty for {}".format(path))
command = ["git", "remote", "get-url", "origin"]
origin = subprocess.check_output(command, cwd=path)
command = ["git", "log", "-n1", "--pretty=format:%H"]
hsh = subprocess.check_output(command, cwd=path)
command = ["git", "rev-parse", "--show-toplevel"]
name = subprocess.check_output(command, cwd=path)
return {
"origin": str(origin.strip(), "utf-8"),
"hash": str(hsh.strip(), "utf-8"),
"name": str(name.strip(), "utf-8"),
}
class Artifact:
"""
A base artifact class.
It holds the following attributes of an artifact:
1) name: name of the artifact
2) command: bash command used to generate the artifact
3) path: path of the location of the artifact
4) time: time of creation of the artifact
5) documentation: a string to describe the artifact
6) ID: unique identifier of the artifact
7) inputs: list of the input artifacts used to create this artifact stored
as a list of uuids
"""
_id: UUID
name: str
type: str
documentation: str
command: str
path: Path
hash: str
time: float
git: Dict[str, str]
cwd: Path
inputs: List["Artifact"]
@classmethod
def registerArtifact(
cls,
command: str,
name: str,
cwd: str,
typ: str,
path: Union[str, Path],
documentation: str,
inputs: List["Artifact"] = [],
) -> "Artifact":
"""Constructs a new artifact.
This assumes that the artifact is either not in the database or exactly
the same as when it was added to the database.
"""
_db = getDBConnection()
# Dictionary with all of the kwargs for construction.
data: Dict[str, Any] = {}
data["name"] = name
data["type"] = typ
data["documentation"] = cleandoc(documentation)
if len(data["documentation"]) < 10: # 10 characters is arbitrary
raise Exception(
cleandoc(
"""Must provide longer documentation!
This documentation is how your future data will remember what
this artifact is and how it was created."""
)
)
data["command"] = cleandoc(command)
data["time"] = time.time()
ppath = Path(path)
data["path"] = ppath
if ppath.is_file():
data["hash"] = getHash(ppath)
data["git"] = {}
elif ppath.is_dir():
data["git"] = getGit(ppath)
data["hash"] = data["git"]["hash"]
else:
raise Exception("Path {} doesn't exist".format(ppath))
pcwd = Path(cwd)
data["cwd"] = pcwd
if not pcwd.exists():
raise Exception("cwd {} doesn't exist.".format(pcwd))
if not pcwd.is_dir():
raise Exception("cwd {} is not a directory".format(pcwd))
data["inputs"] = [i._id for i in inputs]
if data["hash"] in _db:
old_artifact = Artifact(_db.get(data["hash"]))
data["_id"] = old_artifact._id
# Now that we have a complete object, construct it
self = cls(data)
self._checkSimilar(old_artifact)
else:
data["_id"] = uuid4()
# Now that we have a complete object, construct it
self = cls(data)
# Upload the file if there is one.
if self.path.is_file():
_db.upload(self._id, self.path)
# Putting the artifact to the database
_db.put(self._id, self._getSerializable())
return self
def __init__(self, other: Union[str, UUID, Dict[str, Any]]) -> None:
"""Constructs the object from the database based on a UUID or
dictionary from the database
"""
_db = getDBConnection()
if isinstance(other, str):
other = UUID(other)
if isinstance(other, UUID):
other = _db.get(other)
if not other:
raise Exception("Cannot construct artifact")
assert isinstance(other["_id"], UUID)
self._id = other["_id"]
self.name = other["name"]
self.type = other["type"]
self.documentation = other["documentation"]
self.command = other["command"]
self.path = Path(other["path"])
self.hash = other["hash"]
assert isinstance(other["git"], dict)
self.git = other["git"]
self.cwd = Path(other["cwd"])
self.inputs = [Artifact(i) for i in other["inputs"]]
def __str__(self) -> str:
inputs = ", ".join([i.name + ":" + str(i._id) for i in self.inputs])
return "\n ".join(
[
self.name,
f"id: {self._id}",
f"type: {self.type}",
f"path: {self.path}",
f"inputs: {inputs}",
self.documentation,
]
)
def __repr__(self) -> str:
return vars(self).__repr__()
def _getSerializable(self) -> Dict[str, Union[str, UUID]]:
data = vars(self).copy()
data["inputs"] = [input._id for input in self.inputs]
data["cwd"] = str(data["cwd"])
data["path"] = str(data["path"])
return data
def __eq__(self, other: object) -> bool:
"""Checks if two artifacts are the same.
Two artifacts are the same if they have the same UUID and the same
hash. We emit a warning if other fields are different. If other fields
are different and the hash is the same, this is suggestive that the
user is doing something wrong.
"""
if not isinstance(other, Artifact):
return NotImplemented
if self.hash == other.hash and self._id == other._id:
self._checkSimilar(other)
return True
else:
return False
def _checkSimilar(self, other: "Artifact"):
"""Prints warnings if other is similar to, but not the same as, self.
These mismatches may or may not be a problem. It's up to the user to
make this decision.
"""
if self.name != other.name:
print(
f"WARNING: name mismatch for {self.name}! "
f"{self.name} != {other.name}"
)
if self.documentation != other.documentation:
print(
f"WARNING: documentation mismatch for {self.name}! "
f"{self.documentation} != {other.documentation}"
)
if self.command != other.command:
print(
f"WARNING: command mismatch for {self.name}! "
f"{self.command} != {other.command}"
)
if self.path != other.path:
print(
f"WARNING: path mismatch for {self.name}! "
f"{self.path} != {other.path}"
)
if self.cwd != other.cwd:
print(
f"WARNING: cwd mismatch for {self.name}! "
f"{self.cwd} != {other.cwd}"
)
if self.git != other.git:
print(
f"WARNING: git mismatch for {self.name}! "
f"{self.git} != {other.git}"
)
mismatch = set(self.inputs).symmetric_difference(other.inputs)
if mismatch:
print(f"WARNING: input mismatch for {self.name}! {mismatch}")
def __hash__(self) -> int:
return self._id.int

@@ -0,0 +1,83 @@
# Copyright (c) 2020-2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""File contains some helper functions with common queries for artifacts
in the ArtifactDB.
"""
from typing import Iterator
from ._artifactdb import ArtifactDB
from .artifact import Artifact
def _getByType(db: ArtifactDB, typ: str, limit: int = 0) -> Iterator[Artifact]:
"""Returns a generator of Artifacts with matching `type` from the db.
Limit specifies the maximum number of results to return.
"""
data = db.searchByType(typ, limit=limit)
for d in data:
yield Artifact(d)
def getDiskImages(db: ArtifactDB, limit: int = 0) -> Iterator[Artifact]:
"""Returns a generator of disk images (type = disk image).
Limit specifies the maximum number of results to return.
"""
return _getByType(db, "disk image", limit)
def getgem5Binaries(db: ArtifactDB, limit: int = 0) -> Iterator[Artifact]:
"""Returns a generator of gem5 binaries (type = gem5 binary).
Limit specifies the maximum number of results to return.
"""
return _getByType(db, "gem5 binary", limit)
def getLinuxBinaries(db: ArtifactDB, limit: int = 0) -> Iterator[Artifact]:
"""Returns a generator of Linux kernel binaries (type = kernel).
Limit specifies the maximum number of results to return.
"""
return _getByType(db, "kernel", limit)
def getByName(db: ArtifactDB, name: str, limit: int = 0) -> Iterator[Artifact]:
"""Returns all objects matching `name` in the database.
Limit specifies the maximum number of results to return.
"""
data = db.searchByName(name, limit=limit)
for d in data:
yield Artifact(d)

@@ -0,0 +1,3 @@
[mypy]
namespace_packages = True
warn_unreachable = True

63
util/gem5art/artifact/setup.py Executable file

@@ -0,0 +1,63 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""A setuptools based setup module."""
from os.path import join
from pathlib import Path
from setuptools import setup, find_namespace_packages
with open(Path(__file__).parent / "README.md", encoding="utf-8") as f:
long_description = f.read()
setup(
name="gem5art-artifact",
version="1.4.0",
description="Artifacts for gem5art",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://www.gem5.org/",
author="Davis Architecture Research Group (DArchR)",
author_email="jlowepower@ucdavis.edu",
license="BSD",
classifiers=[
"Development Status :: 4 - Beta",
"License :: OSI Approved :: BSD License",
"Topic :: System :: Hardware",
"Intended Audience :: Science/Research",
"Programming Language :: Python :: 3",
],
keywords="simulation architecture gem5",
packages=find_namespace_packages(include=["gem5art.*"]),
install_requires=["pymongo"],
python_requires=">=3.6",
project_urls={
"Bug Reports": "https://gem5.atlassian.net/",
"Source": "https://gem5.googlesource.com/",
"Documentation": "https://www.gem5.org/documentation/gem5art",
},
)

@@ -0,0 +1,25 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@@ -0,0 +1,243 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Tests for the Artifact object and associated functions"""
import hashlib
from pathlib import Path
import unittest
from uuid import uuid4, UUID
import sys
import io
from gem5art import artifact
from gem5art.artifact._artifactdb import ArtifactDB, getDBConnection
class MockDB(ArtifactDB):
"""
This is a Mock DB,
used to run unit tests
"""
def __init__(self, uri=""):
self.db = {}
self.hashes = {}
def put(self, key, metadata):
print("putting an entry in the mock database")
self.db[key] = metadata
self.hashes[metadata["hash"]] = key
def __contains__(self, key):
if isinstance(key, UUID):
return key in self.db.keys()
else:
# This is a hash
return key in self.hashes
def get(self, key):
if isinstance(key, UUID):
return self.db[key]
else:
# This is a hash
return self.db[self.hashes[key]]
def upload(self, key, path):
pass
def downloadFile(self, key, path):
pass
# Add the MockDB as a scheme
artifact._artifactdb._db_schemes["mockdb"] = MockDB
# This needs to be a global variable so
# that this getDBConnection is the first
# call to create a DB connection
_db = getDBConnection("mockdb://")
class TestGit(unittest.TestCase):
def test_keys(self):
git = artifact.artifact.getGit(Path("."))
self.assertSetEqual(
set(git.keys()), set(["origin", "hash", "name"]), "git keys wrong"
)
def test_origin(self):
git = artifact.artifact.getGit(Path("."))
self.assertTrue(
git["origin"].endswith("gem5art"), "Origin should end with gem5art"
)
class TestArtifact(unittest.TestCase):
def setUp(self):
self.artifact = artifact.Artifact(
{
"_id": uuid4(),
"name": "test-name",
"type": "test-type",
"documentation": (
"This is a long test documentation that has "
"lots of words"
),
"command": ["ls", "-l"],
"path": "/",
"hash": hashlib.md5().hexdigest(),
"git": artifact.artifact.getGit(Path(".")),
"cwd": "/",
"inputs": [],
}
)
def test_dirs(self):
self.assertTrue(self.artifact.cwd.exists())
self.assertTrue(self.artifact.path.exists())
class TestArtifactSimilarity(unittest.TestCase):
def setUp(self):
self.artifactA = artifact.Artifact(
{
"_id": uuid4(),
"name": "artifact-A",
"type": "type-A",
"documentation": "This is a description of artifact A",
"command": ["ls", "-l"],
"path": "/",
"hash": hashlib.md5().hexdigest(),
"git": artifact.artifact.getGit(Path(".")),
"cwd": "/",
"inputs": [],
}
)
self.artifactB = artifact.Artifact(
{
"_id": uuid4(),
"name": "artifact-B",
"type": "type-B",
"documentation": "This is a description of artifact B",
"command": ["ls", "-l"],
"path": "/",
"hash": hashlib.md5().hexdigest(),
"git": artifact.artifact.getGit(Path(".")),
"cwd": "/",
"inputs": [],
}
)
self.artifactC = artifact.Artifact(
{
"_id": self.artifactA._id,
"name": "artifact-A",
"type": "type-A",
"documentation": "This is a description of artifact A",
"command": ["ls", "-l"],
"path": "/",
"hash": self.artifactA.hash,
"git": artifact.artifact.getGit(Path(".")),
"cwd": "/",
"inputs": [],
}
)
self.artifactD = artifact.Artifact(
{
"_id": uuid4(),
"name": "artifact-A",
"type": "type-A",
"documentation": "This is a description of artifact A",
"command": ["ls", "-l"],
"path": "/",
"hash": hashlib.md5().hexdigest(),
"git": artifact.artifact.getGit(Path(".")),
"cwd": "/",
"inputs": [],
}
)
def test_not_equal(self):
self.assertTrue(self.artifactA != self.artifactB)
def test_equal(self):
self.assertTrue(self.artifactA == self.artifactC)
def test_not_similar(self):
capturedOutput = io.StringIO()
sys.stdout = capturedOutput
self.artifactA._checkSimilar(self.artifactB)
sys.stdout = sys.__stdout__
self.assertTrue("WARNING:" in capturedOutput.getvalue())
def test_similar(self):
capturedOutput = io.StringIO()
sys.stdout = capturedOutput
self.artifactA._checkSimilar(self.artifactD)
sys.stdout = sys.__stdout__
self.assertFalse("WARNING:" in capturedOutput.getvalue())
class TestRegisterArtifact(unittest.TestCase):
def setUp(self):
# Create and register an artifact
self.testArtifactA = artifact.Artifact.registerArtifact(
name="artifact-A",
typ="type-A",
documentation="This is a description of artifact A",
command="ls -l",
path="./",
cwd="./",
)
# Create an artifact without pushing it to the database
self.testArtifactB = artifact.Artifact(
{
"_id": uuid4(),
"name": "artifact-B",
"type": "type-B",
"documentation": "This is a description of artifact B",
"command": ["vim test_artifact.py"],
"path": "./tests/test_artifact.py",
"hash": hashlib.md5().hexdigest(),
"git": artifact.artifact.getGit(Path(".")),
"cwd": "/",
"inputs": [],
}
)
# test to see if an artifact is in the database
def test_in_database(self):
self.assertTrue(self.testArtifactA.hash in _db)
self.assertFalse(self.testArtifactB.hash in _db)
if __name__ == "__main__":
unittest.main()

BIN
util/gem5art/gem5art.png Normal file

Binary file not shown.

Size: 28 KiB

383
util/gem5art/gem5art.svg Normal file

@@ -0,0 +1,383 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
id="svg2"
sodipodi:docname="gem5art.svg"
viewBox="0 0 308.56216 333.39229"
sodipodi:version="0.32"
version="1.0"
inkscape:output_extension="org.inkscape.output.svg.inkscape"
inkscape:version="0.92.3 (2405546, 2018-03-11)"
width="308.56216"
height="333.3923"
inkscape:export-filename="/home/jlp/Code/gem5/gem5art/gem5art.png"
inkscape:export-xdpi="150"
inkscape:export-ydpi="150">
<defs
id="defs4">
<linearGradient
id="linearGradient2656">
<stop
id="stop2658"
style="stop-color:#480c00"
offset="0" />
<stop
id="stop2660"
style="stop-color:#a06400"
offset=".69328" />
<stop
id="stop2662"
style="stop-color:#ecd450"
offset=".72343" />
<stop
id="stop2664"
style="stop-color:#682c00"
offset=".79501" />
<stop
id="stop2666"
style="stop-color:#a47200"
offset=".82530" />
<stop
id="stop2668"
style="stop-color:#e4c844"
offset=".86175" />
<stop
id="stop2670"
style="stop-color:#783c00"
offset=".94375" />
<stop
id="stop2672"
style="stop-color:#e8cc48"
offset=".97130" />
<stop
id="stop2674"
style="stop-color:#480c00"
offset="1" />
</linearGradient>
<linearGradient
id="linearGradient9908">
<stop
id="stop9910"
style="stop-color:#480c00"
offset="0" />
<stop
id="stop9920"
style="stop-color:#743800"
offset=".69328" />
<stop
id="stop9912"
style="stop-color:#ecd450"
offset=".76570" />
<stop
id="stop9914"
style="stop-color:#a06400"
offset=".91558" />
<stop
id="stop9916"
style="stop-color:#e8cc48"
offset=".96003" />
<stop
id="stop9918"
style="stop-color:#480c00"
offset="1" />
</linearGradient>
<linearGradient
id="linearGradient2634"
y2="547.71997"
xlink:href="#linearGradient9908"
gradientUnits="userSpaceOnUse"
x2="-199.25"
gradientTransform="matrix(-0.314435,0,0,-3.5356,-7375.2,-2991.861)"
y1="547.71997"
x1="2231.6001"
inkscape:collect="always" />
<radialGradient
fx="0"
fy="0"
cx="0"
cy="0"
r="1"
gradientUnits="userSpaceOnUse"
gradientTransform="matrix(47.134947,0,0,47.134947,-12979.631,-6842.1551)"
spreadMethod="pad"
id="radialGradient130">
<stop
style="stop-opacity:1;stop-color:#35aad1"
offset="0"
id="stop126" />
<stop
style="stop-opacity:1;stop-color:#008eb0"
offset="1"
id="stop128" />
</radialGradient>
<radialGradient
fx="0"
fy="0"
cx="0"
cy="0"
r="1"
gradientUnits="userSpaceOnUse"
gradientTransform="matrix(47.127623,0,0,47.127623,-13053.605,-6797.9188)"
spreadMethod="pad"
id="radialGradient110">
<stop
style="stop-opacity:1;stop-color:#939598"
offset="0"
id="stop106" />
<stop
style="stop-opacity:1;stop-color:#77787b"
offset="1"
id="stop108" />
</radialGradient>
<linearGradient
inkscape:collect="always"
xlink:href="#linearGradient9908"
id="linearGradient1859"
gradientUnits="userSpaceOnUse"
gradientTransform="matrix(0,0.314435,-3.5356,0,-8168,-5727.15)"
x1="2231.6001"
y1="547.71997"
x2="-199.25"
y2="547.71997" />
<linearGradient
inkscape:collect="always"
xlink:href="#linearGradient9908"
id="linearGradient1876"
gradientUnits="userSpaceOnUse"
gradientTransform="matrix(0,0.314435,-3.5356,0,-7009.9,-5940.311)"
x1="2231.6001"
y1="547.71997"
x2="-199.25"
y2="547.71997" />
<linearGradient
inkscape:collect="always"
xlink:href="#linearGradient9908"
id="linearGradient1857-3"
gradientUnits="userSpaceOnUse"
gradientTransform="matrix(0.314435,0,0,3.5356,-9958.4,-6305.761)"
x1="2231.6001"
y1="547.71997"
x2="-199.25"
y2="547.71997" />
<linearGradient
inkscape:collect="always"
xlink:href="#linearGradient9908"
id="linearGradient1876-7"
gradientUnits="userSpaceOnUse"
gradientTransform="matrix(0,0.314435,-3.5356,0,-7009.9,-5940.311)"
x1="2231.6001"
y1="547.71997"
x2="-199.25"
y2="547.71997" />
<linearGradient
id="linearGradient2634-0"
y2="547.71997"
xlink:href="#linearGradient9908"
gradientUnits="userSpaceOnUse"
x2="-199.25"
gradientTransform="matrix(-0.314435,0,0,-3.5356,-7375.2,-2991.861)"
y1="547.71997"
x1="2231.6001"
inkscape:collect="always" />
<linearGradient
id="linearGradient2636-9"
y2="547.71997"
xlink:href="#linearGradient9908"
gradientUnits="userSpaceOnUse"
x2="-199.25"
gradientTransform="matrix(0,-0.314435,3.5356,0,-10324,-3357.25)"
y1="547.71997"
x1="2231.6001"
inkscape:collect="always" />
<radialGradient
fx="0"
fy="0"
cx="0"
cy="0"
r="1"
gradientUnits="userSpaceOnUse"
gradientTransform="matrix(47.127623,0,0,47.127623,-13053.605,-6797.9188)"
spreadMethod="pad"
id="radialGradient110-6">
<stop
style="stop-opacity:1;stop-color:#939598"
offset="0"
id="stop106-1" />
<stop
style="stop-opacity:1;stop-color:#77787b"
offset="1"
id="stop108-8" />
</radialGradient>
<radialGradient
fx="0"
fy="0"
cx="0"
cy="0"
r="1"
gradientUnits="userSpaceOnUse"
gradientTransform="matrix(47.134947,0,0,47.134947,-12979.631,-6842.1551)"
spreadMethod="pad"
id="radialGradient130-7">
<stop
style="stop-opacity:1;stop-color:#35aad1"
offset="0"
id="stop126-9" />
<stop
style="stop-opacity:1;stop-color:#008eb0"
offset="1"
id="stop128-2" />
</radialGradient>
</defs>
<sodipodi:namedview
id="base"
bordercolor="#666666"
inkscape:pageshadow="2"
inkscape:guide-bbox="true"
pagecolor="#ffffff"
inkscape:window-height="1025"
inkscape:zoom="1.5506342"
inkscape:window-x="0"
showgrid="false"
borderopacity="1.0"
inkscape:current-layer="layer1"
inkscape:cx="101.47976"
inkscape:cy="187.93026"
showguides="true"
inkscape:window-y="27"
inkscape:window-width="1920"
showborder="false"
inkscape:pageopacity="0.0"
inkscape:document-units="px"
inkscape:window-maximized="1"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0" />
<g
id="layer1"
inkscape:label="Ebene 1"
inkscape:groupmode="layer"
transform="translate(12764.604,6948.6392)">
<rect
style="fill:#f7f5e5;fill-opacity:1;fill-rule:nonzero;stroke:none;stroke-width:1.33417308;stroke-linecap:square;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal"
id="rect2098"
width="265.24234"
height="287.7782"
x="-12742.701"
y="-6926.0596" />
<g
id="g2648-3"
transform="matrix(0.11392363,0,0,0.12309744,-11622.953,-6209.6901)"
style="stroke-width:0.50787127">
<path
inkscape:connector-curvature="0"
d="m -10020.9,-6002.961 225,225.1 V -3519.7 l -225,225.039 z"
style="fill:url(#linearGradient1857-3);stroke-width:0.50787127"
sodipodi:nodetypes="ccccc"
id="rect2383-6" />
<path
inkscape:connector-curvature="0"
d="m -7312.7,-6002.961 -225,225.1 h -2258.2 l -225,-225.1 z"
style="fill:url(#linearGradient1876-7);stroke-width:0.50787127"
sodipodi:nodetypes="ccccc"
id="path2626-0" />
<path
inkscape:connector-curvature="0"
d="m -7312.7,-3294.661 -225,-225.039 v -2258.161 l 225,-225.1 z"
style="fill:url(#linearGradient2634-0);stroke-width:0.50787127"
sodipodi:nodetypes="ccccc"
id="path2640-6" />
<path
inkscape:connector-curvature="0"
d="m -10021.2,-3294.6 225.3,-225.1 h 2258.2 l 224.7,225.1 z"
style="fill:url(#linearGradient2636-9);stroke-width:0.50787127"
sodipodi:nodetypes="ccccc"
id="path2642-2" />
</g>
<g
transform="translate(406.2851,1.2898364)"
id="g2045-0">
<path
d="m -13088.844,-6689.1863 c -5.713,4.1067 -13.707,3.5987 -18.843,-1.54 -5.713,-5.716 -5.713,-14.984 0,-20.6867 5.708,-5.7186 14.963,-5.7186 20.684,0 l 4.259,-4.2693 c -8.061,-8.0733 -21.145,-8.0733 -29.22,0 -8.063,8.0707 -8.063,21.1533 0,29.224 6.26,6.2493 15.509,7.6373 23.12,4.2 v 7.32 h -22.375 l -6.082,6.0827 h 34.538 v -27.14 c 0,0 -1.598,2.6186 -4.24,5.2693 -0.574,0.5733 -1.201,1.0773 -1.841,1.54"
style="fill:#77787b;fill-opacity:1;fill-rule:nonzero;stroke:none;stroke-width:1.33333325"
id="path80-2"
inkscape:connector-curvature="0" />
<path
d="m -12944.367,-6712.9559 c 5.707,-4.0907 13.71,-3.584 18.84,1.5427 5.722,5.7186 5.722,14.98 0,20.696 -5.705,5.708 -14.97,5.708 -20.685,0 l -4.265,4.2586 c 8.074,8.0747 21.158,8.0747 29.221,0 8.068,-8.0586 8.068,-21.1386 0,-29.2146 -6.253,-6.2454 -15.507,-7.6374 -23.111,-4.1974 v -7.32 h 22.367 l 6.084,-6.0826 h -34.531 v 17.5786 9.5667 c 0,0 1.584,-2.6387 4.235,-5.2853 0.58,-0.564 1.197,-1.0707 1.845,-1.5427"
style="fill:#008eb0;fill-opacity:1;fill-rule:nonzero;stroke:none;stroke-width:1.33333325"
id="path84-3"
inkscape:connector-curvature="0" />
<path
d="m -13066.931,-6711.3582 c 5.677,-5.664 14.893,-5.664 20.58,0 1.238,1.2427 2.177,2.66 2.877,4.184 h -26.356 c 0.701,-1.524 1.657,-2.9413 2.899,-4.184 m 0,20.584 c -2.844,-2.8466 -4.263,-6.564 -4.263,-10.292 h 35.208 c 0,-2.0613 -0.312,-4.1213 -0.931,-6.108 -0.95,-3.1053 -2.656,-6.0373 -5.117,-8.508 -8.069,-8.0733 -21.151,-8.0733 -29.228,0 -8.064,8.084 -8.064,21.164 0,29.224 8.077,8.0734 21.159,8.0734 29.228,0 l -4.317,-4.316 c -5.687,5.6814 -14.903,5.6814 -20.58,0"
style="fill:#77787b;fill-opacity:1;fill-rule:nonzero;stroke:none;stroke-width:1.33333325"
id="path88-7"
inkscape:connector-curvature="0" />
<path
d="m -12977.472,-6721.7312 c -7.46,0 -13.972,3.9653 -17.614,9.888 -3.628,-5.9227 -10.146,-9.888 -17.606,-9.888 -4.252,0 -8.197,1.296 -11.485,3.4973 -2.48,1.656 -4.559,3.8467 -6.112,6.3907 v 30.7666 h 6.112 v -28.8866 c 2.665,-3.4347 6.812,-5.648 11.485,-5.648 8.04,0 14.547,6.5053 14.547,14.5453 v 19.9893 h 6.123 v -19.9893 c 0,-8.04 6.517,-14.5453 14.55,-14.5453 8.032,0 14.544,6.5053 14.544,14.5453 v 19.9893 h 6.122 v -19.9893 c 0,-11.4173 -9.256,-20.6653 -20.666,-20.6653"
style="fill:#008eb0;fill-opacity:1;fill-rule:nonzero;stroke:none;stroke-width:1.33333325"
id="path92-5"
inkscape:connector-curvature="0" />
<path
inkscape:connector-curvature="0"
id="path112-9"
style="fill:url(#radialGradient110-6);stroke:none;stroke-width:1.33333325"
d="m -13077.879,-6840.685 c -16.926,16.9187 -16.926,44.368 0,61.3014 v 0 c 13.115,13.1053 32.528,16.0106 48.482,8.8 v 0 15.3653 h -46.92 l -12.775,12.756 h 72.451 v -56.8907 l -10.948,10.9467 h -41.37 l 52.322,-52.3253 c -8.458,-8.428 -19.531,-12.6427 -30.602,-12.644 v 0 c -11.092,0 -22.18,4.2293 -30.64,12.6906" />
<path
inkscape:connector-curvature="0"
id="path132-2"
style="fill:url(#radialGradient130-7);stroke:none;stroke-width:1.33333325"
d="m -13016.573,-6897.6103 v 56.82 l 10.885,-10.8987 h 41.371 l -52.303,52.3014 c 16.931,16.924 44.36,16.9093 61.28,0 v 0 c 16.929,-16.9294 16.929,-44.38 0,-61.2987 v 0 c -13.115,-13.1093 -32.532,-16.0253 -48.483,-8.8133 v 0 -15.356 h 47.01 l 12.686,-12.7547 z" />
</g>
</g>
<metadata
id="metadata905">
<rdf:RDF>
<cc:Work>
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<cc:license
rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/" />
<dc:publisher>
<cc:Agent
rdf:about="http://openclipart.org/">
<dc:title>Openclipart</dc:title>
</cc:Agent>
</dc:publisher>
<dc:title></dc:title>
<dc:date>2008-09-22T14:36:23</dc:date>
<dc:description />
<dc:source>https://openclipart.org/detail/19331/gold-frames-set-by-chrisdesign-19331</dc:source>
<dc:creator>
<cc:Agent>
<dc:title>Chrisdesign</dc:title>
</cc:Agent>
</dc:creator>
<dc:subject>
<rdf:Bag>
<rdf:li>gold frame frames rahmen</rdf:li>
<rdf:li>how i did it</rdf:li>
</rdf:Bag>
</dc:subject>
</cc:Work>
<cc:License
rdf:about="http://creativecommons.org/publicdomain/zero/1.0/">
<cc:permits
rdf:resource="http://creativecommons.org/ns#Reproduction" />
<cc:permits
rdf:resource="http://creativecommons.org/ns#Distribution" />
<cc:permits
rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
</cc:License>
</rdf:RDF>
</metadata>
</svg>

util/gem5art/run/README.md Normal file

@@ -0,0 +1,183 @@
# gem5art run package
This package contains Python objects to wrap gem5 runs/experiments.
Please cite the [gem5art paper](https://arch.cs.ucdavis.edu/papers/2021-3-28-gem5art) when using the gem5art packages.
This documentation can be found on the [gem5 website](https://www.gem5.org/documentation/gem5art/).
Each gem5 experiment is wrapped inside a run object.
These run objects contain all of the information required to execute the gem5 experiments and can optionally be executed via the gem5art tasks library (or manually with the `run()` function).
`gem5Run` interacts with the `Artifact` class of gem5art to ensure reproducibility of gem5 experiments, and it stores the current `gem5Run` object and the output results in the database for later analysis.
## SE and FS mode runs
The two class methods of `gem5Run` below, one for SE (system-emulation) mode and one for FS (full-system) mode, show the arguments a user must supply to create a `gem5Run` object:
```python
@classmethod
def createSERun(cls,
                name: str,
                gem5_binary: str,
                run_script: str,
                outdir: str,
                gem5_artifact: Artifact,
                gem5_git_artifact: Artifact,
                run_script_git_artifact: Artifact,
                *params: str,
                timeout: int = 60*15) -> 'gem5Run':
    ...

@classmethod
def createFSRun(cls,
                name: str,
                gem5_binary: str,
                run_script: str,
                outdir: str,
                gem5_artifact: Artifact,
                gem5_git_artifact: Artifact,
                run_script_git_artifact: Artifact,
                linux_binary: str,
                disk_image: str,
                linux_binary_artifact: Artifact,
                disk_image_artifact: Artifact,
                *params: str,
                timeout: int = 60*15) -> 'gem5Run':
    ...
```
The arguments passed to these methods are:
- `name`: name of the run, which can act as a tag to search the database for the required runs (users are expected to choose a distinct name for each set of experiments)
- `gem5_binary`: path to the actual gem5 binary to be used
- `run_script`: path to the Python run script that will be used with the gem5 binary
- `outdir`: path to the directory where gem5 results should be written
- `gem5_artifact`: gem5 binary artifact object
- `gem5_git_artifact`: gem5 source git repo artifact object
- `run_script_git_artifact`: run script artifact object
- `linux_binary` (only full-system): path to the actual Linux binary to be used (used by the run script as well)
- `disk_image` (only full-system): path to the actual disk image to be used (used by the run script as well)
- `linux_binary_artifact` (only full-system): Linux binary artifact object
- `disk_image_artifact` (only full-system): disk image artifact object
- `params`: other parameters to be passed to the run script
- `timeout`: maximum time, in seconds, for which the gem5 job is allowed to execute
The artifact parameters (`gem5_artifact`, `gem5_git_artifact`, and `run_script_git_artifact`) are used to ensure that the run is reproducible.
Apart from the parameters above, the `gem5Run` class also keeps track of other properties of a gem5 run, e.g., the start time, the end time, the current status of the run, and the kill reason (if the run was killed).
While users can write their own run scripts to use with gem5 (with any command-line arguments), when a `gem5Run` object is created for a full-system experiment with the `createFSRun` method, the paths given as `linux_binary` and `disk_image` are currently passed to the run script on the command line, ahead of any other `params`. The run script is therefore assumed to accept them as its first two arguments.
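As a rough illustration of this convention, the sketch below assembles a full-system command line in the same order: gem5 binary, output options, run script, kernel, disk image, then any extra parameters. `build_fs_command` is a hypothetical helper written for this example, not part of the gem5art API, and all paths are placeholders.

```python
from typing import List


def build_fs_command(
    gem5_binary: str,
    run_script: str,
    outdir: str,
    linux_binary: str,
    disk_image: str,
    *params: str,
) -> List[str]:
    """Hypothetical helper: assemble an FS-mode gem5 command line."""
    return [
        gem5_binary,
        "-re",  # redirect gem5's stdout and stderr into the output directory
        f"--outdir={outdir}",
        run_script,
        # The run script receives the kernel and the disk image as its
        # first two positional arguments, followed by any extra params.
        linux_binary,
        disk_image,
        *params,
    ]


# Placeholder paths and parameters, chosen only for illustration.
cmd = build_fs_command(
    "gem5/build/X86/gem5.opt",
    "configs/run_exit.py",
    "results/boot",
    "linux/vmlinux",
    "disks/boot.img",
    "kvm",
    "1",
)
print(" ".join(cmd))
```

A run script used this way would read the kernel path and disk image path from its first two arguments and treat everything after them as experiment-specific options.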
## Running an experiment
The `gem5Run` object has everything needed to run one gem5 execution.
Normally, this will be performed by using the gem5art *tasks* package.
However, it is also possible to manually execute a gem5 run.
The `run` function executes the gem5 experiment.
It takes two optional parameters: a task associated with the run for bookkeeping and an optional directory to execute the run in.
The `run` function executes the gem5 binary by using `Popen`.
This creates another process to execute gem5.
The `run` function is *blocking* and does not return until the child process has completed.
While the child process is running, the parent Python process updates the status in the `info.json` file every 5 seconds.
The `info.json` file is the serialized `gem5Run` object, which contains all of the run information and the current status.
`gem5Run` objects have 7 possible status states.
These are currently simple strings stored in the `status` property.
- `Created`: The run has been created. This is set when either `createSERun` or `createFSRun` is called.
- `Begin run`: When `run()` is called, after the database is checked, the run enters the `Begin run` state.
- `Failed artifact check for ...`: The status is set to this when the artifact check fails.
- `Spawning`: Next, just before `Popen` is called, the run enters the `Spawning` state.
- `Running`: Once the parent process begins spinning, waiting for the child to finish, the run enters the `Running` state.
- `Finished`: When the child finishes with exit code `0`, the run enters the `Finished` state.
- `Failed`: When the child finishes with a non-zero exit code, the run enters the `Failed` state.
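
The spawn-and-poll behaviour and the final `Finished`/`Failed` states can be sketched with the standard library alone. This is a simplified illustration, not the gem5art implementation: `run_and_track` and its `info` dictionary are hypothetical stand-ins for `run()` and the serialized `gem5Run` object, and the poll interval is shortened so the example finishes quickly.

```python
import json
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_and_track(command, outdir, poll_interval=5.0, timeout=60 * 15):
    """Run `command`, rewriting info.json while the child is alive."""
    outdir = Path(outdir)
    info = {"status": "Spawning", "kill_reason": "", "return_code": None}

    def dump():
        # Other tools can poll this file to follow the run's progress.
        (outdir / "info.json").write_text(json.dumps(info))

    dump()
    start = time.time()
    proc = subprocess.Popen(command)

    # Block until the child exits, refreshing the status file as we go.
    while proc.poll() is None:
        info["status"] = "Running"
        if time.time() - start > timeout:
            proc.kill()
            info["kill_reason"] = "timeout"
        dump()
        time.sleep(poll_interval)

    info["return_code"] = proc.returncode
    info["status"] = "Finished" if proc.returncode == 0 else "Failed"
    dump()
    return info


with tempfile.TemporaryDirectory() as tmp:
    # Two trivial child processes stand in for gem5: one succeeds, one fails.
    ok = run_and_track([sys.executable, "-c", "pass"], tmp, poll_interval=0.1)
    bad = run_and_track(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        tmp,
        poll_interval=0.1,
    )
    saved = json.loads((Path(tmp) / "info.json").read_text())

print(ok["status"], bad["status"], saved["return_code"])
```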
## Run Already in the Database
When starting a run with gem5art, it might complain that the run already exists in the database.
Before launching a gem5 job, gem5art checks whether the run matches an existing run in the database.
In order to uniquely identify a run, a single hash is made out of:
- the run script
- the parameters passed to the run script
- the artifacts of the run object, which, for an SE run, include the gem5 binary artifact, the gem5 source git artifact, and the run script (experiments repo) artifact. For an FS run, the list of artifacts also includes the Linux binary artifact and the disk image artifact, in addition to the artifacts of an SE run.
If this hash already exists in the database, gem5art will not launch a new job based on this run object, as a run with the same parameters has already been executed.
If the user still wants to launch this job, the existing run object must first be removed from the database.
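
A minimal sketch of such a hash, assuming each artifact is identified by a UUID: the artifact IDs, the run script path, and the joined parameters are concatenated and digested with MD5. `run_hash` is a hypothetical standalone function written for this example, and the UUIDs and paths are made up.

```python
import hashlib
from typing import Iterable
from uuid import UUID, uuid4


def run_hash(
    artifact_ids: Iterable[UUID], run_script: str, params: Iterable[str]
) -> str:
    """Combine artifacts, run script, and parameters into one digest."""
    to_hash = [artifact_id.bytes for artifact_id in artifact_ids]
    to_hash.append(run_script.encode())
    to_hash.append(" ".join(params).encode())
    return hashlib.md5(b"".join(to_hash)).hexdigest()


# Made-up artifact IDs standing in for the gem5 binary, the gem5 source
# repo, and the run script repo of an SE run.
artifact_ids = [uuid4(), uuid4(), uuid4()]

first = run_hash(artifact_ids, "configs/run_se.py", ["hello", "1"])
again = run_hash(artifact_ids, "configs/run_se.py", ["hello", "1"])
other = run_hash(artifact_ids, "configs/run_se.py", ["hello", "2"])

print(first == again)  # identical inputs give the same hash
print(first == other)  # changing a parameter changes the hash
```

Because the run's `name` and output directory are not among the hashed inputs, renaming a run or pointing it at a different `outdir` does not make it a new run from the database's point of view.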
## Searching the Database to find Runs
### Utility script
gem5art provides the utility `gem5art-getruns` to search the database and retrieve runs.
Based on the parameters, `gem5art-getruns` will dump the results into a file in the json format.
```
usage: gem5art-getruns [-h] [--fs-only] [--limit LIMIT] [--db-uri DB_URI]
                       [-s SEARCH_NAME]
                       filename

Dump all runs from the database into a json file

positional arguments:
  filename              Output file name

optional arguments:
  -h, --help            show this help message and exit
  --fs-only             Only output FS runs
  --limit LIMIT         Limit of the number of runs to return. Default: all
  --db-uri DB_URI       The database to connect to. Default
                        mongodb://localhost:27017
  -s SEARCH_NAME, --search_name SEARCH_NAME
                        Query for the name field
```
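
Once the runs have been dumped, the JSON file can be post-processed with ordinary Python. The sketch below counts runs per status; it assumes the file holds a list of run dictionaries that each carry a `status` field. `count_statuses` is a hypothetical helper, and the sample records are invented and far smaller than real entries.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_statuses(filename):
    """Count the runs in a dumped JSON file, grouped by status."""
    with open(filename) as f:
        runs = json.load(f)
    return Counter(run.get("status", "unknown") for run in runs)


# Invented records standing in for a file written by
# `gem5art-getruns runs.json`.
sample = [
    {"name": "boot_tests_v1", "status": "Finished"},
    {"name": "boot_tests_v1", "status": "Failed"},
    {"name": "boot_tests_v1", "status": "Finished"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "runs.json"
    path.write_text(json.dumps(sample, indent=2))
    counts = count_statuses(path)

for status, number in sorted(counts.items()):
    print(f"{status}: {number}")
```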
### Manually searching the database
Once you start running the experiments with gem5 and want to know the status of those runs, you can look at the gem5Run artifacts in the database.
For this purpose, gem5art provides a method `getRuns`, which you can use as follows:
```python
import gem5art.run
from gem5art.artifact import getDBConnection

db = getDBConnection()
for i in gem5art.run.getRuns(db, fs_only=False, limit=100):
    print(i)
```
The documentation on [getRuns](run.html#gem5art.run.getRuns) is available at the bottom of this page.
## Searching the Database to find Runs with Specific Names
As discussed above, when creating an FS or SE mode run object, the user has to pass a name field to identify a particular set of runs (or experiments).
We expect that the user will take care to use a name string which fully characterizes a set of experiments and can be thought of as a `Nonce`.
For example, if we are running experiments to test Linux kernel boot on gem5, we can use a name field `boot_tests_v1` or `boot_tests_[month_year]` (where `month_year` corresponds to the month and year when the experiments were run).
Later on, the same name can be used to search for relevant gem5 runs in the database.
For this purpose, gem5art provides a method `getRunsByName`, which can be used as follows:
```python
import gem5art.run
from gem5art.artifact import getDBConnection

db = getDBConnection()
for i in gem5art.run.getRunsByName(db, name='boot_tests_v1', fs_only=True, limit=100):
    print(i)
```
The documentation on `getRunsByName` is available [here](run.html#gem5art.run.getRunsByName).
## Runs API Documentation
```eval_rst
Run
---

.. automodule:: gem5art.run
    :members:
    :undoc-members:
```


@@ -0,0 +1,88 @@
#! /usr/bin/env python3
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""This is a simple script to dump gem5 runs into a json file.
This file simply wraps the getRuns function from gem5art.run.
"""
from argparse import ArgumentParser
from json import dump
import gem5art.artifact
from gem5art.artifact import getDBConnection
from gem5art.run import getRunsByNameLike, getRuns


def parseArgs():
    parser = ArgumentParser(
        description="Dump all runs from the database into a json file"
    )

    default_db_uri = gem5art.artifact._artifactdb._default_uri

    parser.add_argument("filename", help="Output file name")
    parser.add_argument(
        "--fs-only",
        action="store_true",
        default=False,
        help="Only output FS runs",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=0,
        help="Limit of the number of runs to return. Default: all",
    )
    parser.add_argument(
        "--db-uri",
        default=default_db_uri,
        help=f"The database to connect to. Default {default_db_uri}",
    )
    parser.add_argument(
        "-s", "--search_name", help="Query for the name field", default=""
    )

    return parser.parse_args()


if __name__ == "__main__":
    args = parseArgs()

    db = getDBConnection(args.db_uri)

    with open(args.filename, "w") as f:
        if args.search_name:
            runs = getRunsByNameLike(
                db, args.search_name, args.fs_only, args.limit
            )
        else:
            runs = getRuns(db, args.fs_only, args.limit)

        to_dump = [run._convertForJson(run._getSerializable()) for run in runs]
        dump(to_dump, f, indent=2)


@@ -0,0 +1,618 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
This file defines a gem5Run object which contains all information needed to
run a single gem5 test.
This class works closely with the artifact module to ensure that the gem5
experiment is reproducible and the output is saved to the database.
"""
import hashlib
import json
import os
from pathlib import Path
import signal
import subprocess
import time
from typing import Any, Dict, Iterable, List, Optional, Tuple, Union
from uuid import UUID, uuid4
import zipfile
from gem5art import artifact
from gem5art.artifact import Artifact
from gem5art.artifact._artifactdb import ArtifactDB


class gem5Run:
    """
    This class holds all of the info required to run gem5.
    """

    _id: UUID
    hash: str
    type: str
    name: str
    gem5_binary: Path
    run_script: Path
    gem5_artifact: Artifact
    gem5_git_artifact: Artifact
    run_script_git_artifact: Artifact
    params: Tuple[str, ...]
    timeout: int

    gem5_name: str
    script_name: str
    linux_name: str
    disk_name: str
    string: str

    outdir: Path

    linux_binary: Path
    disk_image: Path
    linux_binary_artifact: Artifact
    disk_image_artifact: Artifact

    command: List[str]

    running: bool
    enqueue_time: float
    start_time: float
    end_time: float
    return_code: int
    kill_reason: str
    status: str
    pid: int
    task_id: Any

    results: Optional[Artifact]
    artifacts: List[Artifact]

    @classmethod
    def _create(
        cls,
        name: str,
        gem5_binary: Path,
        run_script: Path,
        outdir: Path,
        gem5_artifact: Artifact,
        gem5_git_artifact: Artifact,
        run_script_git_artifact: Artifact,
        params: Tuple[str, ...],
        timeout: int,
    ) -> "gem5Run":
        """
        Shared code between SE and FS when creating a run object.
        """
        run = cls()
        run.name = name
        run.gem5_binary = gem5_binary
        run.run_script = run_script
        run.gem5_artifact = gem5_artifact
        run.gem5_git_artifact = gem5_git_artifact
        run.run_script_git_artifact = run_script_git_artifact
        run.params = params
        run.timeout = timeout

        run._id = uuid4()

        run.outdir = outdir.resolve()  # ensure this is absolute

        # Assumes **/<gem5_name>/gem5.<anything>
        run.gem5_name = run.gem5_binary.parent.name
        # Assumes **/<script_name>.py
        run.script_name = run.run_script.stem

        # Info about the actual run
        run.running = False
        run.enqueue_time = time.time()
        run.start_time = 0.0
        run.end_time = 0.0
        run.return_code = 0
        run.kill_reason = ""
        run.status = "Created"
        run.pid = 0
        run.task_id = None

        # Initially, there are no results
        run.results = None

        return run

    @classmethod
    def createSERun(
        cls,
        name: str,
        gem5_binary: str,
        run_script: str,
        outdir: str,
        gem5_artifact: Artifact,
        gem5_git_artifact: Artifact,
        run_script_git_artifact: Artifact,
        *params: str,
        timeout: int = 60 * 15,
    ) -> "gem5Run":
        """
        name is the name of the run. The name is not necessarily unique. The
        name could be used to query the results of the run.

        gem5_binary and run_script are the paths to the binary to run
        and the script to pass to gem5. Full paths are better.

        The artifact parameters (gem5_artifact, gem5_git_artifact, and
        run_script_git_artifact) are used to ensure this is a reproducible
        run.

        Further parameters can be passed via extra arguments. These
        parameters will be passed in order to the gem5 run script.
        timeout is the time in seconds to run the subprocess before killing it.

        Note: When instantiating this class for the first time, it will create
        a file `info.json` in the outdir which contains a serialized version
        of this class.
        """
        run = cls._create(
            name,
            Path(gem5_binary),
            Path(run_script),
            Path(outdir),
            gem5_artifact,
            gem5_git_artifact,
            run_script_git_artifact,
            params,
            timeout,
        )

        run.artifacts = [
            gem5_artifact,
            gem5_git_artifact,
            run_script_git_artifact,
        ]

        # Trailing space keeps the script name separate from the parameters
        run.string = f"{run.gem5_name} {run.script_name} "
        run.string += " ".join(run.params)

        run.command = [
            str(run.gem5_binary),
            "-re",
            f"--outdir={run.outdir}",
            str(run.run_script),
        ]
        run.command += list(params)

        run.hash = run._getHash()
        run.type = "gem5 run"

        # Make the directory if it doesn't exist
        os.makedirs(run.outdir, exist_ok=True)
        run.dumpJson("info.json")

        return run

    @classmethod
    def createFSRun(
        cls,
        name: str,
        gem5_binary: str,
        run_script: str,
        outdir: str,
        gem5_artifact: Artifact,
        gem5_git_artifact: Artifact,
        run_script_git_artifact: Artifact,
        linux_binary: str,
        disk_image: str,
        linux_binary_artifact: Artifact,
        disk_image_artifact: Artifact,
        *params: str,
        timeout: int = 60 * 15,
    ) -> "gem5Run":
        """
        name is the name of the run. The name is not necessarily unique. The
        name could be used to query the results of the run.

        gem5_binary and run_script are the paths to the binary to run
        and the script to pass to gem5.

        The linux_binary is the kernel to run and the disk_image is the path
        to the disk image to use.

        Further parameters can be passed via extra arguments. These
        parameters will be passed in order to the gem5 run script.

        Note: When instantiating this class for the first time, it will create
        a file `info.json` in the outdir which contains a serialized version
        of this class.
        """
        run = cls._create(
            name,
            Path(gem5_binary),
            Path(run_script),
            Path(outdir),
            gem5_artifact,
            gem5_git_artifact,
            run_script_git_artifact,
            params,
            timeout,
        )
        run.linux_binary = Path(linux_binary)
        run.disk_image = Path(disk_image)
        run.linux_binary_artifact = linux_binary_artifact
        run.disk_image_artifact = disk_image_artifact

        # Assumes **/<linux_name>
        run.linux_name = run.linux_binary.name
        # Assumes **/<disk_name>
        run.disk_name = run.disk_image.name

        run.artifacts = [
            gem5_artifact,
            gem5_git_artifact,
            run_script_git_artifact,
            linux_binary_artifact,
            disk_image_artifact,
        ]

        run.string = f"{run.gem5_name} {run.script_name} "
        run.string += f"{run.linux_name} {run.disk_name} "
        run.string += " ".join(run.params)

        run.command = [
            str(run.gem5_binary),
            "-re",
            f"--outdir={run.outdir}",
            str(run.run_script),
            str(run.linux_binary),
            str(run.disk_image),
        ]
        run.command += list(params)

        run.hash = run._getHash()
        run.type = "gem5 run fs"

        # Make the directory if it doesn't exist
        os.makedirs(run.outdir, exist_ok=True)
        run.dumpJson("info.json")

        return run

    @classmethod
    def loadJson(cls, filename: str) -> "gem5Run":
        with open(filename) as f:
            d = json.load(f)
            # Convert string version of UUID to UUID object
            # (dict.iteritems() does not exist in Python 3; use items())
            for k, v in d.items():
                if k.endswith("_artifact"):
                    d[k] = UUID(v)
            d["_id"] = UUID(d["_id"])
        try:
            return cls.loadFromDict(d)
        except KeyError:
            print("Incompatible json file: {}!".format(filename))
            raise

    @classmethod
    def loadFromDict(cls, d: Dict[str, Union[str, UUID]]) -> "gem5Run":
        """Returns new gem5Run instance from the dictionary of values in d"""
        run = cls()
        run.artifacts = []
        for k, v in d.items():
            if isinstance(v, UUID) and k != "_id":
                a = Artifact(v)
                setattr(run, k, a)
                run.artifacts.append(a)
            else:
                setattr(run, k, v)
        return run

    def checkArtifacts(self, cwd: str) -> bool:
        """Checks to make sure all of the artifacts are up to date

        This should happen just before running gem5. This function will return
        False if the artifacts don't check and true if they are all the same.
        For the git repos, this checks the git hash, for binary artifacts this
        checks the md5 hash.
        """
        for v in self.artifacts:
            if v.type == "git repo":
                new = artifact.artifact.getGit(cwd / v.path)["hash"]
                old = v.git["hash"]
            else:
                new = artifact.artifact.getHash(cwd / v.path)
                old = v.hash

            if new != old:
                self.status = f"Failed artifact check for {cwd / v.path}"
                return False

        return True

    def __repr__(self) -> str:
        return str(self._getSerializable())

    def checkKernelPanic(self) -> bool:
        """
        Returns true if the gem5 instance specified in args has a kernel panic
        Note: this gets around the problem that gem5 doesn't exit on panics.
        """
        term_path = self.outdir / "system.pc.com_1.device"
        if not term_path.exists():
            return False

        with open(term_path, "rb") as f:
            try:
                f.seek(-1000, os.SEEK_END)
            except OSError:
                return False
            try:
                # There was a case where reading `term_path` resulted in a
                # UnicodeDecodeError. It is known that the terminal output
                # (content of 'system.pc.com_1.device') is written from a
                # buffer from gem5, and when gem5 stops, the content of the
                # buffer is stopped being copied to the file. The buffer is
                # not flushed as well. So, it might be a case that the content
                # of the `term_path` is corrupted as a Unicode character could
                # be longer than a byte.
                last = f.readlines()[-1].decode()
                if "Kernel panic" in last:
                    return True
                else:
                    return False
            except UnicodeDecodeError:
                return False

    def _getSerializable(self) -> Dict[str, Union[str, UUID]]:
        """Returns a dictionary that can be used to recreate this object

        Note: All artifacts are converted to a UUID instead of an Artifact.
        """
        # Grab all of the member variables
        d = vars(self).copy()

        # Remove list of artifacts
        del d["artifacts"]

        # Replace the artifacts with their UUIDs
        for k, v in d.items():
            if isinstance(v, Artifact):
                d[k] = v._id
            if isinstance(v, Path):
                d[k] = str(v)

        return d

    def _getHash(self) -> str:
        """Return a single value that uniquely identifies this run

        To uniquely identify this run, the gem5 binary, gem5 scripts, and
        parameters should all match. Thus, let's make a single hash out of the
        artifacts + the runscript + parameters
        """
        to_hash = [art._id.bytes for art in self.artifacts]
        to_hash.append(str(self.run_script).encode())
        to_hash.append(" ".join(self.params).encode())

        return hashlib.md5(b"".join(to_hash)).hexdigest()

    @classmethod
    def _convertForJson(cls, d: Dict[str, Any]) -> Dict[str, str]:
        """Converts UUID objects to strings for json compatibility"""
        for k, v in d.items():
            if isinstance(v, UUID):
                d[k] = str(v)
        return d

    def dumpJson(self, filename: str) -> None:
        """Dump all info into a json file"""
        d = self._convertForJson(self._getSerializable())
        with open(self.outdir / filename, "w") as f:
            json.dump(d, f)

    def dumpsJson(self) -> str:
        """Like dumpJson except returns string"""
        d = self._convertForJson(self._getSerializable())
        return json.dumps(d)

    def run(self, task: Any = None, cwd: str = ".") -> None:
        """Actually run the test.

        Calls Popen with the command to fork a new process.
        Then, this function polls the process every 5 seconds to check if it
        has finished or not. Each time it checks, it dumps the json info so
        other applications can poll those files.

        task is the celery task that is running this gem5 instance.

        cwd is the directory to change to before running. This allows a server
        process to run in a different directory than the running process. Note
        that only the spawned process runs in the new directory.
        """
        # Check if the run is already in the database
        db = artifact.getDBConnection()
        if self.hash in db:
            print(f"Error: Have already run {self.command}. Exiting!")
            return

        self.status = "Begin run"
        self.dumpJson("info.json")

        if not self.checkArtifacts(cwd):
            self.dumpJson("info.json")
            return

        self.status = "Spawning"

        self.start_time = time.time()
        self.task_id = task.request.id if task else None
        self.dumpJson("info.json")

        # Start running the gem5 command
        proc = subprocess.Popen(self.command, cwd=cwd)

        # Register handler in case this process is killed while the gem5
        # instance is running. Note: there's a bit of a race condition here,
        # but hopefully it's not a big deal
        def handler(signum, frame):
            proc.kill()
            self.kill_reason = "sigterm"
            self.dumpJson("info.json")
            # Note: We'll fall out of the while loop after this.

        # This makes it so if you term *this* process, it will actually kill
        # the subprocess and then this process will die.
        signal.signal(signal.SIGTERM, handler)

        # Do this until the subprocess is done (successfully or not)
        while proc.poll() is None:
            self.status = "Running"
            # Still running
            self.current_time = time.time()
            self.pid = proc.pid
            self.running = True

            if self.current_time - self.start_time > self.timeout:
                proc.kill()
                self.kill_reason = "timeout"

            if self.checkKernelPanic():
                proc.kill()
                self.kill_reason = "kernel panic"

            self.dumpJson("info.json")

            # Check again in five seconds
            time.sleep(5)

        print("Done running {}".format(" ".join(self.command)))

        # Done executing
        self.running = False
        self.end_time = time.time()
        self.return_code = proc.returncode

        if self.return_code == 0:
            self.status = "Finished"
        else:
            self.status = "Failed"

        self.dumpJson("info.json")

        self.saveResults()

        # Store current gem5 run in the database
        db.put(self._id, self._getSerializable())

        print("Done storing the results of {}".format(" ".join(self.command)))

    def saveResults(self) -> None:
        """Zip up the output directory and store the results in the
        database."""

        with zipfile.ZipFile(
            self.outdir / "results.zip", "w", zipfile.ZIP_DEFLATED
        ) as zipf:
            for path in self.outdir.glob("**/*"):
                if path.name == "results.zip":
                    continue
                zipf.write(path, path.relative_to(self.outdir.parent))

        self.results = Artifact.registerArtifact(
            command=f"zip results.zip -r {self.outdir}",
            name=self.name,
            typ="directory",
            path=self.outdir / "results.zip",
            cwd="./",
            documentation="Compressed version of the results directory",
        )

    def __str__(self) -> str:
        return self.string + " -> " + self.status


def getRuns(
    db: ArtifactDB, fs_only: bool = False, limit: int = 0
) -> Iterable[gem5Run]:
    """Returns a generator of gem5Run objects.

    If fs_only is True, then only full system runs will be returned.
    Limit specifies the maximum number of runs to return.
    """

    if not fs_only:
        runs = db.searchByType("gem5 run", limit=limit)
        for run in runs:
            yield gem5Run.loadFromDict(run)

    fsruns = db.searchByType("gem5 run fs", limit=limit)
    for run in fsruns:
        yield gem5Run.loadFromDict(run)


def getRunsByName(
    db: ArtifactDB, name: str, fs_only: bool = False, limit: int = 0
) -> Iterable[gem5Run]:
    """Returns a generator of gem5Run objects, which have the field "name"
    **exactly** the same as the name parameter. The name used in this query
    is case sensitive.

    If fs_only is True, then only full system runs will be returned.
    Limit specifies the maximum number of runs to return.
    """

    if not fs_only:
        seruns = db.searchByNameType(name, "gem5 run", limit=limit)
        for run in seruns:
            yield gem5Run.loadFromDict(run)

    fsruns = db.searchByNameType(name, "gem5 run fs", limit=limit)

    for run in fsruns:
        yield gem5Run.loadFromDict(run)


def getRunsByNameLike(
    db: ArtifactDB, name: str, fs_only: bool = False, limit: int = 0
) -> Iterable[gem5Run]:
    """Return a generator of gem5Run objects, which have the field "name"
    containing the name parameter as a substring. The name used in this
    query is case sensitive.

    If fs_only is True, then only full system runs will be returned.
    Limit specifies the maximum number of runs to return.
    """

    if not fs_only:
        seruns = db.searchByLikeNameType(name, "gem5 run", limit=limit)

        for run in seruns:
            yield gem5Run.loadFromDict(run)

    fsruns = db.searchByLikeNameType(name, "gem5 run fs", limit=limit)

    for run in fsruns:
        yield gem5Run.loadFromDict(run)


@@ -0,0 +1,4 @@
[mypy]
namespace_packages = True
warn_unreachable = True
mypy_path = ../artifact

util/gem5art/run/setup.py Executable file

@@ -0,0 +1,66 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""A setuptools based setup module."""
from os.path import join
from pathlib import Path
from setuptools import setup, find_namespace_packages

with open(Path(__file__).parent / "README.md", encoding="utf-8") as f:
    long_description = f.read()

setup(
    name="gem5art-run",
    version="1.4.0",
    description="A collection of utilities for running gem5",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://www.gem5.org/",
    author="Davis Architecture Research Group (DArchR)",
    author_email="jlowepower@ucdavis.edu",
    license="BSD",
    classifiers=[
        "Development Status :: 4 - Beta",
        "License :: OSI Approved :: BSD License",
        "Topic :: System :: Hardware",
        "Intended Audience :: Science/Research",
        "Programming Language :: Python :: 3",
    ],
    keywords="simulation architecture gem5",
    packages=find_namespace_packages(),
    install_requires=["gem5art-artifact"],
    python_requires=">=3.6",
    project_urls={
        "Bug Reports": "https://gem5.atlassian.net/",
        "Source": "https://gem5.googlesource.com/",
        "Documentation": "https://www.gem5.org/documentation/gem5art",
    },
    scripts=[
        "bin/gem5art-getruns",
    ],
)


@@ -0,0 +1,25 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


@@ -0,0 +1,125 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Tests for gem5Run object"""

import hashlib
from pathlib import Path
import os
import unittest
from uuid import uuid4

from gem5art.artifact import artifact
from gem5art.run import gem5Run


class TestSERun(unittest.TestCase):
    def setUp(self):
        self.gem5art = artifact.Artifact(
            {
                "_id": uuid4(),
                "name": "test-gem5",
                "type": "test-binary",
                "documentation": "This is a description of gem5 artifact",
                "command": "scons build/X86/gem5.opt",
                "path": "/",
                "hash": hashlib.md5().hexdigest(),
                "git": artifact.getGit(Path(".")),
                "cwd": "/",
                "inputs": [],
            }
        )

        self.gem5gitart = artifact.Artifact(
            {
                "_id": uuid4(),
                "name": "test-gem5-git",
                "type": "test-git",
                "documentation": "This is a description of gem5 git artifact",
                "command": "git clone something",
                "path": "/",
                "hash": hashlib.md5().hexdigest(),
                "git": artifact.getGit(Path(".")),
                "cwd": "/",
                "inputs": [],
            }
        )

        self.runscptart = artifact.Artifact(
            {
                "_id": uuid4(),
                "name": "test-runscript",
                "type": "test-git",
                "documentation": "This is a description of runscript artifact",
                "command": "git clone something",
                "path": "/",
                "hash": hashlib.md5().hexdigest(),
                "git": artifact.getGit(Path(".")),
                "cwd": "/",
                "inputs": [],
            }
        )

        self.run = gem5Run.createSERun(
            "test SE run",
            "gem5/build/X86/gem5.opt",
            "configs-tests/run_test.py",
            "results/run_test/out",
            self.gem5art,
            self.gem5gitart,
            self.runscptart,
            "extra",
            "params",
        )

    def test_out_dir(self):
        relative_outdir = "results/run_test/out"
        self.assertEqual(
            self.run.outdir.relative_to(Path(".").resolve()),
            Path(relative_outdir),
        )

        self.assertTrue(
            self.run.outdir.is_absolute(),
            "outdir should be absolute directory",
        )

    def test_command(self):
        self.assertEqual(
            self.run.command,
            [
                "gem5/build/X86/gem5.opt",
                "-re",
                "--outdir={}".format(os.path.abspath("results/run_test/out")),
                "configs-tests/run_test.py",
                "extra",
                "params",
            ],
        )


if __name__ == "__main__":
    unittest.main()


@@ -0,0 +1,82 @@
# gem5art tasks package
This package contains two parallel task libraries for running gem5 experiments.
The actual gem5 experiment can be executed with the help of [Python multiprocessing support](https://docs.python.org/3/library/multiprocessing.html), [Celery](http://www.celeryproject.org/), or even without using any job manager (a job can be launched directly by calling the `run()` function of a gem5Run object).
This package implicitly depends on the gem5art run package.
Please cite the [gem5art paper](https://arch.cs.ucdavis.edu/papers/2021-3-28-gem5art) when using the gem5art packages.
This documentation can be found on the [gem5 website](https://www.gem5.org/documentation/gem5art/).
## Use of Python Multiprocessing
This is a simple way to run gem5 jobs using the Python multiprocessing library.
You can use the following function in your job launch script to execute gem5art run objects:
```python
run_job_pool([a list containing all run objects you want to execute], num_parallel_jobs = [Number of parallel jobs you want to run])
```
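
As a rough illustration of what such a launch script might look like, the sketch below mirrors the pool pattern used by `run_job_pool` in `gem5art.tasks.tasks`. So that it runs without gem5 or gem5art installed, `StubRun`, `build_jobs`, and the local `run_job_pool` are hypothetical stand-ins, not part of the package. In a real script you would import `run_job_pool` from `gem5art.tasks.tasks` and build the list from gem5Run objects instead.

```python
import multiprocessing as mp


class StubRun:
    """Hypothetical stand-in for a gem5Run object (not part of gem5art)."""

    def __init__(self, command):
        # Like gem5Run, keep the command line as a list of strings.
        self.command = command

    def run(self):
        # A real gem5Run would launch gem5 here; the stub does nothing.
        pass


def build_jobs(binary, script, benchmarks):
    """Create one stand-in run per benchmark name (assumed parameter sweep)."""
    return [StubRun([binary, "-re", script, bench]) for bench in benchmarks]


def run_single_job(run):
    """Execute one run and return its command line as a string."""
    run.run()
    return " ".join(run.command)


def run_job_pool(job_list, num_parallel_jobs=2):
    """Local sketch of the pattern: map the jobs over a process pool."""
    with mp.Pool(num_parallel_jobs) as pool:
        return pool.map(run_single_job, job_list)


if __name__ == "__main__":
    jobs = build_jobs(
        "gem5/build/X86/gem5.opt",
        "configs-tests/run_test.py",
        ["bench-a", "bench-b", "bench-c"],
    )
    for finished in run_job_pool(jobs, num_parallel_jobs=2):
        print("Finished", finished)
```

The `if __name__ == "__main__":` guard matters here: on platforms where multiprocessing starts workers by re-importing the main module, creating the pool at import time would fail.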
## Use of Celery
A Celery server can run many gem5 tasks asynchronously.
Once a user creates a gem5Run object (discussed previously) while using gem5art, this object needs to be passed to the method `run_gem5_instance()` registered with the Celery app, which is responsible for starting a Celery task to run gem5. The other argument needed by `run_gem5_instance()` is the current working directory.
The Celery server can be started with the following command:
```sh
celery -E -A gem5art.tasks.celery worker --autoscale=[number of workers],0
```
This will start a server with events enabled that will accept gem5 tasks as defined in gem5art.
It will autoscale from 0 to the desired number of workers.
Celery relies on a message broker, `RabbitMQ`, for communication between the client and workers.
If it is not already installed, you need to install `RabbitMQ` on your system (before running Celery) using:
```sh
apt-get install rabbitmq-server
```
### Monitoring Celery
Celery does not explicitly show the status of the runs by default.
[flower](https://flower.readthedocs.io/en/latest/), a Python package, is a web-based tool for monitoring and administrating Celery.
To install the flower package, run:
```sh
pip install flower
```
You can monitor the Celery cluster by running the following:
```sh
flower -A gem5art.tasks.celery --port=5555
```
This will start a webserver on port 5555.
### Removing all tasks
```sh
celery -A gem5art.tasks.celery purge
```
### Viewing the state of all jobs in Celery
```sh
celery -A gem5art.tasks.celery events
```
## Tasks API Documentation
```eval_rst
Task
----
.. automodule:: gem5art.tasks.tasks
:members:
:undoc-members:
.. automodule:: gem5art.tasks.celery
:members:
:undoc-members:
```


@@ -0,0 +1,27 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""This is a set of utilities for using celery to run gem5 experiments"""


@@ -0,0 +1,40 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

from celery import Celery  # type: ignore

# Create a celery server. If you run celery with this file, it will start a
# server that will accept tasks specified by the "run" below.
gem5app = Celery(
    "gem5",
    backend="rpc",
    broker="amqp://localhost",
    include=["gem5art.tasks.tasks"],
)
gem5app.conf.update(accept_content=["pickle", "json"])

if __name__ == "__main__":
    gem5app.start()


@@ -0,0 +1,65 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

from .celery import gem5app
import multiprocessing as mp
import time


@gem5app.task(bind=True, serializer="pickle")
def run_gem5_instance(self, gem5_run, cwd="."):
    """
    Runs a gem5 instance with the script and any parameters to the script.
    Note: this is "bound" which means self is the task that is running this.
    """

    gem5_run.run(self, cwd=cwd)


def run_single_job(run):
    start_time = time.time()
    print(f"Running {' '.join(run.command)} at {time.time()}")
    run.run()
    finish_time = time.time()
    print(
        f"Finished {' '.join(run.command)} at {time.time()}. "
        f"Total time = {finish_time - start_time}"
    )


def run_job_pool(job_list, num_parallel_jobs=mp.cpu_count() // 2):
    """
    Runs gem5 jobs in parallel when Celery is not used.
    Creates half as many parallel jobs as the core count if no explicit
    job count is provided.
    Receives a list of run objects created by the launch script.
    """

    pool = mp.Pool(num_parallel_jobs)
    pool.map(run_single_job, job_list)
    pool.close()
    pool.join()
    print("All jobs done running!")


@@ -0,0 +1,4 @@
[mypy]
namespace_packages = True
warn_unreachable = True
mypy_path = ../artifact

util/gem5art/tasks/setup.py Executable file

@@ -0,0 +1,66 @@
# Copyright (c) 2019, 2021 The Regents of the University of California
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""A setuptools based setup module."""

from os.path import join
from pathlib import Path
from setuptools import setup, find_namespace_packages


with open(Path(__file__).parent / "README.md", encoding="utf-8") as f:
    long_description = f.read()

setup(
    name="gem5art-tasks",
    version="1.4.0",
    description="A celery app for gem5art",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://www.gem5.org/",
    author="Davis Architecture Research Group (DArchR)",
    author_email="jlowepower@ucdavis.edu",
    license="BSD",
    classifiers=[
        "Development Status :: 4 - Beta",
        "License :: OSI Approved :: BSD License",
        "Topic :: System :: Hardware",
        "Intended Audience :: Science/Research",
        "Programming Language :: Python :: 3",
    ],
    keywords="simulation architecture gem5",
    packages=find_namespace_packages(include=["gem5art.*"]),
    install_requires=["celery"],
    extras_require={
        "flower": ["flower"],
    },
    python_requires=">=3.6",
    project_urls={
        "Bug Reports": "https://gem5.atlassian.net/",
        "Source": "https://gem5.googlesource.com/",
        "Documentation": "https://www.gem5.org/documentation/gem5art",
    },
)