misc,tests: Increase Weekly and Daily GPU test timeout (#1628)

The Weekly GPU tests are failing due to a timeout. The testing timeout
was set to 5 hours; runs have frequently come close to this limit, and
recent changes to the tests have pushed them consistently over it.

The two main things that appear to have caused this are:

~~1. Moving the VEGA_X86 compilation into the same step as the running
of the tests.~~ (I take this back: the timeout is per-job, so it shouldn't
matter how the work is divided among steps in the job. However, keeping it
separate does no harm, and merging the two steps did coincide with
failures occurring. I'll play it safe for now.)
2. Reducing the number of threads per GitHub Actions runner, thus
slowing job execution.
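
The per-job point above can be illustrated with a minimal sketch. GitHub Actions accepts `timeout-minutes` both on a job and on an individual step; the job name and the `make` commands below are hypothetical and are not taken from the gem5 workflows.

```yaml
jobs:
  example-job:               # hypothetical job name, for illustration only
    runs-on: ubuntu-latest
    timeout-minutes: 300     # job-level: one 5-hour budget shared by every step below
    steps:
      - name: Build
        run: make            # hypothetical command
      - name: Test
        timeout-minutes: 60  # step-level: limits this step only
        run: make test       # hypothetical command
```

With only a job-level value set, splitting or merging steps does not change the total time budget, which matches the correction made in item 1.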

In addition, we've added more tests to this weekly GPU suite, though I
don't believe runs have reached these tests yet: the timeout appears to
have always been triggered first.

This PR increases the timeout to 3 days and moves the compilation into a
separate step.
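
As a rough sketch of the resulting Weekly job, assembled from the lines in the diff: the indentation is an assumption, since it is not preserved in this view, and the cache step is left out.

```yaml
jobs:
  gpu-tests:
    runs-on: [self-hosted, linux, x64]
    container: ghcr.io/gem5/gcn-gpu:latest
    timeout-minutes: 4320  # 3 days
    steps:
      - uses: actions/checkout@v4
      # (cache step from the original workflow omitted in this sketch)
      - name: Build VEGA_X86/gem5.opt
        working-directory: ${{ github.workspace }}
        run: scons build/VEGA_X86/gem5.opt -j $(nproc)
      - name: Run Testlib GPU Tests
        working-directory: ${{ github.workspace }}/tests
        run: ./main.py run --length=very-long -vvv --skip-build -t $(nproc) --host gcn_gpu gem5/gpu
```

The run step now passes `--skip-build` and drops `-j $(nproc)`, presumably because the binary has already been built by the preceding step.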

**Update:** The same changes were made for the Daily tests (with a
12-hour timeout), as they appear to have the same problem.
Bobby R. Bruce
2024-10-04 07:41:17 -07:00
committed by GitHub
2 changed files with 12 additions and 4 deletions

@@ -112,7 +112,7 @@ jobs:
gpu-tests:
runs-on: [self-hosted, linux, x64]
container: ghcr.io/gem5/gcn-gpu:latest
-timeout-minutes: 300
+timeout-minutes: 720 # 12 hours
steps:
- uses: actions/checkout@v4
@@ -127,9 +127,13 @@ jobs:
restore-keys: |
testlib-build-vega
+- name: Build VEGA_X86/gem5.opt
+working-directory: ${{ github.workspace }}
+run: scons build/VEGA_X86/gem5.opt -j $(nproc)
- name: Run Testlib GPU Tests
working-directory: ${{ github.workspace }}/tests
-run: ./main.py run --length=long -vvv -t $(nproc) -j $(nproc) --host gcn_gpu gem5/gpu
+run: ./main.py run --length=long -vvv --skip-build -t $(nproc) --host gcn_gpu gem5/gpu
- name: Upload results
if: success() || failure()

@@ -48,7 +48,7 @@ jobs:
gpu-tests:
runs-on: [self-hosted, linux, x64]
container: ghcr.io/gem5/gcn-gpu:latest
-timeout-minutes: 300
+timeout-minutes: 4320 # 3 days
steps:
- uses: actions/checkout@v4
@@ -63,9 +63,13 @@ jobs:
restore-keys: |
testlib-build-vega
+- name: Build VEGA_X86/gem5.opt
+working-directory: ${{ github.workspace }}
+run: scons build/VEGA_X86/gem5.opt -j $(nproc)
- name: Run Testlib GPU Tests
working-directory: ${{ github.workspace }}/tests
-run: ./main.py run --length=very-long -vvv -j $(nproc) -t $(nproc) --host gcn_gpu gem5/gpu
+run: ./main.py run --length=very-long -vvv --skip-build -t $(nproc) --host gcn_gpu gem5/gpu
- name: Upload results
if: success() || failure()