Running a Project
Local
Running a project file is straightforward. Call the buildstock_local
command line tool as follows:
$ buildstock_local --help
...
usage: buildstock_local [-h] [-j J] [-m]
[--postprocessonly | --uploadonly | --continue_upload | --validateonly | --samplingonly]
project_filename
positional arguments:
project_filename
options:
-h, --help show this help message and exit
-j J Number of parallel simulations. Default: all cores.
-m, --measures_only Only apply the measures, but don't run simulations.
Useful for debugging.
--postprocessonly Only do postprocessing, useful for when the simulations
are already done
--uploadonly Only upload to S3, useful when postprocessing is
already done. Ignores the upload flag in yaml. Errors
out if files already exists in s3
--continue_upload Continue uploading to S3, useful when previous upload
was interrupted.
--validateonly Only validate the project YAML file and references.
Nothing is executed
--samplingonly Run the sampling only.
Warning
In general, you should omit the -j argument, which will use all the CPUs you made available to Docker.
Setting the -j flag to a number greater than the number of CPUs you made available in Docker
will cause the simulations to run slower, as the concurrent simulations will compete for CPUs.
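If you do set -j explicitly, keep it at or below the number of CPUs Docker can actually use. A quick way to check that number from Python (a sketch: the affinity call reflects container CPU pinning such as --cpuset-cpus on Linux; the fallback reports the machine-wide count):

```python
import os

def available_cpus():
    """Number of CPUs available to this process.

    On Linux, sched_getaffinity reflects CPU restrictions such as a
    Docker --cpuset-cpus limit; on platforms without it (e.g. macOS,
    Windows) we fall back to the machine-wide count.
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count()

print(available_cpus())
```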
Warning
Running the simulation with --postprocessonly
when there are already postprocessed results from a previous run will
overwrite those results.
Running on NREL HPC (Kestrel)
After you have activated the appropriate conda environment on Kestrel,
you can submit a project file to be simulated by passing it to the buildstock_kestrel
command.
$ buildstock_kestrel --help
...
usage: buildstock_kestrel [-h] [--hipri] [-m]
[--postprocessonly | --uploadonly | --continue_upload | --validateonly | --samplingonly | --rerun_failed]
project_filename
positional arguments:
project_filename
options:
-h, --help show this help message and exit
--hipri Submit this job to the high priority queue. Uses 2x node
hours.
-m, --measuresonly Only apply the measures, but don't run simulations.
Useful for debugging.
--postprocessonly Only do postprocessing, useful for when the simulations
are already done
--uploadonly Only upload to S3, useful when postprocessing is already
done. Ignores the upload flag in yaml. Errors out if
files already exists in s3
--continue_upload Continue uploading to S3, useful when previous upload
was interrupted.
--validateonly Only validate the project YAML file and references.
Nothing is executed
--samplingonly Run the sampling only.
--rerun_failed Rerun the failed jobs
Warning
Running the simulation with --postprocessonly
when there are already postprocessed results from a previous run will
overwrite those results.
Project configuration
To run a project on Kestrel, you will need to make a few changes to
your Project Definition. First, the output_directory should be in
/scratch/your_username/some_directory or somewhere in /projects.
Building stock simulations generate a lot of output quickly, and the
/scratch and /projects filesystems are equipped to handle that kind of
I/O throughput, whereas your /home directory is not.
Next, you will need to add a Kestrel Configuration top level key to the project file, which will look something like this:
kestrel:
  account: your_hpc_allocation
  n_jobs: 100  # the number of concurrent nodes to use
  minutes_per_sim: 2
  sampling:
    time: 60  # the number of minutes you expect sampling to take
  postprocessing:
    time: 180  # the number of minutes you expect post processing to take
In general, be conservative with the time estimates. It can be helpful to run a small batch with fairly conservative estimates, then look at the output logs to see how long things really took before submitting a full batch simulation.
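For a back-of-the-envelope sanity check on these numbers, each array job must work through its share of the simulations, a node-full at a time. A rough sketch (the even split and the 104-core node size are assumptions; check Kestrel's current hardware and your own logs):

```python
import math

def estimate_job_walltime(n_datapoints, n_jobs, minutes_per_sim,
                          sims_per_node=104, n_upgrades=0):
    """Rough walltime estimate for one array job, in minutes.

    Assumes each of the n_jobs array jobs gets an even share of the
    simulations and runs sims_per_node of them concurrently on its
    node. sims_per_node=104 is an assumed standard Kestrel core count;
    adjust to match the actual hardware.
    """
    total_sims = n_datapoints * (1 + n_upgrades)  # baseline + upgrades
    sims_per_job = math.ceil(total_sims / n_jobs)
    sequential_batches = math.ceil(sims_per_job / sims_per_node)
    return sequential_batches * minutes_per_sim

# 100,000 datapoints across 100 nodes at 2 minutes per simulation:
print(estimate_job_walltime(100_000, n_jobs=100, minutes_per_sim=2))  # 20
```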
Re-running failed array jobs
Running buildstockbatch on HPC breaks the simulation into an array of jobs,
the number of which you set with the n_jobs configuration parameter. Each of
those jobs runs a batch of simulations on a single compute node. Sometimes a
handful of jobs will fail. If most of the jobs succeeded, rather than
rerunning everything, you can resubmit just the jobs that failed with the
--rerun_failed command line argument. This will also clear out and rerun the
postprocessing.
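The array-job split described above can be pictured with a small sketch; this is an illustration of the idea, not buildstockbatch's actual partitioning code:

```python
def chunk_into_jobs(building_ids, n_jobs):
    """Illustrative round-robin split of a run's buildings into n_jobs
    array jobs, each destined for one compute node. (Sketch only; the
    real tool's partitioning may differ.)"""
    return [building_ids[i::n_jobs] for i in range(n_jobs)]

jobs = chunk_into_jobs(list(range(1, 11)), n_jobs=3)
print(jobs)  # [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
```

If one of those three jobs fails, only its batch of buildings needs to be resubmitted, which is what --rerun_failed automates.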
Running on Amazon Web Services
Running a batch on AWS is done by calling the buildstock_aws
command line
tool.
$ buildstock_aws --help
...
usage: buildstock_aws [-h] [-c | --validateonly | --postprocessonly | --crawl]
project_filename
positional arguments:
project_filename
options:
-h, --help show this help message and exit
-c, --clean After the simulation is done, run with --clean to clean
up AWS environment
--validateonly Only validate the project YAML file and references.
Nothing is executed
--postprocessonly Only do postprocessing, useful for when the simulations
are already done
--crawl Only do the crawling in Athena. When simulations and
postprocessing are done.
The first time you run it, it may take several minutes to build and upload the
Docker image. buildstock_aws needs to stay running and connected to the
internet while the batch simulation is running on AWS. We have found it
convenient to run it from an EC2 instance, but that is not strictly necessary.
AWS Specific Project configuration
For the project to run on AWS, you will need to add a section to your config file, something like this:
aws:
  # The job_identifier should be unique, start with a letter, and be limited to 10 characters
  job_identifier: national01
  s3:
    bucket: myorg-resstock
    prefix: national01_run01
  region: us-west-2
  use_spot: true
  batch_array_size: 10000
  dask:
    n_workers: 8
  notifications_email: your_email@somewhere.com  # doesn't work right now
See AWS Configuration for details.
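The job_identifier constraint noted in the config comment is easy to trip over. A small checker for the rule as stated (starting letter, at most 10 characters; restricting the remaining characters to alphanumerics is an assumption, so consult AWS Configuration for the authoritative rule):

```python
import re

def valid_aws_job_identifier(s):
    """Check the stated rule: starts with a letter and is at most 10
    characters long. Limiting the rest to alphanumerics is an
    assumption made for this sketch."""
    return bool(re.fullmatch(r"[A-Za-z][A-Za-z0-9]{0,9}", s))

print(valid_aws_job_identifier("national01"))       # True
print(valid_aws_job_identifier("01national"))       # False: starts with a digit
print(valid_aws_job_identifier("national_run_01"))  # False: too long, has underscores
```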
Cleaning up after yourself
When the batch is done, buildstock_aws
should clean up after itself.
However, if something goes wrong, the cleanup script can be run with the
--clean
option like so:
buildstock_aws --clean your_project_file.yml
This will clean up all the AWS resources that were created on your behalf to run the simulations. Your results will still be on S3 and queryable in Athena.
Running on Google Cloud Platform
Run a project on GCP by calling the buildstock_gcp
command line tool.
$ buildstock_gcp --help
...
usage: buildstock_gcp [-h]
[-c | --validateonly | --show_jobs | --postprocessonly | --missingonly]
[-v]
project_filename [job_identifier]
positional arguments:
project_filename
job_identifier Optional override of gcp.job_identifier in your project
file. Max 48 characters.
options:
-h, --help show this help message and exit
-c, --clean After the simulation is done, run with --clean to clean
up GCP environment. If the GCP Batch job is still
running, this will cancel the job.
--validateonly Only validate the project YAML file and references.
Nothing is executed
--show_jobs List existing jobs
--postprocessonly Only do postprocessing, useful for when the simulations
are already done
--missingonly Only run batches of simulations that are missing from a
previous job, then run post-processing. Assumes that the
project file is the same as the previous job, other than
the job identifier. Will not rerun individual failed
simulations, only full batches that are missing.
-v, --verbose Verbose output - includes DEBUG logs if set
The first time you run buildstock_gcp, it may take several minutes,
especially over a slower internet connection, as it downloads and builds a Docker image.
GCP specific project configuration
For the project to run on GCP, you will need to add a gcp
section to your project
file, something like this:
gcp:
  job_identifier: national01
  # The project, Artifact Registry repo, and GCS bucket must already exist.
  project: myorg_project
  region: us-central1
  artifact_registry:
    repository: buildstockbatch-docker
  gcs:
    bucket: buildstockbatch
    prefix: national01_run01
  job_environment:
    use_spot: true
  batch_array_size: 10000
See GCP Configuration for details and other optional settings.
You can optionally override the job_identifier from the command line
(buildstock_gcp project.yml [job_identifier]). Note that each job you run
must have a unique ID (unless you delete a previous job with the --clean
option), so this option makes it easy to assign a new ID to each run without
updating the config file.
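One convenient pattern for generating those per-run IDs (a convenience sketch, not something the tool provides; the character rules assumed here should be checked against GCP Configuration) is to stamp a base name at submit time:

```python
from datetime import datetime, timezone

def stamped_job_identifier(base, max_len=48):
    """Append a UTC timestamp so each run gets a fresh job_identifier
    without editing the project file. The 48-character cap matches the
    CLI help above; everything else is a convenience assumption."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
    return f"{base}-{stamp}"[:max_len]

print(stamped_job_identifier("national01"))  # e.g. national01-20240501-1732
```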
Retry failed tasks
Occasionally, especially when using spot instances, tasks will fail for transient reasons, such as the VM being preempted. While preempted tasks are automatically retried a few times, if they continue to fail, the entire job will fail and postprocessing will not run.
If this happens, you can rerun the same job with the --missingonly
flag. This will rerun only the
tasks that didn’t produce output files, then run postprocessing. Note: This flag assumes that your
project config file has not changed since the previous run, other than the job identifier.
If it has changed, the behavior is undefined.
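Conceptually, --missingonly amounts to finding the batches that produced no output and resubmitting only those. A hypothetical sketch of that check (the results-file naming pattern is made up for illustration; the tool's actual output layout differs):

```python
import tempfile
from pathlib import Path

def missing_batches(results_dir, n_batches, pattern="results_job{}.json.gz"):
    """Return the indices of batches with no results file in results_dir.
    The file-name pattern is a hypothetical stand-in; check your run's
    actual output layout before relying on it."""
    return [i for i in range(n_batches)
            if not (Path(results_dir) / pattern.format(i)).exists()]

# Demo with a temporary directory standing in for the run's results:
with tempfile.TemporaryDirectory() as d:
    for i in (0, 1, 3):
        (Path(d) / f"results_job{i}.json.gz").touch()
    print(missing_batches(d, 5))  # [2, 4] -- batches 2 and 4 produced no output
```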
Show existing jobs
Run buildstock_gcp your_project_file.yml [job_identifier] --show_jobs
to see the existing
jobs matching the project specified. This can show you whether a previously-started job
has completed, is still running, or has already been cleaned up.
Post-processing only
If buildstock_gcp
is interrupted after the simulations are kicked off (i.e. the Batch job is
running), the simulations will finish, but post-processing will not be started. You can run only
the post-processing steps later with the --postprocessonly
flag.
Cleaning up after yourself
When the simulations and postprocessing are complete, run buildstock_gcp
your_project_file.yml [job_identifier] --clean
. This will clean up all the GCP resources that
were created to run the specified project, other than files in Cloud Storage. If the project is
still running, it will be cancelled. Your output files will still be available in GCS.
You can clean up files in Cloud Storage from the GCP Console.
If you make code changes between runs, you may want to occasionally clean up the Docker
images created for each run with docker image prune.