DistributedRunner

class optimum.habana.distributed.DistributedRunner

( command_list: typing.List = [], world_size: int = 1, hostfile: typing.Union[str, pathlib.Path] = None, use_mpi: bool = False, use_deepspeed: bool = False, master_port: int = 29500, use_env: bool = False, map_by: str = 'socket', multi_hls = None )

Set up training/inference hardware configurations and run distributed commands.
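
A minimal usage sketch, assuming a single node with 8 devices and a hypothetical training script train.py; the command string, script name and arguments are placeholders:

```python
from optimum.habana.distributed import DistributedRunner

# Hypothetical command to launch; any shell command string works here.
training_command = "python train.py --output_dir /tmp/out"

# Single-node, multi-card run (8 devices assumed).
runner = DistributedRunner(
    command_list=[training_command],
    world_size=8,
)

# Launch the distributed job with the configuration set up above.
runner.run()
```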

create_multi_node_setup

( )

Multi-node configuration setup for DeepSpeed.
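
This setup is typically driven by the constructor arguments rather than by calling the method directly. A hedged sketch of a multi-node DeepSpeed launch, assuming two nodes reachable from the current machine and a hostfile at a placeholder path:

```python
from optimum.habana.distributed import DistributedRunner

# Hypothetical multi-node DeepSpeed launch driven by a hostfile
# (see process_hostfile below for the file format it is expected to follow).
runner = DistributedRunner(
    command_list=["python train.py --output_dir /tmp/out"],  # placeholder command
    hostfile="/path/to/hostfile",                            # placeholder path
    use_deepspeed=True,
    master_port=29500,
)
runner.run()
```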

create_single_card_setup

( use_deepspeed = False )

Single-card setup.

create_single_node_setup

( )

Single-node multi-card configuration setup.

create_single_node_setup_deepspeed

( )

Single-node multi-card configuration setup for DeepSpeed.

create_single_node_setup_mpirun

( )

Single-node multi-card configuration setup for mpirun.
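
As with the other setup methods, the mpirun path is selected through the constructor flags. A sketch of a single-node launch through mpirun, with the device count assumed:

```python
from optimum.habana.distributed import DistributedRunner

# Same single-node setup as the first example, but launched through mpirun.
runner = DistributedRunner(
    command_list=["python train.py --output_dir /tmp/out"],  # placeholder command
    world_size=8,       # assumed number of devices on the node
    use_mpi=True,       # launch the processes with mpirun
    map_by="socket",    # mpirun process mapping policy (constructor default)
)
runner.run()
```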

process_hostfile

( ) → str

Returns

str: The address of the master node.

Returns the master address to use for multi-node runs with DeepSpeed. Directly inspired by https://github.com/microsoft/DeepSpeed/blob/316c4a43e0802a979951ee17f735daf77ea9780f/deepspeed/autotuning/utils.py#L145.
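
The hostfile is assumed here to follow the DeepSpeed convention of one "<hostname> slots=<num_devices>" entry per line, with the first entry taken as the master node. A sketch of writing such a file, with hostnames and slot counts as placeholders:

```python
from pathlib import Path

# Hypothetical DeepSpeed-style hostfile: one "<hostname> slots=<devices>" line per node.
Path("/tmp/hostfile").write_text(
    "node-1 slots=8\n"   # assumed to be used as the master node (first entry)
    "node-2 slots=8\n"
)
```

Passing this path as the hostfile constructor argument lets process_hostfile derive the master address from it.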

run

( )

Runs the desired command with the configuration specified by the user.
