--- base_model: Snowflake/snowflake-arctic-embed-m datasets: [] language: - en library_name: sentence-transformers license: apache-2.0 metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:1490 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss widget: - source_sentence: How do you configure the LENGTH_OUT_OF_BOUNDS check to ensure the number of outliers is within acceptable limits? sentences: - 'LENGTH_OUT_OF_BOUNDS: dict( num_percentiles=1000,min_unique_values=3, condition_number_of_outliers_less_or_equal=dict( max_outliers=3, ), }, ... is equivalent to running the following Deepchecks tests: import deepchecks.tabular.checks as tabular_checks from deepchecks.tabular import Suite from deepchecks.tabular import Dataset train_dataset = Dataset( reference_dataset, label=''class'', cat_features=[''country'', ''state''] suite = Suite(name="custom") check = tabular_checks.OutlierSampleDetection( nearest_neighbors_percent=0.01, extent_parameter=3, check.add_condition_outlier_ratio_less_or_equal( max_outliers_ratio=0.007, outlier_score_threshold=0.5, check.add_condition_no_outliers( outlier_score_threshold=0.6, suite.add(check) check = tabular_checks.StringLengthOutOfBounds( num_percentiles=1000, min_unique_values=3, check.add_condition_number_of_outliers_less_or_equal( max_outliers=3, suite.run(train_dataset=train_dataset) You can view the complete list of configuration parameters in the SDK docs. The Deepchecks Data Validator The Deepchecks Data Validator implements the same interface as do all Data Validators, so this method forces you to maintain some level of compatibility with the overall Data Validator abstraction, which guarantees an easier migration in case you decide to switch to another Data Validator. All you have to do is call the Deepchecks Data Validator methods when you need to interact with Deepchecks to run tests, e.g.: import pandas as pd from deepchecks.core.suite import SuiteResult from zenml.integrations.deepchecks.data_validators import DeepchecksDataValidator from zenml.integrations.deepchecks.validation_checks import DeepchecksDataIntegrityCheck from zenml import step @step def data_integrity_check( dataset: pd.DataFrame, ) -> SuiteResult: """Custom data integrity check step with Deepchecks Args: dataset: input Pandas DataFrame Returns: Deepchecks test suite execution result """' - ' the Tekton orchestrator, check out the SDK Docs .Enabling CUDA for GPU-backed hardware Note that if you wish to use this orchestrator to run steps on a GPU, you will need to follow the instructions on this page to ensure that it works. It requires adding some extra settings customization and is essential to enable CUDA for the GPU to give its full acceleration. PreviousAWS Sagemaker Orchestrator NextAirflow Orchestrator Last updated 19 days ago' - 'gmax(prediction.numpy()) return classes[maxindex]The custom predict function should get the model and the input data as arguments and return the model predictions. ZenML will automatically take care of loading the model into memory and starting the seldon-core-microservice that will be responsible for serving the model and running the predict function. 
After defining your custom predict function in code, you can use the seldon_custom_model_deployer_step to automatically build your function into a Docker image and deploy it as a model server by setting the predict_function argument to the path of your custom_predict function: from zenml.integrations.seldon.steps import seldon_custom_model_deployer_step from zenml.integrations.seldon.services import SeldonDeploymentConfig from zenml import pipeline @pipeline def seldon_deployment_pipeline(): model = ... seldon_custom_model_deployer_step( model=model, predict_function="", # TODO: path to custom code service_config=SeldonDeploymentConfig( model_name="", # TODO: name of the deployed model replicas=1, implementation="custom", resources=SeldonResourceRequirements( limits={"cpu": "200m", "memory": "250Mi"} ), serviceAccountName="kubernetes-service-account", ), Advanced Custom Code Deployment with Seldon Core Integration Before creating your custom model class, you should take a look at the custom Python model section of the Seldon Core documentation. The built-in Seldon Core custom deployment step is a good starting point for deploying your custom models. However, if you want to deploy more than the trained model, you can create your own custom class and a custom step to achieve this. See the ZenML custom Seldon model class as a reference. PreviousMLflow NextBentoML Last updated 15 days ago' - source_sentence: What is the importance of using `get_step_context` in the do_predictions pipeline in ZenML? sentences: - '3_store -f s3 --path s3://my_bucket How to use itThe Artifact Store provides low-level object storage services for other ZenML mechanisms. When you develop ZenML pipelines, you normally don''t even have to be aware of its existence or interact with it directly. ZenML provides higher-level APIs that can be used as an alternative to store and access artifacts: return one or more objects from your pipeline steps to have them automatically saved in the active Artifact Store as pipeline artifacts. retrieve pipeline artifacts from the active Artifact Store after a pipeline run is complete. You will probably need to interact with the low-level Artifact Store API directly: if you implement custom Materializers for your artifact data types if you want to store custom objects in the Artifact Store The Artifact Store API All ZenML Artifact Stores implement the same IO API that resembles a standard file system. This allows you to access and manipulate the objects stored in the Artifact Store in the same manner you would normally handle files on your computer and independently of the particular type of Artifact Store that is configured in your ZenML stack. Accessing the low-level Artifact Store API can be done through the following Python modules: zenml.io.fileio provides low-level utilities for manipulating Artifact Store objects (e.g. open, copy, rename , remove, mkdir). These functions work seamlessly across Artifact Stores types. They have the same signature as the Artifact Store abstraction methods ( in fact, they are one and the same under the hood). zenml.utils.io_utils includes some higher-level helper utilities that make it easier to find and transfer objects between the Artifact Store and the local filesystem or memory.' - "ace. Try it out at https://www.zenml.io/live-demo!No Vendor Lock-In: Since infrastructure\ \ is decoupled from code, ZenML gives you the freedom to switch to a different\ \ tooling stack whenever it suits you. 
By avoiding vendor lock-in, you have the\ \ flexibility to transition between cloud providers or services, ensuring that\ \ you receive the best performance and pricing available in the market at any\ \ time.Copyzenml stack set gcp\npython run.py # Run your ML workflows in GCP\n\ zenml stack set aws\npython run.py # Now your ML workflow runs in AWS\n\n\U0001F680\ \ Learn More\n\nReady to deploy and manage your MLOps infrastructure with ZenML?\ \ Here is a collection of pages you can take a look at next:\n\nSet up and manage\ \ production-ready infrastructure with ZenML.\n\nExplore the existing infrastructure\ \ and tooling integrations of ZenML.\n\nFind answers to the most frequently asked\ \ questions.\n\nZenML gives data scientists the freedom to fully focus on modeling\ \ and experimentation while writing code that is production-ready from the get-go.\n\ \nDevelop Locally: ZenML allows you to develop ML models in any environment using\ \ your favorite tools. This means you can start developing locally, and simply\ \ switch to a production environment once you are satisfied with your results.Copypython\ \ run.py # develop your code locally with all your favorite tools\nzenml stack\ \ set production\npython run.py # run on production infrastructure without any\ \ code changes\n\nPythonic SDK: ZenML is designed to be as unintrusive as possible.\ \ Adding a ZenML @step or @pipeline decorator to your Python functions is enough\ \ to turn your existing code into ZenML pipelines:Copyfrom zenml import pipeline,\ \ step\n\n@step\ndef step_1() -> str:\n return \"world\"\n\n@step\ndef step_2(input_one:\ \ str, input_two: str) -> None:\n combined_str = input_one + ' ' + input_two\n\ \ print(combined_str)\n\n@pipeline\ndef my_pipeline():\n output_step_one = step_1()\n\ \ step_2(input_one=\"hello\", input_two=output_step_one)\n\nmy_pipeline()" - 'e alone - uses the latest version of this artifacttrain_data = client.get_artifact_version(name="iris_training_dataset") # For test, we want a particular version test_data = client.get_artifact_version(name="iris_testing_dataset", version="raw_2023") # We can now send these directly into ZenML steps sklearn_classifier = model_trainer(train_data) model_evaluator(model, sklearn_classifier) materialized in memory in the Pattern 2: Artifact exchange between pipelines through a Model While passing around artifacts with IDs or names is very useful, it is often desirable to have the ZenML Model be the point of reference instead. ZenML Model. Each time the On the other side, the do_predictions pipeline simply picks up the latest promoted model and runs batch inference on it. It need not know of the IDs or names of any of the artifacts produced by the training pipeline''s many runs. This way these two pipelines can independently be run, but can rely on each other''s output. In code, this is very simple. Once the pipelines are configured to use a particular model, we can use get_step_context to fetch the configured model within a step directly. 
Assuming there is a predict step in the do_predictions pipeline, we can fetch the production model like so: from zenml import step, get_step_context # IMPORTANT: Cache needs to be disabled to avoid unexpected behavior @step(enable_cache=False) def predict( data: pd.DataFrame, ) -> Annotated[pd.Series, "predictions"]: # model name and version are derived from pipeline context model = get_step_context().model # Fetch the model directly from the model control plane model = model.get_model_artifact("trained_model") # Make predictions predictions = pd.Series(model.predict(data)) return predictions' - source_sentence: Where can I find bite-sized updates about ZenML? sentences: - 'πŸ’œCommunity & content All possible ways for our community to get in touch with ZenML. The ZenML team and community have put together a list of references that can be used to get in touch with the development team of ZenML and develop a deeper understanding of the framework. Slack Channel: Get help from the community The ZenML Slack channel is the main gathering point for the community. Not only is it the best place to get in touch with the core team of ZenML, but it is also a great way to discuss new ideas and share your ZenML projects with the community. If you have a question, there is a high chance someone else might have already answered it on Slack! Social Media: Bite-sized updates We are active on LinkedIn and Twitter where we post bite-sized updates on releases, events, and MLOps in general. Follow us to interact and stay up to date! We would appreciate it if you could comment on and share our posts so more people can benefit from our work at ZenML! YouTube Channel: Video tutorials, workshops, and more Our YouTube channel features a growing set of videos that take you through the entire framework. Go here if you are a visual learner, and follow along with some tutorials. Public roadmap The feedback from our community plays a significant role in the development of ZenML. That''s why we have a public roadmap that serves as a bridge between our users and our development team. If you have ideas regarding any new features or want to prioritize one over the other, feel free to share your thoughts here or vote on existing ideas. Blog On our Blog page, you can find various articles written by our team. We use it as a platform to share our thoughts and explain the implementation process of our tool, its new features, and the thought process behind them. Podcast' - 'Skypilot Use Skypilot with ZenML. The ZenML SkyPilot VM Orchestrator allows you to provision and manage VMs on any supported cloud provider (AWS, GCP, Azure, Lambda Labs) for running your ML pipelines. It simplifies the process and offers cost savings and high GPU availability. 
Prerequisites To use the SkyPilot VM Orchestrator, you''ll need: ZenML SkyPilot integration for your cloud provider installed (zenml integration install skypilot_) Docker installed and running A remote artifact store and container registry in your ZenML stack A remote ZenML deployment Appropriate permissions to provision VMs on your cloud provider A service connector configured to authenticate with your cloud provider (not needed for Lambda Labs) Configuring the Orchestrator Configuration steps vary by cloud provider: AWS, GCP, Azure: Install the SkyPilot integration and connectors extra for your provider Register a service connector with credentials that have SkyPilot''s required permissions Register the orchestrator and connect it to the service connector Register and activate a stack with the new orchestrator zenml service-connector register -skypilot-vm -t --auto-configure zenml orchestrator register --flavor vm_ zenml orchestrator connect --connector -skypilot-vm zenml stack register -o ... --set Lambda Labs: Install the SkyPilot Lambda integration Register a secret with your Lambda Labs API key Register the orchestrator with the API key secret Register and activate a stack with the new orchestrator zenml secret create lambda_api_key --scope user --api_key= zenml orchestrator register --flavor vm_lambda --api_key={{lambda_api_key.api_key}} zenml stack register -o ... --set Running a Pipeline' - 'racking import MlflowClient, artifact_utils @stepdef deploy_model() -> Optional[MLFlowDeploymentService]: # Deploy a model using the MLflow Model Deployer zenml_client = Client() model_deployer = zenml_client.active_stack.model_deployer experiment_tracker = zenml_client.active_stack.experiment_tracker # Let''s get the run id of the current pipeline mlflow_run_id = experiment_tracker.get_run_id( experiment_name=get_step_context().pipeline_name, run_name=get_step_context().run_name, # Once we have the run id, we can get the model URI using mlflow client experiment_tracker.configure_mlflow() client = MlflowClient() model_name = "model" # set the model name that was logged model_uri = artifact_utils.get_artifact_uri( run_id=mlflow_run_id, artifact_path=model_name mlflow_deployment_config = MLFlowDeploymentConfig( name: str = "mlflow-model-deployment-example", description: str = "An example of deploying a model using the MLflow Model Deployer", pipeline_name: str = get_step_context().pipeline_name, pipeline_step_name: str = get_step_context().step_name, model_uri: str = model_uri, model_name: str = model_name, workers: int = 1, mlserver: bool = False, timeout: int = 300, service = model_deployer.deploy_model(mlflow_deployment_config) return service Configuration Within the MLFlowDeploymentService you can configure: name: The name of the deployment. description: The description of the deployment. pipeline_name: The name of the pipeline that deployed the MLflow prediction server. pipeline_step_name: The name of the step that deployed the MLflow prediction server. model_name: The name of the model that is deployed in case of model registry the name must be a valid registered model name. model_version: The version of the model that is deployed in case of model registry the version must be a valid registered model version.' - source_sentence: Can you explain how to implement a custom secret store in ZenML? 
sentences: - 'he need to rerun unchanged parts of your pipeline.With ZenML, you can easily trace an artifact back to its origins and understand the exact sequence of executions that led to its creation, such as a trained model. This feature enables you to gain insights into the entire lineage of your artifacts, providing a clear understanding of how your data has been processed and transformed throughout your machine-learning pipelines. With ZenML, you can ensure the reproducibility of your results, and identify potential issues or bottlenecks in your pipelines. This level of transparency and traceability is essential for maintaining the reliability and trustworthiness of machine learning projects, especially when working in a team or across different environments. For more details on how to adjust the names or versions assigned to your artifacts, assign tags to them, or adjust other artifact properties, see the documentation on artifact versioning and configuration. By tracking the lineage of artifacts across environments and stacks, ZenML enables ML engineers to reproduce results and understand the exact steps taken to create a model. This is crucial for ensuring the reliability and reproducibility of machine learning models, especially when working in a team or across different environments. Saving and Loading Artifacts with Materializers Materializers play a crucial role in ZenML''s artifact management system. They are responsible for handling the serialization and deserialization of artifacts, ensuring that data is consistently stored and retrieved from the artifact store. Each materializer stores data flowing through a pipeline in one or more files within a unique directory in the artifact store:' - ' zenml model version list breast_cancer_classifierThe ZenML Cloud ships with a Model Control Plane dashboard where you can visualize all the versions: Passing parameters The last part of the config YAML is the parameters key: # Configure the pipeline parameters: model_type: "rf" # Choose between rf/sgd This parameters key aligns with the parameters that the pipeline expects. In this case, the pipeline expects a string called model_type that will inform it which type of model to use: @pipeline def training_pipeline(model_type: str): ... So you can see that the YAML config is fairly easy to use and is an important part of the codebase to control the execution of our pipeline. You can read more about how to configure a pipeline in the how to section, but for now, we can move on to scaling our pipeline. Scaling compute on the cloud When we ran our pipeline with the above config, ZenML used some sane defaults to pick the resource requirements for that pipeline. However, in the real world, you might want to add more memory, CPU, or even a GPU depending on the pipeline at hand. This is as easy as adding the following section to your local training_rf.yaml file: # These are the resources for the entire pipeline, i.e., each step settings: ... # Adapt this to vm_azure or vm_gcp accordingly orchestrator.vm_aws: memory: 32 # in GB ... steps: model_trainer: settings: orchestrator.vm_aws: cpus: 8 Here we are configuring the entire pipeline with a certain amount of memory, while for the trainer step we are additionally configuring 8 CPU cores. The orchestrator.vm_aws key corresponds to the SkypilotBaseOrchestratorSettings class in the Python SDK. You can adapt it to vm_gcp or vm_azure depending on which flavor of skypilot you have configured. Read more about settings in ZenML here. 
Now let''s run the pipeline again: python run.py --training-pipeline' - 'Custom secret stores Learning how to develop a custom secret store. The secrets store acts as the one-stop shop for all the secrets to which your pipeline or stack components might need access. It is responsible for storing, updating and deleting only the secrets values for ZenML secrets, while the ZenML secret metadata is stored in the SQL database. The secrets store interface implemented by all available secrets store back-ends is defined in the zenml.zen_stores.secrets_stores.secrets_store_interface core module and looks more or less like this: class SecretsStoreInterface(ABC): """ZenML secrets store interface. All ZenML secrets stores must implement the methods in this interface. """ # --------------------------------- # Initialization and configuration # --------------------------------- @abstractmethod def _initialize(self) -> None: """Initialize the secrets store. This method is called immediately after the secrets store is created. It should be used to set up the backend (database, connection etc.). """ # --------- # Secrets # --------- @abstractmethod def store_secret_values( self, secret_id: UUID, secret_values: Dict[str, str], ) -> None: """Store secret values for a new secret. Args: secret_id: ID of the secret. secret_values: Values for the secret. """ @abstractmethod def get_secret_values(self, secret_id: UUID) -> Dict[str, str]: """Get the secret values for an existing secret. Args: secret_id: ID of the secret. Returns: The secret values. Raises: KeyError: if no secret values for the given ID are stored in the secrets store. """ @abstractmethod def update_secret_values( self, secret_id: UUID, secret_values: Dict[str, str], ) -> None: """Updates secret values for an existing secret. Args: secret_id: The ID of the secret to be updated. secret_values: The new secret values. Raises: KeyError: if no secret values for the given ID are stored in the secrets store. """ @abstractmethod' - source_sentence: Can you explain how to deploy a stack on AWS using the ZenML stack deploy command with S3 and Sagemaker? sentences: - 'r eu-north-1 -x bucket_name=my_bucket -o sagemakerThis command deploys a stack on AWS that uses an S3 bucket as an artifact store and Sagemaker as your orchestrator. The stack will be imported into ZenML once the deployment is complete and you can start using it right away! Supported flavors and component types are as follows: Component Type Flavor(s) Artifact Store s3, gcp, minio Container Registry aws, gcp Experiment Tracker mlflow Orchestrator kubernetes, kubeflow, tekton, vertex MLOps Platform zenml Model Deployer seldon Step Operator sagemaker, vertex MLStacks currently only supports deployments using AWS, GCP, and K3D as providers. Want more details on how this works internally? The stack recipe CLI interacts with the mlstacks repository to fetch the recipes and stores them locally in the Global Config directory. This is where you could potentially make any changes you want to the recipe files. You can also use native terraform commands like terraform apply to deploy components but this would require you to pass the variables manually using the -var-file flag to the terraform CLI. CLI Options for zenml stack deploy Current required options to be passed in to the zenml stack deploy subcommand are: -p or --provider: The cloud provider to deploy the stack on. Currently supported providers are aws, gcp, and k3d. -n or --name: The name of the stack to be deployed. 
This is used to identify the stack in ZenML. -r or --region: The region to deploy the stack in. The remaining options relate to which components you want to deploy. If you want to pass an mlstacks stack specification file into the CLI to use for deployment, you can do so with the -f option. Similarly, if you wish to see more of the Terraform logging, prompts and output, you can pass the -d flag to turn on debug-mode. Any extra configuration for specific components (as noted in the individual component deployment documentation) can be passed in with the -x option. This option can be used multiple times to pass in multiple configurations.' - 'token_hex token_hex(32)or:Copyopenssl rand -hex 32Important: If you configure encryption for your SQL database secrets store, you should keep the ZENML_SECRETS_STORE_ENCRYPTION_KEY value somewhere safe and secure, as it will always be required by the ZenML server to decrypt the secrets in the database. If you lose the encryption key, you will not be able to decrypt the secrets in the database and will have to reset them. These configuration options are only relevant if you''re using the AWS Secrets Manager as the secrets store backend. ZENML_SECRETS_STORE_TYPE: Set this to aws in order to set this type of secret store. The AWS Secrets Store uses the ZenML AWS Service Connector under the hood to authenticate with the AWS Secrets Manager API. This means that you can use any of the authentication methods supported by the AWS Service Connector to authenticate with the AWS Secrets Manager API. "Version": "2012-10-17", "Statement": [ "Sid": "ZenMLSecretsStore", "Effect": "Allow", "Action": [ "secretsmanager:CreateSecret", "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:PutSecretValue", "secretsmanager:TagResource", "secretsmanager:DeleteSecret" ], "Resource": "arn:aws:secretsmanager:::secret:zenml/*" The following configuration options are supported: ZENML_SECRETS_STORE_AUTH_METHOD: The AWS Service Connector authentication method to use (e.g. secret-key or iam-role). ZENML_SECRETS_STORE_AUTH_CONFIG: The AWS Service Connector configuration, in JSON format (e.g. {"aws_access_key_id":"","aws_secret_access_key":"","region":""}). Note: The remaining configuration options are deprecated and may be removed in a future release. Instead, you should set the ZENML_SECRETS_STORE_AUTH_METHOD and ZENML_SECRETS_STORE_AUTH_CONFIG variables to use the AWS Service Connector authentication method.' - '_settings}) def my_pipeline() -> None: my_step()# Or configure the pipelines options my_pipeline = my_pipeline.with_options( settings={"docker": docker_settings} Configuring them on a step gives you more fine-grained control and enables you to build separate specialized Docker images for different steps of your pipelines: docker_settings = DockerSettings() # Either add it to the decorator @step(settings={"docker": docker_settings}) def my_step() -> None: pass # Or configure the step options my_step = my_step.with_options( settings={"docker": docker_settings} Using a YAML configuration file as described here: settings: docker: ... steps: step_name: settings: docker: ... Check out this page for more information on the hierarchy and precedence of the various ways in which you can supply the settings. Using a custom parent image By default, ZenML performs all the steps described above on top of the official ZenML image for the Python and ZenML version in the active Python environment. 
To have more control over the entire environment used to execute your pipelines, you can either specify a custom pre-built parent image or a Dockerfile that ZenML uses to build a parent image for you. If you''re going to use a custom parent image (either pre-built or by specifying a Dockerfile), you need to make sure that it has Python, pip, and ZenML installed for it to work. If you need a starting point, you can take a look at the Dockerfile that ZenML uses here. Using a pre-built parent image To use a static parent image (e.g., with internal dependencies installed) that doesn''t need to be rebuilt on every pipeline run, specify it in the Docker settings for your pipeline: docker_settings = DockerSettings(parent_image="my_registry.io/image_name:tag") @pipeline(settings={"docker": docker_settings}) def my_pipeline(...): ... To use this image directly to run your steps without including any code or installing any requirements on top of it, skip the Docker builds by specifying it in the Docker settings:' model-index: - name: zenml/finetuned-snowflake-arctic-embed-m results: - task: type: information-retrieval name: Information Retrieval dataset: name: dim 384 type: dim_384 metrics: - type: cosine_accuracy@1 value: 0.3433734939759036 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.6445783132530121 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.7048192771084337 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.7891566265060241 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.3433734939759036 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.21485943775100397 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1409638554216867 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.0789156626506024 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.3433734939759036 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.6445783132530121 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.7048192771084337 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.7891566265060241 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.573090139827556 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.5032797858099063 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.5097554744597325 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 256 type: dim_256 metrics: - type: cosine_accuracy@1 value: 0.30120481927710846 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.6144578313253012 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.6927710843373494 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.7650602409638554 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.30120481927710846 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.20481927710843373 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.13855421686746983 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.07650602409638553 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.30120481927710846 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.6144578313253012 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.6927710843373494 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.7650602409638554 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.5423414051340752 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.469719353604896 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.47720088094729723 name: Cosine Map@100 - 
task: type: information-retrieval name: Information Retrieval dataset: name: dim 128 type: dim_128 metrics: - type: cosine_accuracy@1 value: 0.3433734939759036 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.6325301204819277 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.6686746987951807 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.7409638554216867 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.3433734939759036 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.21084337349397586 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1337349397590361 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.07409638554216866 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.3433734939759036 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.6325301204819277 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.6686746987951807 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.7409638554216867 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.5525841372437652 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.4909519028494932 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.49975471886162304 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 64 type: dim_64 metrics: - type: cosine_accuracy@1 value: 0.2289156626506024 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.5120481927710844 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.608433734939759 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.7108433734939759 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.2289156626506024 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.17068273092369476 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.12168674698795179 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.07108433734939758 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.2289156626506024 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.5120481927710844 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.608433734939759 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.7108433734939759 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.46952259375314126 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.39217345572767276 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.4002406001397042 name: Cosine Map@100 --- # zenml/finetuned-snowflake-arctic-embed-m This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. 
## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 768 tokens - **Similarity Function:** Cosine Similarity - **Language:** en - **License:** apache-2.0 ### Model Sources - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) ### Full Model Architecture ``` SentenceTransformer( (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) (2): Normalize() ) ``` ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: ```bash pip install -U sentence-transformers ``` Then you can load this model and run inference. ```python from sentence_transformers import SentenceTransformer # Download from the πŸ€— Hub model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m") # Run inference sentences = [ 'Can you explain how to deploy a stack on AWS using the ZenML stack deploy command with S3 and Sagemaker?', 'r eu-north-1 -x bucket_name=my_bucket -o sagemakerThis command deploys a stack on AWS that uses an S3 bucket as an artifact store and Sagemaker as your orchestrator. The stack will be imported into ZenML once the deployment is complete and you can start using it right away!\n\nSupported flavors and component types are as follows:\n\nComponent Type Flavor(s) Artifact Store s3, gcp, minio Container Registry aws, gcp Experiment Tracker mlflow Orchestrator kubernetes, kubeflow, tekton, vertex MLOps Platform zenml Model Deployer seldon Step Operator sagemaker, vertex\n\nMLStacks currently only supports deployments using AWS, GCP, and K3D as providers.\n\nWant more details on how this works internally?\n\nThe stack recipe CLI interacts with the mlstacks repository to fetch the recipes and stores them locally in the Global Config directory.\n\nThis is where you could potentially make any changes you want to the recipe files. You can also use native terraform commands like terraform apply to deploy components but this would require you to pass the variables manually using the -var-file flag to the terraform CLI.\n\nCLI Options for zenml stack deploy\n\nCurrent required options to be passed in to the zenml stack deploy subcommand are:\n\n-p or --provider: The cloud provider to deploy the stack on. Currently supported providers are aws, gcp, and k3d.\n\n-n or --name: The name of the stack to be deployed. This is used to identify the stack in ZenML.\n\n-r or --region: The region to deploy the stack in.\n\nThe remaining options relate to which components you want to deploy.\n\nIf you want to pass an mlstacks stack specification file into the CLI to use for deployment, you can do so with the -f option. 
Similarly, if you wish to see more of the Terraform logging, prompts and output, you can pass the -d flag to turn on debug-mode.\n\nAny extra configuration for specific components (as noted in the individual component deployment documentation) can be passed in with the -x option. This option can be used multiple times to pass in multiple configurations.', '_settings})\n\ndef my_pipeline() -> None:\n\nmy_step()# Or configure the pipelines options\n\nmy_pipeline = my_pipeline.with_options(\n\nsettings={"docker": docker_settings}\n\nConfiguring them on a step gives you more fine-grained control and enables you to build separate specialized Docker images for different steps of your pipelines:\n\ndocker_settings = DockerSettings()\n\n# Either add it to the decorator\n\n@step(settings={"docker": docker_settings})\n\ndef my_step() -> None:\n\npass\n\n# Or configure the step options\n\nmy_step = my_step.with_options(\n\nsettings={"docker": docker_settings}\n\nUsing a YAML configuration file as described here:\n\nsettings:\n\ndocker:\n\n...\n\nsteps:\n\nstep_name:\n\nsettings:\n\ndocker:\n\n...\n\nCheck out this page for more information on the hierarchy and precedence of the various ways in which you can supply the settings.\n\nUsing a custom parent image\n\nBy default, ZenML performs all the steps described above on top of the official ZenML image for the Python and ZenML version in the active Python environment. To have more control over the entire environment used to execute your pipelines, you can either specify a custom pre-built parent image or a Dockerfile that ZenML uses to build a parent image for you.\n\nIf you\'re going to use a custom parent image (either pre-built or by specifying a Dockerfile), you need to make sure that it has Python, pip, and ZenML installed for it to work. 
If you need a starting point, you can take a look at the Dockerfile that ZenML uses here.\n\nUsing a pre-built parent image\n\nTo use a static parent image (e.g., with internal dependencies installed) that doesn\'t need to be rebuilt on every pipeline run, specify it in the Docker settings for your pipeline:\n\ndocker_settings = DockerSettings(parent_image="my_registry.io/image_name:tag")\n\n@pipeline(settings={"docker": docker_settings})\n\ndef my_pipeline(...):\n\n...\n\nTo use this image directly to run your steps without including any code or installing any requirements on top of it, skip the Docker builds by specifying it in the Docker settings:', ] embeddings = model.encode(sentences) print(embeddings.shape) # [3, 768] # Get the similarity scores for the embeddings similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] ``` ## Evaluation ### Metrics #### Information Retrieval * Dataset: `dim_384` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.3434 | | cosine_accuracy@3 | 0.6446 | | cosine_accuracy@5 | 0.7048 | | cosine_accuracy@10 | 0.7892 | | cosine_precision@1 | 0.3434 | | cosine_precision@3 | 0.2149 | | cosine_precision@5 | 0.141 | | cosine_precision@10 | 0.0789 | | cosine_recall@1 | 0.3434 | | cosine_recall@3 | 0.6446 | | cosine_recall@5 | 0.7048 | | cosine_recall@10 | 0.7892 | | cosine_ndcg@10 | 0.5731 | | cosine_mrr@10 | 0.5033 | | **cosine_map@100** | **0.5098** | #### Information Retrieval * Dataset: `dim_256` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.3012 | | cosine_accuracy@3 | 0.6145 | | cosine_accuracy@5 | 0.6928 | | cosine_accuracy@10 | 0.7651 | | cosine_precision@1 | 0.3012 | | cosine_precision@3 | 0.2048 | | cosine_precision@5 | 0.1386 | | cosine_precision@10 | 0.0765 | | cosine_recall@1 | 0.3012 | | cosine_recall@3 | 0.6145 | | cosine_recall@5 | 0.6928 | | cosine_recall@10 | 0.7651 | | cosine_ndcg@10 | 0.5423 | | cosine_mrr@10 | 0.4697 | | **cosine_map@100** | **0.4772** | #### Information Retrieval * Dataset: `dim_128` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.3434 | | cosine_accuracy@3 | 0.6325 | | cosine_accuracy@5 | 0.6687 | | cosine_accuracy@10 | 0.741 | | cosine_precision@1 | 0.3434 | | cosine_precision@3 | 0.2108 | | cosine_precision@5 | 0.1337 | | cosine_precision@10 | 0.0741 | | cosine_recall@1 | 0.3434 | | cosine_recall@3 | 0.6325 | | cosine_recall@5 | 0.6687 | | cosine_recall@10 | 0.741 | | cosine_ndcg@10 | 0.5526 | | cosine_mrr@10 | 0.491 | | **cosine_map@100** | **0.4998** | #### Information Retrieval * Dataset: `dim_64` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.2289 | | cosine_accuracy@3 | 0.512 | | cosine_accuracy@5 | 0.6084 
| | cosine_accuracy@10 | 0.7108 | | cosine_precision@1 | 0.2289 | | cosine_precision@3 | 0.1707 | | cosine_precision@5 | 0.1217 | | cosine_precision@10 | 0.0711 | | cosine_recall@1 | 0.2289 | | cosine_recall@3 | 0.512 | | cosine_recall@5 | 0.6084 | | cosine_recall@10 | 0.7108 | | cosine_ndcg@10 | 0.4695 | | cosine_mrr@10 | 0.3922 | | **cosine_map@100** | **0.4002** | ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 1,490 training samples * Columns: positive and anchor * Approximate statistics based on the first 1000 samples: | | positive | anchor | |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| | type | string | string | | details |
<ul><li>min: 9 tokens</li><li>mean: 21.18 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 21 tokens</li><li>mean: 376.76 tokens</li><li>max: 512 tokens</li></ul>
| * Samples: | positive | anchor | |:-----------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | What are the steps to deploy and use Pigeon for data annotation within a Jupyter notebook? | Pigeon

Annotating data using Pigeon.

Pigeon is a lightweight, open-source annotation tool designed for quick and easy labeling of data directly within Jupyter notebooks. It provides a simple and intuitive interface for annotating various types of data, including:

Text Classification

Image Classification

Text Captioning

When would you want to use it?

If you need to label a small to medium-sized dataset as part of your ML workflow and prefer the convenience of doing it directly within your Jupyter notebook, Pigeon is a great choice. It is particularly useful for:

Quick labeling tasks that don't require a full-fledged annotation platform

Iterative labeling during the exploratory phase of your ML project

Collaborative labeling within a Jupyter notebook environment

How to deploy it?

To use the Pigeon annotator, you first need to install the ZenML Pigeon integration:

zenml integration install pigeon

Next, register the Pigeon annotator with ZenML, specifying the output directory where the annotation files will be stored:

zenml annotator register pigeon --flavor pigeon --output_dir="path/to/dir"

Note that the output_dir is relative to the repository or notebook root.

Finally, add the Pigeon annotator to your stack and set it as the active stack:

zenml stack update --annotator pigeon

Now you're ready to use the Pigeon annotator in your ML workflow!

How do you use it?

With the Pigeon annotator registered and added to your active stack, you can easily access it using the ZenML client within your Jupyter notebook.

For text classification tasks, you can launch the Pigeon annotator as follows:

from zenml.client import Client

annotator = Client().active_stack.annotator

annotations = annotator.launch(

data=[

'I love this movie',

'I was really disappointed by the book'

],

options=[

'positive',

'negative'

For image classification tasks, you can provide a custom display function to render the images:

from zenml.client import Client
| | How can I attach metadata to a specific step during my work in ZenML? | Attach metadata to steps

You might want to log metadata and have that be attached to a specific step during the course of your work. This is possible by using the log_step_metadata method. This method allows you to attach a dictionary of key-value pairs as metadata to a step. The metadata can be any JSON-serializable value, including custom classes such as Uri, Path, DType, and StorageSize.

You can call this method from within a step or from outside. If you call it from within it will attach the metadata to the step and run that is currently being executed.

from zenml import step, log_step_metadata, ArtifactConfig, get_step_context

from typing import Annotated

import pandas as pd

from sklearn.ensemble import RandomForestClassifier

from sklearn.base import ClassifierMixin

@step

def train_model(dataset: pd.DataFrame) -> Annotated[ClassifierMixin, ArtifactConfig(name="sklearn_classifier", is_model_artifact=True)]:

"""Train a model"""

# Fit the model and compute metrics

classifier = RandomForestClassifier().fit(dataset)

accuracy, precision, recall = ...

# Log metadata at the step level

# This associates the metadata with the ZenML step run

log_step_metadata(

metadata={

"evaluation_metrics": {

"accuracy": accuracy,

"precision": precision,

"recall": recall

},

return classifier

If you call it from outside you can attach the metadata to a specific step run from any pipeline and step. This is useful if you want to attach the metadata after you've run the step.

from zenml import log_step_metadata

# run some step

# subsequently log the metadata for the step

log_step_metadata(

metadata={

"some_metadata": {"a_number": 3}

},

pipeline_name_id_or_prefix="my_pipeline",

step_name="my_step",

run_id="my_step_run_id"

Fetching logged metadata

Once metadata has been logged in an artifact, model, we can easily fetch the metadata with the ZenML Client:

from zenml.client import Client

client = Client()
| | How can I list the Docker registry resources accessible by service connectors configured in my ZenML workspace? | ━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━┛

```

```shzenml service-connector list-resources --resource-type docker-registry

```

Example Command Output

```text

The following 'docker-registry' resources can be accessed by service connectors configured in your workspace:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━┓

┃ CONNECTOR ID β”‚ CONNECTOR NAME β”‚ CONNECTOR TYPE β”‚ RESOURCE TYPE β”‚ RESOURCE NAMES ┃

┠──────────────────────────────────────┼────────────────┼────────────────┼────────────────────┼───────────────────┨

┃ eeeabc13-9203-463b-aa52-216e629e903c β”‚ gcp-demo-multi β”‚ πŸ”΅ gcp β”‚ 🐳 docker-registry β”‚ gcr.io/zenml-core ┃

┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━┛

```

register and connect a GCS Artifact Store Stack Component to a GCS bucket:Copyzenml artifact-store register gcs-zenml-bucket-sl --flavor gcp --path=gs://zenml-bucket-sl

Example Command Output

```text

Running with active workspace: 'default' (global)

Running with active stack: 'default' (global)

Successfully registered artifact_store `gcs-zenml-bucket-sl`.

```

```sh

zenml artifact-store connect gcs-zenml-bucket-sl --connector gcp-demo-multi

```

Example Command Output

```text

Running with active workspace: 'default' (global)

Running with active stack: 'default' (global)

Successfully connected artifact store `gcs-zenml-bucket-sl` to the following resources:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━┓

┃ CONNECTOR ID β”‚ CONNECTOR NAME β”‚ CONNECTOR TYPE β”‚ RESOURCE TYPE β”‚ RESOURCE NAMES ┃

┠──────────────────────────────────────┼────────────────┼────────────────┼───────────────┼──────────────────────┨

┃ eeeabc13-9203-463b-aa52-216e629e903c β”‚ gcp-demo-multi β”‚ πŸ”΅ gcp β”‚ πŸ“¦ gcs-bucket β”‚ gs://zenml-bucket-sl ┃
| * Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: ```json { "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 384, 256, 128, 64 ], "matryoshka_weights": [ 1, 1, 1, 1 ], "n_dims_per_step": -1 } ``` ### Training Hyperparameters #### Non-Default Hyperparameters - `eval_strategy`: epoch - `per_device_train_batch_size`: 32 - `per_device_eval_batch_size`: 16 - `gradient_accumulation_steps`: 16 - `learning_rate`: 2e-05 - `num_train_epochs`: 4 - `lr_scheduler_type`: cosine - `warmup_ratio`: 0.1 - `bf16`: True - `tf32`: True - `load_best_model_at_end`: True - `optim`: adamw_torch_fused - `batch_sampler`: no_duplicates #### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: True
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
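#### Training Setup Sketch

The snippet below is a minimal sketch of how the MatryoshkaLoss configuration and the non-default hyperparameters listed above could be wired together with the Sentence Transformers v3 trainer. The dataset rows, `output_dir`, `save_strategy`, and the use of the training set for evaluation are illustrative assumptions and are not recorded in this card.

```python
# Minimal sketch, assuming how the 1,490 (positive, anchor) pairs were loaded.
# Requires a GPU with bf16 support, matching the bf16/tf32 settings recorded above.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

# Base model that was fine-tuned for this card
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Columns match the dataset described above ("positive" question, "anchor" document chunk);
# the loss consumes columns in order, so the question acts as the query. Rows are hypothetical.
train_dataset = Dataset.from_dict(
    {
        "positive": ["How do I register a ZenML stack?"],
        "anchor": ["zenml stack register my_stack -o default -a default ..."],
    }
)

# MultipleNegativesRankingLoss wrapped in MatryoshkaLoss with the dims from this card
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[384, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="finetuned-snowflake-arctic-embed-m",  # assumed
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed, so load_best_model_at_end can compare checkpoints
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # illustrative; the card's evaluation used InformationRetrievalEvaluator
    loss=loss,
)
trainer.train()
```

Because the model was trained with Matryoshka dimensions of 384, 256, 128, and 64, embeddings can also be truncated at inference time, e.g. `SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m", truncate_dim=256)`, trading some retrieval quality (see the per-dimension metrics above) for smaller vectors.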
### Training Logs | Epoch | Step | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_384_cosine_map@100 | dim_64_cosine_map@100 | |:----------:|:-----:|:----------------------:|:----------------------:|:----------------------:|:---------------------:| | 0.6667 | 1 | 0.3797 | 0.3924 | 0.4168 | 0.2953 | | 2.0 | 3 | 0.4951 | 0.4642 | 0.5104 | 0.3945 | | **2.6667** | **4** | **0.4998** | **0.4772** | **0.5098** | **0.4002** | * The bold row denotes the saved checkpoint. ### Framework Versions - Python: 3.10.14 - Sentence Transformers: 3.0.1 - Transformers: 4.41.2 - PyTorch: 2.3.1+cu121 - Accelerate: 0.31.0 - Datasets: 2.19.1 - Tokenizers: 0.19.1 ## Citation ### BibTeX #### Sentence Transformers ```bibtex @inproceedings{reimers-2019-sentence-bert, title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", author = "Reimers, Nils and Gurevych, Iryna", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", month = "11", year = "2019", publisher = "Association for Computational Linguistics", url = "https://arxiv.org/abs/1908.10084", } ``` #### MatryoshkaLoss ```bibtex @misc{kusupati2024matryoshka, title={Matryoshka Representation Learning}, author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, year={2024}, eprint={2205.13147}, archivePrefix={arXiv}, primaryClass={cs.LG} } ``` #### MultipleNegativesRankingLoss ```bibtex @misc{henderson2017efficient, title={Efficient Natural Language Response Suggestion for Smart Reply}, author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, year={2017}, eprint={1705.00652}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```