diff --git "a/README.md" "b/README.md" --- "a/README.md" +++ "b/README.md" @@ -31,1213 +31,1249 @@ tags: - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss widget: -- source_sentence: How do I register a Discord Alerter in ZenML to send automated - alerts? +- source_sentence: How do you configure the LENGTH_OUT_OF_BOUNDS check to ensure the + number of outliers is within acceptable limits? sentences: - - 'Discord Alerter + - 'LENGTH_OUT_OF_BOUNDS: dict( - Sending automated alerts to a Discord channel. + num_percentiles=1000,min_unique_values=3, - The DiscordAlerter enables you to send messages to a dedicated Discord channel - directly from within your ZenML pipelines. + condition_number_of_outliers_less_or_equal=dict( - The discord integration contains the following two standard steps: + max_outliers=3, - discord_alerter_post_step takes a string message, posts it to a Discord channel, - and returns whether the operation was successful. + ), - discord_alerter_ask_step also posts a message to a Discord channel, but waits - for user feedback, and only returns True if a user explicitly approved the operation - from within Discord (e.g., by sending "approve" / "reject" to the bot in response). + }, - Interacting with Discord from within your pipelines can be very useful in practice: + ... - The discord_alerter_post_step allows you to get notified immediately when failures - happen (e.g., model performance degradation, data drift, ...), + is equivalent to running the following Deepchecks tests: - The discord_alerter_ask_step allows you to integrate a human-in-the-loop into - your pipelines before executing critical steps, such as deploying new models. + import deepchecks.tabular.checks as tabular_checks - How to use it + from deepchecks.tabular import Suite - Requirements + from deepchecks.tabular import Dataset - Before you can use the DiscordAlerter, you first need to install ZenML''s discord - integration: + train_dataset = Dataset( - zenml integration install discord -y + reference_dataset, - See the Integrations page for more details on ZenML integrations and how to install - and use them. + label=''class'', - Setting Up a Discord Bot + cat_features=[''country'', ''state''] - In order to use the DiscordAlerter, you first need to have a Discord workspace - set up with a channel that you want your pipelines to post to. This is the - you will need when registering the discord alerter component. + suite = Suite(name="custom") - Then, you need to create a Discord App with a bot in your server . + check = tabular_checks.OutlierSampleDetection( - Note in the bot token copy step, if you don''t find the copy button then click - on reset token to reset the bot and you will get a new token which you can use. - Also, make sure you give necessary permissions to the bot required for sending - and receiving messages. 
+ nearest_neighbors_percent=0.01, - Registering a Discord Alerter in ZenML' - - 'af89af ┃┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + extent_parameter=3, - ┃ NAME │ azure-session-token ┃ + check.add_condition_outlier_ratio_less_or_equal( - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + max_outliers_ratio=0.007, - ┃ TYPE │ 🇦 azure ┃ + outlier_score_threshold=0.5, - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + check.add_condition_no_outliers( - ┃ AUTH METHOD │ access-token ┃ + outlier_score_threshold=0.6, - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + suite.add(check) - ┃ RESOURCE TYPES │ 🇦 azure-generic, 📦 blob-container, 🌀 kubernetes-cluster, - 🐳 docker-registry ┃ + check = tabular_checks.StringLengthOutOfBounds( - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + num_percentiles=1000, - ┃ RESOURCE NAME │ ┃ + min_unique_values=3, - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + check.add_condition_number_of_outliers_less_or_equal( - ┃ SECRET ID │ b34f2e95-ae16-43b6-8ab6-f0ee33dbcbd8 ┃ + max_outliers=3, - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + suite.run(train_dataset=train_dataset) - ┃ SESSION DURATION │ N/A ┃ + You can view the complete list of configuration parameters in the SDK docs. - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + The Deepchecks Data Validator - ┃ EXPIRES IN │ 42m25s ┃ + The Deepchecks Data Validator implements the same interface as do all Data Validators, + so this method forces you to maintain some level of compatibility with the overall + Data Validator abstraction, which guarantees an easier migration in case you decide + to switch to another Data Validator. - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + All you have to do is call the Deepchecks Data Validator methods when you need + to interact with Deepchecks to run tests, e.g.: - ┃ OWNER │ default ┃ + import pandas as pd - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨' - - 'Storing embeddings in a vector database + from deepchecks.core.suite import SuiteResult - Store embeddings in a vector database for efficient retrieval. + from zenml.integrations.deepchecks.data_validators import DeepchecksDataValidator - The process of generating the embeddings doesn''t take too long, especially if - the machine on which the step is running has a GPU, but it''s still not something - we want to do every time we need to retrieve a document. Instead, we can store - the embeddings in a vector database, which allows us to quickly retrieve the most - relevant chunks based on their similarity to the query. + from zenml.integrations.deepchecks.validation_checks import DeepchecksDataIntegrityCheck - For the purposes of this guide, we''ll use PostgreSQL as our vector database. - This is a popular choice for storing embeddings, as it provides a scalable and - efficient way to store and retrieve high-dimensional vectors. However, you can - use any vector database that supports high-dimensional vectors. 
If you want to - explore a list of possible options, this is a good website to compare different - options. + from zenml import step - For more information on how to set up a PostgreSQL database to follow along with - this guide, please see the instructions in the repository which show how to set - up a PostgreSQL database using Supabase. + @step - Since PostgreSQL is a well-known and battle-tested database, we can use known - and minimal packages to connect and to interact with it. We can use the psycopg2 - package to connect and then raw SQL statements to interact with the database. + def data_integrity_check( - The code for the step is fairly simple: + dataset: pd.DataFrame, - from zenml import step + ) -> SuiteResult: - @step + """Custom data integrity check step with Deepchecks - def index_generator( + Args: - documents: List[Document], + dataset: input Pandas DataFrame - ) -> None: + Returns: - try: + Deepchecks test suite execution result - conn = get_db_conn() + """' + - ' the Tekton orchestrator, check out the SDK Docs .Enabling CUDA for GPU-backed + hardware - with conn.cursor() as cur: + Note that if you wish to use this orchestrator to run steps on a GPU, you will + need to follow the instructions on this page to ensure that it works. It requires + adding some extra settings customization and is essential to enable CUDA for the + GPU to give its full acceleration. - # Install pgvector if not already installed + PreviousAWS Sagemaker Orchestrator - cur.execute("CREATE EXTENSION IF NOT EXISTS vector") + NextAirflow Orchestrator - conn.commit() + Last updated 19 days ago' + - 'gmax(prediction.numpy()) - # Create the embeddings table if it doesn''t exist + return classes[maxindex]The custom predict function should get the model and the + input data as arguments and return the model predictions. ZenML will automatically + take care of loading the model into memory and starting the seldon-core-microservice + that will be responsible for serving the model and running the predict function. - table_create_command = f""" + After defining your custom predict function in code, you can use the seldon_custom_model_deployer_step + to automatically build your function into a Docker image and deploy it as a model + server by setting the predict_function argument to the path of your custom_predict + function: - CREATE TABLE IF NOT EXISTS embeddings ( + from zenml.integrations.seldon.steps import seldon_custom_model_deployer_step - id SERIAL PRIMARY KEY, + from zenml.integrations.seldon.services import SeldonDeploymentConfig - content TEXT, + from zenml import pipeline - token_count INTEGER, + @pipeline - embedding VECTOR({EMBEDDING_DIMENSIONALITY}), + def seldon_deployment_pipeline(): - filename TEXT, + model = ... - parent_section TEXT, + seldon_custom_model_deployer_step( - url TEXT + model=model, - ); + predict_function="", # TODO: path to custom code - """ + service_config=SeldonDeploymentConfig( - cur.execute(table_create_command) + model_name="", # TODO: name of the deployed model - conn.commit() + replicas=1, - register_vector(conn)' -- source_sentence: How can I configure a step or pipeline to use a custom materializer - in ZenML? - sentences: - - 'e following: + implementation="custom", - Question: What are Plasma Phoenixes?Answer: Plasma Phoenixes are majestic creatures - made of pure energy that soar above the chromatic canyons of Zenml World. They - leave fiery trails behind them, painting the sky with dazzling displays of colors. 
+ resources=SeldonResourceRequirements( - Question: What kinds of creatures live on the prismatic shores of ZenML World? + limits={"cpu": "200m", "memory": "250Mi"} - Answer: On the prismatic shores of ZenML World, you can find crystalline crabs - scuttling and burrowing with their transparent exoskeletons, which refract light - into a kaleidoscope of hues. + ), - Question: What is the capital of Panglossia? + serviceAccountName="kubernetes-service-account", - Answer: The capital of Panglossia is not mentioned in the provided context. + ), - The implementation above is by no means sophisticated or performant, but it''s - simple enough that you can see all the moving parts. Our tokenization process - consists of splitting the text into individual words. + Advanced Custom Code Deployment with Seldon Core Integration - The way we check for similarity between the question / query and the chunks of - text is extremely naive and inefficient. The similarity between the query and - the current chunk is calculated using the Jaccard similarity coefficient. This - coefficient measures the similarity between two sets and is defined as the size - of the intersection divided by the size of the union of the two sets. So we count - the number of words that are common between the query and the chunk and divide - it by the total number of unique words in both the query and the chunk. There - are much better ways of measuring the similarity between two pieces of text, such - as using embeddings or other more sophisticated techniques, but this example is - kept simple for illustrative purposes. + Before creating your custom model class, you should take a look at the custom + Python model section of the Seldon Core documentation. - The rest of this guide will showcase a more performant and scalable way of performing - the same task using ZenML. If you ever are unsure why we''re doing something, - feel free to return to this example for the high-level overview. + The built-in Seldon Core custom deployment step is a good starting point for deploying + your custom models. However, if you want to deploy more than the trained model, + you can create your own custom class and a custom step to achieve this. - PreviousRAG with ZenML + See the ZenML custom Seldon model class as a reference. - NextUnderstanding Retrieval-Augmented Generation (RAG) + PreviousMLflow - Last updated 2 months ago' - - 'ral options are presented. + NextBentoML - hyperparameter tuning?Our dedicated documentation guide on implementing this is - the place to learn more. + Last updated 15 days ago' +- source_sentence: What is the importance of using `get_step_context` in the do_predictions + pipeline in ZenML? + sentences: + - '3_store -f s3 --path s3://my_bucket - reset things when something goes wrong? + How to use itThe Artifact Store provides low-level object storage services for + other ZenML mechanisms. When you develop ZenML pipelines, you normally don''t + even have to be aware of its existence or interact with it directly. ZenML provides + higher-level APIs that can be used as an alternative to store and access artifacts: - To reset your ZenML client, you can run zenml clean which will wipe your local - metadata database and reset your client. Note that this is a destructive action, - so feel free to reach out to us on Slack before doing this if you are unsure. + return one or more objects from your pipeline steps to have them automatically + saved in the active Artifact Store as pipeline artifacts. 
- steps that create other steps AKA dynamic pipelines and steps? + retrieve pipeline artifacts from the active Artifact Store after a pipeline run + is complete. - Please read our general information on how to compose steps + pipelines together - to start with. You might also find the code examples in our guide to implementing - hyperparameter tuning which is related to this topic. + You will probably need to interact with the low-level Artifact Store API directly: - templates: using starter code with ZenML? + if you implement custom Materializers for your artifact data types - Project templates allow you to get going quickly with ZenML. We recommend the - Starter template (starter) for most use cases which gives you a basic scaffold - and structure around which you can write your own code. You can also build templates - for others inside a Git repository and use them with ZenML''s templates functionality. + if you want to store custom objects in the Artifact Store - upgrade my ZenML client and/or server? + The Artifact Store API - Upgrading your ZenML client package is as simple as running pip install --upgrade - zenml in your terminal. For upgrading your ZenML server, please refer to the dedicated - documentation section which covers most of the ways you might do this as well - as common troubleshooting steps. + All ZenML Artifact Stores implement the same IO API that resembles a standard + file system. This allows you to access and manipulate the objects stored in the + Artifact Store in the same manner you would normally handle files on your computer + and independently of the particular type of Artifact Store that is configured + in your ZenML stack. - use a stack component? + Accessing the low-level Artifact Store API can be done through the following Python + modules: - For information on how to use a specific stack component, please refer to the - component guide which contains all our tips and advice on how to use each integration - and component with ZenML. + zenml.io.fileio provides low-level utilities for manipulating Artifact Store objects + (e.g. open, copy, rename , remove, mkdir). These functions work seamlessly across + Artifact Stores types. They have the same signature as the Artifact Store abstraction + methods ( in fact, they are one and the same under the hood). - PreviousAPI reference + zenml.utils.io_utils includes some higher-level helper utilities that make it + easier to find and transfer objects between the Artifact Store and the local filesystem + or memory.' + - "ace. Try it out at https://www.zenml.io/live-demo!No Vendor Lock-In: Since infrastructure\ + \ is decoupled from code, ZenML gives you the freedom to switch to a different\ + \ tooling stack whenever it suits you. 
By avoiding vendor lock-in, you have the\ + \ flexibility to transition between cloud providers or services, ensuring that\ + \ you receive the best performance and pricing available in the market at any\ + \ time.Copyzenml stack set gcp\npython run.py # Run your ML workflows in GCP\n\ + zenml stack set aws\npython run.py # Now your ML workflow runs in AWS\n\n\U0001F680\ + \ Learn More\n\nReady to deploy and manage your MLOps infrastructure with ZenML?\ + \ Here is a collection of pages you can take a look at next:\n\nSet up and manage\ + \ production-ready infrastructure with ZenML.\n\nExplore the existing infrastructure\ + \ and tooling integrations of ZenML.\n\nFind answers to the most frequently asked\ + \ questions.\n\nZenML gives data scientists the freedom to fully focus on modeling\ + \ and experimentation while writing code that is production-ready from the get-go.\n\ + \nDevelop Locally: ZenML allows you to develop ML models in any environment using\ + \ your favorite tools. This means you can start developing locally, and simply\ + \ switch to a production environment once you are satisfied with your results.Copypython\ + \ run.py # develop your code locally with all your favorite tools\nzenml stack\ + \ set production\npython run.py # run on production infrastructure without any\ + \ code changes\n\nPythonic SDK: ZenML is designed to be as unintrusive as possible.\ + \ Adding a ZenML @step or @pipeline decorator to your Python functions is enough\ + \ to turn your existing code into ZenML pipelines:Copyfrom zenml import pipeline,\ + \ step\n\n@step\ndef step_1() -> str:\n return \"world\"\n\n@step\ndef step_2(input_one:\ + \ str, input_two: str) -> None:\n combined_str = input_one + ' ' + input_two\n\ + \ print(combined_str)\n\n@pipeline\ndef my_pipeline():\n output_step_one = step_1()\n\ + \ step_2(input_one=\"hello\", input_two=output_step_one)\n\nmy_pipeline()" + - 'e alone - uses the latest version of this artifacttrain_data = client.get_artifact_version(name="iris_training_dataset") - NextMigration guide + # For test, we want a particular version - Last updated 18 days ago' - - ' configuration documentation for more information.Custom materializers + test_data = client.get_artifact_version(name="iris_testing_dataset", version="raw_2023") - Configuring a step/pipeline to use a custom materializer + # We can now send these directly into ZenML steps - Defining which step uses what materializer + sklearn_classifier = model_trainer(train_data) - ZenML automatically detects if your materializer is imported in your source code - and registers them for the corresponding data type (defined in ASSOCIATED_TYPES). - Therefore, just having a custom materializer definition in your code is enough - to enable the respective data type to be used in your pipelines. + model_evaluator(model, sklearn_classifier) - However, it is best practice to explicitly define which materializer to use for - a specific step and not rely on the ASSOCIATED_TYPES to make that connection: + materialized in memory in the - class MyObj: + Pattern 2: Artifact exchange between pipelines through a Model - ... + While passing around artifacts with IDs or names is very useful, it is often desirable + to have the ZenML Model be the point of reference instead. - class MyMaterializer(BaseMaterializer): + ZenML Model. Each time the - """Materializer to read data to and from MyObj.""" + On the other side, the do_predictions pipeline simply picks up the latest promoted + model and runs batch inference on it. 
It need not know of the IDs or names of + any of the artifacts produced by the training pipeline''s many runs. This way + these two pipelines can independently be run, but can rely on each other''s output. - ASSOCIATED_TYPES = (MyObj) + In code, this is very simple. Once the pipelines are configured to use a particular + model, we can use get_step_context to fetch the configured model within a step + directly. Assuming there is a predict step in the do_predictions pipeline, we + can fetch the production model like so: - ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA + from zenml import step, get_step_context - # Read below to learn how to implement this materializer + # IMPORTANT: Cache needs to be disabled to avoid unexpected behavior - # You can define it at the decorator level + @step(enable_cache=False) - @step(output_materializers=MyMaterializer) + def predict( - def my_first_step() -> MyObj: + data: pd.DataFrame, - return 1 + ) -> Annotated[pd.Series, "predictions"]: - # No need to explicitly specify materializer here: + # model name and version are derived from pipeline context - # it is coupled with Artifact Version generated by + model = get_step_context().model - # `my_first_step` already. + # Fetch the model directly from the model control plane - def my_second_step(a: MyObj): + model = model.get_model_artifact("trained_model") - print(a) + # Make predictions - # or you can use the `configure()` method of the step. E.g.: + predictions = pd.Series(model.predict(data)) - my_first_step.configure(output_materializers=MyMaterializer) + return predictions' +- source_sentence: Where can I find bite-sized updates about ZenML? + sentences: + - '💜Community & content - When there are multiple outputs, a dictionary of type {: } - can be supplied to the decorator or the .configure(...) method: + All possible ways for our community to get in touch with ZenML. - class MyObj1: + The ZenML team and community have put together a list of references that can be + used to get in touch with the development team of ZenML and develop a deeper understanding + of the framework. - ... + Slack Channel: Get help from the community - class MyObj2: + The ZenML Slack channel is the main gathering point for the community. Not only + is it the best place to get in touch with the core team of ZenML, but it is also + a great way to discuss new ideas and share your ZenML projects with the community. + If you have a question, there is a high chance someone else might have already + answered it on Slack! - ... + Social Media: Bite-sized updates - class MyMaterializer1(BaseMaterializer): + We are active on LinkedIn and Twitter where we post bite-sized updates on releases, + events, and MLOps in general. Follow us to interact and stay up to date! We would + appreciate it if you could comment on and share our posts so more people can benefit + from our work at ZenML! - """Materializer to read data to and from MyObj1.""" + YouTube Channel: Video tutorials, workshops, and more - ASSOCIATED_TYPES = (MyObj1) + Our YouTube channel features a growing set of videos that take you through the + entire framework. Go here if you are a visual learner, and follow along with some + tutorials. - ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA + Public roadmap - class MyMaterializer2(BaseMaterializer): + The feedback from our community plays a significant role in the development of + ZenML. That''s why we have a public roadmap that serves as a bridge between our + users and our development team. 
If you have ideas regarding any new features or + want to prioritize one over the other, feel free to share your thoughts here or + vote on existing ideas. - """Materializer to read data to and from MyObj2.""" + Blog - ASSOCIATED_TYPES = (MyObj2) + On our Blog page, you can find various articles written by our team. We use it + as a platform to share our thoughts and explain the implementation process of + our tool, its new features, and the thought process behind them. - ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA + Podcast' + - 'Skypilot - # This is where we connect the objects to the materializer + Use Skypilot with ZenML. - @step(output_materializers={"1": MyMaterializer1, "2": MyMaterializer2})' -- source_sentence: How do I use it to start the local Airflow server for ZenML? - sentences: - - ' use it + The ZenML SkyPilot VM Orchestrator allows you to provision and manage VMs on any + supported cloud provider (AWS, GCP, Azure, Lambda Labs) for running your ML pipelines. + It simplifies the process and offers cost savings and high GPU availability. - To use the Airflow orchestrator, we need:The ZenML airflow integration installed. - If you haven''t done so, runCopyzenml integration install airflow + Prerequisites - Docker installed and running. + To use the SkyPilot VM Orchestrator, you''ll need: - The orchestrator registered and part of our active stack: + ZenML SkyPilot integration for your cloud provider installed (zenml integration + install skypilot_) - zenml orchestrator register \ + Docker installed and running - --flavor=airflow \ + A remote artifact store and container registry in your ZenML stack - --local=True # set this to `False` if using a remote Airflow deployment + A remote ZenML deployment - # Register and activate a stack with the new orchestrator + Appropriate permissions to provision VMs on your cloud provider - zenml stack register -o ... --set + A service connector configured to authenticate with your cloud provider (not needed + for Lambda Labs) - In the local case, we need to reinstall in a certain way for the local Airflow - server: + Configuring the Orchestrator - pip install "apache-airflow-providers-docker<3.8.0" "apache-airflow==2.4.0" "pendulum<3.0.0" - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.4.0/constraints-3.9.txt" + Configuration steps vary by cloud provider: - Please make sure to replace 3.9 with your Python (major) version in the constraints - file URL given above. + AWS, GCP, Azure: - Once that is installed, we can start the local Airflow server by running the following - command in your terminal. See further below on an alternative way to set up the - Airflow server manually since the zenml stack up command is deprecated. + Install the SkyPilot integration and connectors extra for your provider - zenml stack up + Register a service connector with credentials that have SkyPilot''s required permissions - This command will start up an Airflow server on your local machine that''s running - in the same Python environment that you used to provision it. When it is finished, - it will print a username and password which you can use to log in to the Airflow - UI here. 
+ Register the orchestrator and connect it to the service connector - As long as you didn''t configure any custom value for the dag_output_dir attribute - of your orchestrator, running a pipeline locally is as simple as calling: + Register and activate a stack with the new orchestrator - python file_that_runs_a_zenml_pipeline.py + zenml service-connector register -skypilot-vm -t --auto-configure - This call will produce a .zip file containing a representation of your ZenML pipeline - to the Airflow DAGs directory. From there, the local Airflow server will load - it and run your pipeline (It might take a few seconds until the pipeline shows - up in the Airflow UI).' - - 'nentConfig class here. + zenml orchestrator register --flavor vm_ - Base Abstraction 3: Flavorfrom zenml.enums import StackComponentType + zenml orchestrator connect --connector -skypilot-vm - from zenml.stack import Flavor + zenml stack register -o ... --set - class LocalArtifactStore(BaseArtifactStore): + Lambda Labs: - ... + Install the SkyPilot Lambda integration - class LocalArtifactStoreConfig(BaseArtifactStoreConfig): + Register a secret with your Lambda Labs API key - ... + Register the orchestrator with the API key secret - class LocalArtifactStoreFlavor(Flavor): + Register and activate a stack with the new orchestrator - @property + zenml secret create lambda_api_key --scope user --api_key= - def name(self) -> str: + zenml orchestrator register --flavor vm_lambda --api_key={{lambda_api_key.api_key}} - """Returns the name of the flavor.""" + zenml stack register -o ... --set - return "local" + Running a Pipeline' + - 'racking import MlflowClient, artifact_utils - @property + @stepdef deploy_model() -> Optional[MLFlowDeploymentService]: - def type(self) -> StackComponentType: + # Deploy a model using the MLflow Model Deployer - """Returns the flavor type.""" + zenml_client = Client() - return StackComponentType.ARTIFACT_STORE + model_deployer = zenml_client.active_stack.model_deployer - @property + experiment_tracker = zenml_client.active_stack.experiment_tracker - def config_class(self) -> Type[LocalArtifactStoreConfig]: + # Let''s get the run id of the current pipeline - """Config class of this flavor.""" + mlflow_run_id = experiment_tracker.get_run_id( - return LocalArtifactStoreConfig + experiment_name=get_step_context().pipeline_name, - @property + run_name=get_step_context().run_name, - def implementation_class(self) -> Type[LocalArtifactStore]: + # Once we have the run id, we can get the model URI using mlflow client - """Implementation class of this flavor.""" + experiment_tracker.configure_mlflow() - return LocalArtifactStore + client = MlflowClient() - See the full code of the base Flavor class definition here. + model_name = "model" # set the model name that was logged - Implementing a Custom Stack Component Flavor + model_uri = artifact_utils.get_artifact_uri( - Let''s recap what we just learned by reimplementing the S3ArtifactStore from the - aws integration as a custom flavor. + run_id=mlflow_run_id, artifact_path=model_name - We can start with the configuration class: here we need to define the SUPPORTED_SCHEMES - class variable introduced by the BaseArtifactStore. 
We also define several additional - configuration values that users can use to configure how the artifact store will - authenticate with AWS: + mlflow_deployment_config = MLFlowDeploymentConfig( - from zenml.artifact_stores import BaseArtifactStoreConfig + name: str = "mlflow-model-deployment-example", - from zenml.utils.secret_utils import SecretField + description: str = "An example of deploying a model using the MLflow Model Deployer", - class MyS3ArtifactStoreConfig(BaseArtifactStoreConfig): + pipeline_name: str = get_step_context().pipeline_name, - """Configuration for the S3 Artifact Store.""" + pipeline_step_name: str = get_step_context().step_name, - SUPPORTED_SCHEMES: ClassVar[Set[str]] = {"s3://"} + model_uri: str = model_uri, - key: Optional[str] = SecretField(default=None) + model_name: str = model_name, - secret: Optional[str] = SecretField(default=None) + workers: int = 1, - token: Optional[str] = SecretField(default=None) + mlserver: bool = False, - client_kwargs: Optional[Dict[str, Any]] = None + timeout: int = 300, - config_kwargs: Optional[Dict[str, Any]] = None + service = model_deployer.deploy_model(mlflow_deployment_config) - s3_additional_kwargs: Optional[Dict[str, Any]] = None + return service - You can pass sensitive configuration values as secrets by defining them as type - SecretField in the configuration class.' - - '─────────────────────────────────────────────────┨┃ UUID │ 2b7773eb-d371-4f24-96f1-fad15e74fd6e ┃ + Configuration - ┠────────────────────┼──────────────────────────────────────────────────────────────────────────────┨ + Within the MLFlowDeploymentService you can configure: - ┃ PATH │ /home/stefan/.config/zenml/local_stores/2b7773eb-d371-4f24-96f1-fad15e74fd6e - ┃ + name: The name of the deployment. - ┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + description: The description of the deployment. - As shown by the PATH value in the zenml artifact-store describe output, the artifacts - are stored inside a folder on your local filesystem. + pipeline_name: The name of the pipeline that deployed the MLflow prediction server. - You can create additional instances of local Artifact Stores and use them in your - stacks as you see fit, e.g.: + pipeline_step_name: The name of the step that deployed the MLflow prediction server. - # Register the local artifact store + model_name: The name of the model that is deployed in case of model registry the + name must be a valid registered model name. - zenml artifact-store register custom_local --flavor local + model_version: The version of the model that is deployed in case of model registry + the version must be a valid registered model version.' +- source_sentence: Can you explain how to implement a custom secret store in ZenML? + sentences: + - 'he need to rerun unchanged parts of your pipeline.With ZenML, you can easily + trace an artifact back to its origins and understand the exact sequence of executions + that led to its creation, such as a trained model. This feature enables you to + gain insights into the entire lineage of your artifacts, providing a clear understanding + of how your data has been processed and transformed throughout your machine-learning + pipelines. With ZenML, you can ensure the reproducibility of your results, and + identify potential issues or bottlenecks in your pipelines. 
This level of transparency + and traceability is essential for maintaining the reliability and trustworthiness + of machine learning projects, especially when working in a team or across different + environments. - # Register and set a stack with the new artifact store + For more details on how to adjust the names or versions assigned to your artifacts, + assign tags to them, or adjust other artifact properties, see the documentation + on artifact versioning and configuration. - zenml stack register custom_stack -o default -a custom_local --set + By tracking the lineage of artifacts across environments and stacks, ZenML enables + ML engineers to reproduce results and understand the exact steps taken to create + a model. This is crucial for ensuring the reliability and reproducibility of machine + learning models, especially when working in a team or across different environments. - Same as all other Artifact Store flavors, the local Artifact Store does take in - a path configuration parameter that can be set during registration to point to - a custom path on your machine. However, it is highly recommended that you rely - on the default path value, otherwise, it may lead to unexpected results. Other - local stack components depend on the convention used for the default path to be - able to access the local Artifact Store. + Saving and Loading Artifacts with Materializers - For more, up-to-date information on the local Artifact Store implementation and - its configuration, you can have a look at the SDK docs . + Materializers play a crucial role in ZenML''s artifact management system. They + are responsible for handling the serialization and deserialization of artifacts, + ensuring that data is consistently stored and retrieved from the artifact store. + Each materializer stores data flowing through a pipeline in one or more files + within a unique directory in the artifact store:' + - ' + zenml model version list breast_cancer_classifierThe ZenML Cloud ships with a + Model Control Plane dashboard where you can visualize all the versions: - How do you use it? + Passing parameters - Aside from the fact that the artifacts are stored locally, using the local Artifact - Store is no different from using any other flavor of Artifact Store. + The last part of the config YAML is the parameters key: - PreviousArtifact Stores + # Configure the pipeline - NextAmazon Simple Cloud Storage (S3) + parameters: - Last updated 19 days ago' -- source_sentence: How do I configure the evidently_test_step to run an Evidently - test suite with specific column mappings? - sentences: - - ' ┃┗━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + model_type: "rf" # Choose between rf/sgd - $ zenml model-deployer models get-url 8cbe671b-9fce-4394-a051-68e001f92765 + This parameters key aligns with the parameters that the pipeline expects. In this + case, the pipeline expects a string called model_type that will inform it which + type of model to use: - Prediction URL of Served Model 8cbe671b-9fce-4394-a051-68e001f92765 is: + @pipeline - http://abb84c444c7804aa98fc8c097896479d-377673393.us-east-1.elb.amazonaws.com/seldon/zenml-workloads/zenml-8cbe67 + def training_pipeline(model_type: str): - 1b-9fce-4394-a051-68e001f92765/api/v0.1/predictions + ... 
- $ zenml model-deployer models delete 8cbe671b-9fce-4394-a051-68e001f92765 + So you can see that the YAML config is fairly easy to use and is an important + part of the codebase to control the execution of our pipeline. You can read more + about how to configure a pipeline in the how to section, but for now, we can move + on to scaling our pipeline. - In Python, you can alternatively discover the prediction URL of a deployed model - by inspecting the metadata of the step that deployed the model: + Scaling compute on the cloud - from zenml.client import Client + When we ran our pipeline with the above config, ZenML used some sane defaults + to pick the resource requirements for that pipeline. However, in the real world, + you might want to add more memory, CPU, or even a GPU depending on the pipeline + at hand. - pipeline_run = Client().get_pipeline_run("") + This is as easy as adding the following section to your local training_rf.yaml + file: - deployer_step = pipeline_run.steps[""] + # These are the resources for the entire pipeline, i.e., each step - deployed_model_url = deployer_step.run_metadata["deployed_model_url"].value + settings: - The ZenML integrations that provide Model Deployer stack components also include - standard pipeline steps that can directly be inserted into any pipeline to achieve - a continuous model deployment workflow. These steps take care of all the aspects - of continuously deploying models to an external server and saving the Service - configuration into the Artifact Store, where they can be loaded at a later time - and re-create the initial conditions used to serve a particular model. + ... - PreviousDevelop a custom experiment tracker + # Adapt this to vm_azure or vm_gcp accordingly - NextMLflow + orchestrator.vm_aws: - Last updated 15 days ago' - - 'Load artifacts into memory + memory: 32 # in GB - Often ZenML pipeline steps consume artifacts produced by one another directly - in the pipeline code, but there are scenarios where you need to pull external - data into your steps. Such external data could be artifacts produced by non-ZenML - codes. For those cases, it is advised to use ExternalArtifact, but what if we - plan to exchange data created with other ZenML pipelines? + ... - ZenML pipelines are first compiled and only executed at some later point. During - the compilation phase, all function calls are executed, and this data is fixed - as step input parameters. Given all this, the late materialization of dynamic - objects, like data artifacts, is crucial. Without late materialization, it would - not be possible to pass not-yet-existing artifacts as step inputs, or their metadata, - which is often the case in a multi-pipeline setting. + steps: - We identify two major use cases for exchanging artifacts between pipelines: + model_trainer: - You semantically group your data products using ZenML Models + settings: - You prefer to use ZenML Client to bring all the pieces together + orchestrator.vm_aws: - We recommend using models to group and access artifacts across pipelines. Find - out how to load an artifact from a ZenML Model here. + cpus: 8 - Use client methods to exchange artifacts + Here we are configuring the entire pipeline with a certain amount of memory, while + for the trainer step we are additionally configuring 8 CPU cores. The orchestrator.vm_aws + key corresponds to the SkypilotBaseOrchestratorSettings class in the Python SDK. + You can adapt it to vm_gcp or vm_azure depending on which flavor of skypilot you + have configured. 
- If you don''t yet use the Model Control Plane, you can still exchange data between - pipelines with late materialization. Let''s rework the do_predictions pipeline - code as follows: + Read more about settings in ZenML here. - from typing import Annotated + Now let''s run the pipeline again: - from zenml import step, pipeline + python run.py --training-pipeline' + - 'Custom secret stores - from zenml.client import Client + Learning how to develop a custom secret store. - import pandas as pd + The secrets store acts as the one-stop shop for all the secrets to which your + pipeline or stack components might need access. It is responsible for storing, + updating and deleting only the secrets values for ZenML secrets, while the ZenML + secret metadata is stored in the SQL database. The secrets store interface implemented + by all available secrets store back-ends is defined in the zenml.zen_stores.secrets_stores.secrets_store_interface + core module and looks more or less like this: - from sklearn.base import ClassifierMixin + class SecretsStoreInterface(ABC): - @step + """ZenML secrets store interface. - def predict( + All ZenML secrets stores must implement the methods in this interface. - model1: ClassifierMixin, + """ - model2: ClassifierMixin, + # --------------------------------- - model1_metric: float, + # Initialization and configuration - model2_metric: float, + # --------------------------------- - data: pd.DataFrame, + @abstractmethod - ) -> Annotated[pd.Series, "predictions"]: + def _initialize(self) -> None: - # compare which model performs better on the fly + """Initialize the secrets store. - if model1_metric < model2_metric: + This method is called immediately after the secrets store is created. - predictions = pd.Series(model1.predict(data)) + It should be used to set up the backend (database, connection etc.). - else: + """ - predictions = pd.Series(model2.predict(data)) + # --------- - return predictions + # Secrets - @step' - - 't ( + # --------- - EvidentlyColumnMapping, + @abstractmethod - evidently_test_step,from zenml.integrations.evidently.tests import EvidentlyTestConfig + def store_secret_values( - text_data_test = evidently_test_step.with_options( + self, - parameters=dict( + secret_id: UUID, - column_mapping=EvidentlyColumnMapping( + secret_values: Dict[str, str], - target="Rating", + ) -> None: - numerical_features=["Age", "Positive_Feedback_Count"], + """Store secret values for a new secret. - categorical_features=[ + Args: - "Division_Name", + secret_id: ID of the secret. - "Department_Name", + secret_values: Values for the secret. - "Class_Name", + """ - ], + @abstractmethod - text_features=["Review_Text", "Title"], + def get_secret_values(self, secret_id: UUID) -> Dict[str, str]: - ), + """Get the secret values for an existing secret. - tests=[ + Args: - EvidentlyTestConfig.test("DataQualityTestPreset"), + secret_id: ID of the secret. - EvidentlyTestConfig.test_generator( + Returns: - "TestColumnRegExp", + The secret values. - columns=["Review_Text", "Title"], + Raises: - reg_exp=r"[A-Z][A-Za-z0-9 ]*", + KeyError: if no secret values for the given ID are stored in the - ), + secrets store. 
- ], + """ - # We need to download the NLTK data for the TestColumnRegExp test + @abstractmethod - download_nltk_data=True, + def update_secret_values( - ), + self, - The configuration shown in the example is the equivalent of running the following - Evidently code inside the step: + secret_id: UUID, - from evidently.tests import TestColumnRegExp + secret_values: Dict[str, str], - from evidently.test_preset import DataQualityTestPreset + ) -> None: - from evidently import ColumnMapping + """Updates secret values for an existing secret. - from evidently.test_suite import TestSuite + Args: - from evidently.tests.base_test import generate_column_tests + secret_id: The ID of the secret to be updated. - import nltk + secret_values: The new secret values. - nltk.download("words") + Raises: - nltk.download("wordnet") + KeyError: if no secret values for the given ID are stored in the - nltk.download("omw-1.4") + secrets store. - column_mapping = ColumnMapping( + """ - target="Rating", + @abstractmethod' +- source_sentence: Can you explain how to deploy a stack on AWS using the ZenML stack + deploy command with S3 and Sagemaker? + sentences: + - 'r eu-north-1 -x bucket_name=my_bucket -o sagemakerThis command deploys a stack + on AWS that uses an S3 bucket as an artifact store and Sagemaker as your orchestrator. + The stack will be imported into ZenML once the deployment is complete and you + can start using it right away! - numerical_features=["Age", "Positive_Feedback_Count"], + Supported flavors and component types are as follows: - categorical_features=[ + Component Type Flavor(s) Artifact Store s3, gcp, minio Container Registry aws, + gcp Experiment Tracker mlflow Orchestrator kubernetes, kubeflow, tekton, vertex + MLOps Platform zenml Model Deployer seldon Step Operator sagemaker, vertex - "Division_Name", + MLStacks currently only supports deployments using AWS, GCP, and K3D as providers. - "Department_Name", + Want more details on how this works internally? - "Class_Name", + The stack recipe CLI interacts with the mlstacks repository to fetch the recipes + and stores them locally in the Global Config directory. - ], + This is where you could potentially make any changes you want to the recipe files. + You can also use native terraform commands like terraform apply to deploy components + but this would require you to pass the variables manually using the -var-file + flag to the terraform CLI. - text_features=["Review_Text", "Title"], + CLI Options for zenml stack deploy - test_suite = TestSuite( + Current required options to be passed in to the zenml stack deploy subcommand + are: - tests=[ + -p or --provider: The cloud provider to deploy the stack on. Currently supported + providers are aws, gcp, and k3d. - DataQualityTestPreset(), + -n or --name: The name of the stack to be deployed. This is used to identify the + stack in ZenML. - generate_column_tests( + -r or --region: The region to deploy the stack in. - TestColumnRegExp, + The remaining options relate to which components you want to deploy. - columns=["Review_Text", "Title"], + If you want to pass an mlstacks stack specification file into the CLI to use for + deployment, you can do so with the -f option. Similarly, if you wish to see more + of the Terraform logging, prompts and output, you can pass the -d flag to turn + on debug-mode. - parameters={"reg_exp": r"[A-Z][A-Za-z0-9 ]*"} + Any extra configuration for specific components (as noted in the individual component + deployment documentation) can be passed in with the -x option. 
This option can + be used multiple times to pass in multiple configurations.' + - 'token_hex - # The datasets are those that are passed to the Evidently step + token_hex(32)or:Copyopenssl rand -hex 32Important: If you configure encryption + for your SQL database secrets store, you should keep the ZENML_SECRETS_STORE_ENCRYPTION_KEY + value somewhere safe and secure, as it will always be required by the ZenML server + to decrypt the secrets in the database. If you lose the encryption key, you will + not be able to decrypt the secrets in the database and will have to reset them. - # as input artifacts + These configuration options are only relevant if you''re using the AWS Secrets + Manager as the secrets store backend. - test_suite.run( + ZENML_SECRETS_STORE_TYPE: Set this to aws in order to set this type of secret + store. - current_data=current_dataset, + The AWS Secrets Store uses the ZenML AWS Service Connector under the hood to authenticate + with the AWS Secrets Manager API. This means that you can use any of the authentication + methods supported by the AWS Service Connector to authenticate with the AWS Secrets + Manager API. - reference_data=reference_dataset, + "Version": "2012-10-17", - column_mapping=column_mapping, + "Statement": [ - Let''s break this down... + "Sid": "ZenMLSecretsStore", - We configure the evidently_test_step using parameters that you would normally - pass to the Evidently TestSuite object to configure and run an Evidently test - suite . It consists of the following fields:' -- source_sentence: What is the purpose of the CustomContainerRegistryConfig class - in a ZenML workflow? - sentences: - - 'e the new flavor in the list of available flavors:zenml container-registry flavor - list + "Effect": "Allow", - It is important to draw attention to when and how these base abstractions are - coming into play in a ZenML workflow. + "Action": [ - The CustomContainerRegistryFlavor class is imported and utilized upon the creation - of the custom flavor through the CLI. + "secretsmanager:CreateSecret", - The CustomContainerRegistryConfig class is imported when someone tries to register/update - a stack component with this custom flavor. Especially, during the registration - process of the stack component, the config will be used to validate the values - given by the user. As Config object are inherently pydantic objects, you can also - add your own custom validators here. + "secretsmanager:GetSecretValue", - The CustomContainerRegistry only comes into play when the component is ultimately - in use. + "secretsmanager:DescribeSecret", - The design behind this interaction lets us separate the configuration of the flavor - from its implementation. This way we can register flavors and components even - when the major dependencies behind their implementation are not installed in our - local setting (assuming the CustomContainerRegistryFlavor and the CustomContainerRegistryConfig - are implemented in a different module/path than the actual CustomContainerRegistry). + "secretsmanager:PutSecretValue", - PreviousGitHub Container Registry + "secretsmanager:TagResource", - NextData Validators + "secretsmanager:DeleteSecret" - Last updated 15 days ago' - - "ons. Try it out at https://www.zenml.io/live-demo!Automated Deployments: With\ - \ ZenML, you no longer need to upload custom Docker images to the cloud whenever\ - \ you want to deploy a new model to production. 
Simply define your ML workflow\ - \ as a ZenML pipeline, let ZenML handle the containerization, and have your model\ - \ automatically deployed to a highly scalable Kubernetes deployment service like\ - \ Seldon.Copyfrom zenml.integrations.seldon.steps import seldon_model_deployer_step\n\ - from my_organization.steps import data_loader_step, model_trainer_step\n\n@pipeline\n\ - def my_pipeline():\n data = data_loader_step()\n model = model_trainer_step(data)\n\ - \ seldon_model_deployer_step(model)\n\n\U0001F680 Learn More\n\nReady to manage\ - \ your ML lifecycles end-to-end with ZenML? Here is a collection of pages you\ - \ can take a look at next:\n\nGet started with ZenML and learn how to build your\ - \ first pipeline and stack.\n\nDiscover advanced ZenML features like config management\ - \ and containerization.\n\nExplore ZenML through practical use-case examples.\n\ - \nNextInstallation\n\nLast updated 14 days ago" - - 'our active stack: + ], - from zenml.client import Clientexperiment_tracker = Client().active_stack.experiment_tracker + "Resource": "arn:aws:secretsmanager:::secret:zenml/*" - @step(experiment_tracker=experiment_tracker.name) + The following configuration options are supported: - def tf_trainer(...): + ZENML_SECRETS_STORE_AUTH_METHOD: The AWS Service Connector authentication method + to use (e.g. secret-key or iam-role). - ... + ZENML_SECRETS_STORE_AUTH_CONFIG: The AWS Service Connector configuration, in JSON + format (e.g. {"aws_access_key_id":"","aws_secret_access_key":"","region":""}). - MLflow UI + Note: The remaining configuration options are deprecated and may be removed in + a future release. Instead, you should set the ZENML_SECRETS_STORE_AUTH_METHOD + and ZENML_SECRETS_STORE_AUTH_CONFIG variables to use the AWS Service Connector + authentication method.' + - '_settings}) - MLflow comes with its own UI that you can use to find further details about your - tracked experiments. + def my_pipeline() -> None: - You can find the URL of the MLflow experiment linked to a specific ZenML run via - the metadata of the step in which the experiment tracker was used: + my_step()# Or configure the pipelines options - from zenml.client import Client + my_pipeline = my_pipeline.with_options( - last_run = client.get_pipeline("").last_run + settings={"docker": docker_settings} - trainer_step = last_run.get_step("") + Configuring them on a step gives you more fine-grained control and enables you + to build separate specialized Docker images for different steps of your pipelines: - tracking_url = trainer_step.run_metadata["experiment_tracker_url"].value + docker_settings = DockerSettings() - print(tracking_url) + # Either add it to the decorator - This will be the URL of the corresponding experiment in your deployed MLflow instance, - or a link to the corresponding mlflow experiment file if you are using local MLflow. + @step(settings={"docker": docker_settings}) - If you are using local MLflow, you can use the mlflow ui command to start MLflow - at localhost:5000 where you can then explore the UI in your browser. 
+ def my_step() -> None: - mlflow ui --backend-store-uri + pass - Additional configuration + # Or configure the step options - For additional configuration of the MLflow experiment tracker, you can pass MLFlowExperimentTrackerSettings - to create nested runs or add additional tags to your MLflow runs: + my_step = my_step.with_options( - import mlflow + settings={"docker": docker_settings} - from zenml.integrations.mlflow.flavors.mlflow_experiment_tracker_flavor import - MLFlowExperimentTrackerSettings + Using a YAML configuration file as described here: - mlflow_settings = MLFlowExperimentTrackerSettings( + settings: - nested=True, + docker: - tags={"key": "value"} + ... - @step( + steps: - experiment_tracker="", + step_name: - settings={ + settings: - "experiment_tracker.mlflow": mlflow_settings + docker: - def step_one( + ... - data: np.ndarray, + Check out this page for more information on the hierarchy and precedence of the + various ways in which you can supply the settings. - ) -> np.ndarray: + Using a custom parent image - ... + By default, ZenML performs all the steps described above on top of the official + ZenML image for the Python and ZenML version in the active Python environment. + To have more control over the entire environment used to execute your pipelines, + you can either specify a custom pre-built parent image or a Dockerfile that ZenML + uses to build a parent image for you. - Check out the SDK docs for a full list of available attributes and this docs page - for more information on how to specify settings. + If you''re going to use a custom parent image (either pre-built or by specifying + a Dockerfile), you need to make sure that it has Python, pip, and ZenML installed + for it to work. If you need a starting point, you can take a look at the Dockerfile + that ZenML uses here. - PreviousComet + Using a pre-built parent image - NextNeptune + To use a static parent image (e.g., with internal dependencies installed) that + doesn''t need to be rebuilt on every pipeline run, specify it in the Docker settings + for your pipeline: - Last updated 15 days ago' + docker_settings = DockerSettings(parent_image="my_registry.io/image_name:tag") + + + @pipeline(settings={"docker": docker_settings}) + + + def my_pipeline(...): + + + ... 
+ + + To use this image directly to run your steps without including any code or installing + any requirements on top of it, skip the Docker builds by specifying it in the + Docker settings:' model-index: - name: zenml/finetuned-snowflake-arctic-embed-m results: @@ -1249,49 +1285,49 @@ model-index: type: dim_384 metrics: - type: cosine_accuracy@1 - value: 0.3373493975903614 + value: 0.3433734939759036 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.572289156626506 + value: 0.6445783132530121 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6927710843373494 + value: 0.7048192771084337 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7951807228915663 + value: 0.7891566265060241 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.3373493975903614 + value: 0.3433734939759036 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.19076305220883533 + value: 0.21485943775100397 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.13855421686746985 + value: 0.1409638554216867 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.07951807228915661 + value: 0.0789156626506024 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.3373493975903614 + value: 0.3433734939759036 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.572289156626506 + value: 0.6445783132530121 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6927710843373494 + value: 0.7048192771084337 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7951807228915663 + value: 0.7891566265060241 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5579541825293861 + value: 0.573090139827556 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.48281937272901143 + value: 0.5032797858099063 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4881409762768584 + value: 0.5097554744597325 name: Cosine Map@100 - task: type: information-retrieval @@ -1301,49 +1337,49 @@ model-index: type: dim_256 metrics: - type: cosine_accuracy@1 - value: 0.3433734939759036 + value: 0.30120481927710846 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.5963855421686747 + value: 0.6144578313253012 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6626506024096386 + value: 0.6927710843373494 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7469879518072289 + value: 0.7650602409638554 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.3433734939759036 + value: 0.30120481927710846 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.19879518072289157 + value: 0.20481927710843373 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.1325301204819277 + value: 0.13855421686746983 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.07469879518072287 + value: 0.07650602409638553 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.3433734939759036 + value: 0.30120481927710846 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.5963855421686747 + value: 0.6144578313253012 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6626506024096386 + value: 0.6927710843373494 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7469879518072289 + value: 0.7650602409638554 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5462463547623214 + value: 0.5423414051340752 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.4817340791738385 + value: 0.469719353604896 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.48971987967160924 + value: 0.47720088094729723 name: 
Cosine Map@100 - task: type: information-retrieval @@ -1353,49 +1389,49 @@ model-index: type: dim_128 metrics: - type: cosine_accuracy@1 - value: 0.29518072289156627 + value: 0.3433734939759036 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.5301204819277109 + value: 0.6325301204819277 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6445783132530121 + value: 0.6686746987951807 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7349397590361446 + value: 0.7409638554216867 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.29518072289156627 + value: 0.3433734939759036 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.17670682730923695 + value: 0.21084337349397586 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.1289156626506024 + value: 0.1337349397590361 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.07349397590361444 + value: 0.07409638554216866 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.29518072289156627 + value: 0.3433734939759036 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.5301204819277109 + value: 0.6325301204819277 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6445783132530121 + value: 0.6686746987951807 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7349397590361446 + value: 0.7409638554216867 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5127103099003618 + value: 0.5525841372437652 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.44137980493402174 + value: 0.4909519028494932 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4487298407008574 + value: 0.49975471886162304 name: Cosine Map@100 - task: type: information-retrieval @@ -1405,49 +1441,49 @@ model-index: type: dim_64 metrics: - type: cosine_accuracy@1 - value: 0.27710843373493976 + value: 0.2289156626506024 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.5180722891566265 + value: 0.5120481927710844 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.5542168674698795 + value: 0.608433734939759 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.6626506024096386 + value: 0.7108433734939759 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.27710843373493976 + value: 0.2289156626506024 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.17269076305220885 + value: 0.17068273092369476 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.1108433734939759 + value: 0.12168674698795179 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.06626506024096383 + value: 0.07108433734939758 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.27710843373493976 + value: 0.2289156626506024 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.5180722891566265 + value: 0.5120481927710844 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.5542168674698795 + value: 0.608433734939759 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.6626506024096386 + value: 0.7108433734939759 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.46724356296794395 + value: 0.46952259375314126 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.4052663033084721 + value: 0.39217345572767276 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.41253529663957095 + value: 0.4002406001397042 name: Cosine Map@100 --- @@ -1501,9 +1537,9 @@ from sentence_transformers import SentenceTransformer model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m") # Run inference sentences = [ - 'What is 
the purpose of the CustomContainerRegistryConfig class in a ZenML workflow?', - 'e the new flavor in the list of available flavors:zenml container-registry flavor list\n\nIt is important to draw attention to when and how these base abstractions are coming into play in a ZenML workflow.\n\nThe CustomContainerRegistryFlavor class is imported and utilized upon the creation of the custom flavor through the CLI.\n\nThe CustomContainerRegistryConfig class is imported when someone tries to register/update a stack component with this custom flavor. Especially, during the registration process of the stack component, the config will be used to validate the values given by the user. As Config object are inherently pydantic objects, you can also add your own custom validators here.\n\nThe CustomContainerRegistry only comes into play when the component is ultimately in use.\n\nThe design behind this interaction lets us separate the configuration of the flavor from its implementation. This way we can register flavors and components even when the major dependencies behind their implementation are not installed in our local setting (assuming the CustomContainerRegistryFlavor and the CustomContainerRegistryConfig are implemented in a different module/path than the actual CustomContainerRegistry).\n\nPreviousGitHub Container Registry\n\nNextData Validators\n\nLast updated 15 days ago', - 'our active stack:\n\nfrom zenml.client import Clientexperiment_tracker = Client().active_stack.experiment_tracker\n\n@step(experiment_tracker=experiment_tracker.name)\n\ndef tf_trainer(...):\n\n...\n\nMLflow UI\n\nMLflow comes with its own UI that you can use to find further details about your tracked experiments.\n\nYou can find the URL of the MLflow experiment linked to a specific ZenML run via the metadata of the step in which the experiment tracker was used:\n\nfrom zenml.client import Client\n\nlast_run = client.get_pipeline("").last_run\n\ntrainer_step = last_run.get_step("")\n\ntracking_url = trainer_step.run_metadata["experiment_tracker_url"].value\n\nprint(tracking_url)\n\nThis will be the URL of the corresponding experiment in your deployed MLflow instance, or a link to the corresponding mlflow experiment file if you are using local MLflow.\n\nIf you are using local MLflow, you can use the mlflow ui command to start MLflow at localhost:5000 where you can then explore the UI in your browser.\n\nmlflow ui --backend-store-uri \n\nAdditional configuration\n\nFor additional configuration of the MLflow experiment tracker, you can pass MLFlowExperimentTrackerSettings to create nested runs or add additional tags to your MLflow runs:\n\nimport mlflow\n\nfrom zenml.integrations.mlflow.flavors.mlflow_experiment_tracker_flavor import MLFlowExperimentTrackerSettings\n\nmlflow_settings = MLFlowExperimentTrackerSettings(\n\nnested=True,\n\ntags={"key": "value"}\n\n@step(\n\nexperiment_tracker="",\n\nsettings={\n\n"experiment_tracker.mlflow": mlflow_settings\n\ndef step_one(\n\ndata: np.ndarray,\n\n) -> np.ndarray:\n\n...\n\nCheck out the SDK docs for a full list of available attributes and this docs page for more information on how to specify settings.\n\nPreviousComet\n\nNextNeptune\n\nLast updated 15 days ago', + 'Can you explain how to deploy a stack on AWS using the ZenML stack deploy command with S3 and Sagemaker?', + 'r eu-north-1 -x bucket_name=my_bucket -o sagemakerThis command deploys a stack on AWS that uses an S3 bucket as an artifact store and Sagemaker as your orchestrator. 
The stack will be imported into ZenML once the deployment is complete and you can start using it right away!\n\nSupported flavors and component types are as follows:\n\nComponent Type Flavor(s) Artifact Store s3, gcp, minio Container Registry aws, gcp Experiment Tracker mlflow Orchestrator kubernetes, kubeflow, tekton, vertex MLOps Platform zenml Model Deployer seldon Step Operator sagemaker, vertex\n\nMLStacks currently only supports deployments using AWS, GCP, and K3D as providers.\n\nWant more details on how this works internally?\n\nThe stack recipe CLI interacts with the mlstacks repository to fetch the recipes and stores them locally in the Global Config directory.\n\nThis is where you could potentially make any changes you want to the recipe files. You can also use native terraform commands like terraform apply to deploy components but this would require you to pass the variables manually using the -var-file flag to the terraform CLI.\n\nCLI Options for zenml stack deploy\n\nCurrent required options to be passed in to the zenml stack deploy subcommand are:\n\n-p or --provider: The cloud provider to deploy the stack on. Currently supported providers are aws, gcp, and k3d.\n\n-n or --name: The name of the stack to be deployed. This is used to identify the stack in ZenML.\n\n-r or --region: The region to deploy the stack in.\n\nThe remaining options relate to which components you want to deploy.\n\nIf you want to pass an mlstacks stack specification file into the CLI to use for deployment, you can do so with the -f option. Similarly, if you wish to see more of the Terraform logging, prompts and output, you can pass the -d flag to turn on debug-mode.\n\nAny extra configuration for specific components (as noted in the individual component deployment documentation) can be passed in with the -x option. This option can be used multiple times to pass in multiple configurations.', + '_settings})\n\ndef my_pipeline() -> None:\n\nmy_step()# Or configure the pipelines options\n\nmy_pipeline = my_pipeline.with_options(\n\nsettings={"docker": docker_settings}\n\nConfiguring them on a step gives you more fine-grained control and enables you to build separate specialized Docker images for different steps of your pipelines:\n\ndocker_settings = DockerSettings()\n\n# Either add it to the decorator\n\n@step(settings={"docker": docker_settings})\n\ndef my_step() -> None:\n\npass\n\n# Or configure the step options\n\nmy_step = my_step.with_options(\n\nsettings={"docker": docker_settings}\n\nUsing a YAML configuration file as described here:\n\nsettings:\n\ndocker:\n\n...\n\nsteps:\n\nstep_name:\n\nsettings:\n\ndocker:\n\n...\n\nCheck out this page for more information on the hierarchy and precedence of the various ways in which you can supply the settings.\n\nUsing a custom parent image\n\nBy default, ZenML performs all the steps described above on top of the official ZenML image for the Python and ZenML version in the active Python environment. To have more control over the entire environment used to execute your pipelines, you can either specify a custom pre-built parent image or a Dockerfile that ZenML uses to build a parent image for you.\n\nIf you\'re going to use a custom parent image (either pre-built or by specifying a Dockerfile), you need to make sure that it has Python, pip, and ZenML installed for it to work. 
If you need a starting point, you can take a look at the Dockerfile that ZenML uses here.\n\nUsing a pre-built parent image\n\nTo use a static parent image (e.g., with internal dependencies installed) that doesn\'t need to be rebuilt on every pipeline run, specify it in the Docker settings for your pipeline:\n\ndocker_settings = DockerSettings(parent_image="my_registry.io/image_name:tag")\n\n@pipeline(settings={"docker": docker_settings})\n\ndef my_pipeline(...):\n\n...\n\nTo use this image directly to run your steps without including any code or installing any requirements on top of it, skip the Docker builds by specifying it in the Docker settings:', ] embeddings = model.encode(sentences) print(embeddings.shape) @@ -1549,21 +1585,21 @@ You can finetune this model on your own dataset. | Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.3373 | -| cosine_accuracy@3 | 0.5723 | -| cosine_accuracy@5 | 0.6928 | -| cosine_accuracy@10 | 0.7952 | -| cosine_precision@1 | 0.3373 | -| cosine_precision@3 | 0.1908 | -| cosine_precision@5 | 0.1386 | -| cosine_precision@10 | 0.0795 | -| cosine_recall@1 | 0.3373 | -| cosine_recall@3 | 0.5723 | -| cosine_recall@5 | 0.6928 | -| cosine_recall@10 | 0.7952 | -| cosine_ndcg@10 | 0.558 | -| cosine_mrr@10 | 0.4828 | -| **cosine_map@100** | **0.4881** | +| cosine_accuracy@1 | 0.3434 | +| cosine_accuracy@3 | 0.6446 | +| cosine_accuracy@5 | 0.7048 | +| cosine_accuracy@10 | 0.7892 | +| cosine_precision@1 | 0.3434 | +| cosine_precision@3 | 0.2149 | +| cosine_precision@5 | 0.141 | +| cosine_precision@10 | 0.0789 | +| cosine_recall@1 | 0.3434 | +| cosine_recall@3 | 0.6446 | +| cosine_recall@5 | 0.7048 | +| cosine_recall@10 | 0.7892 | +| cosine_ndcg@10 | 0.5731 | +| cosine_mrr@10 | 0.5033 | +| **cosine_map@100** | **0.5098** | #### Information Retrieval * Dataset: `dim_256` @@ -1571,21 +1607,21 @@ You can finetune this model on your own dataset. | Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.3434 | -| cosine_accuracy@3 | 0.5964 | -| cosine_accuracy@5 | 0.6627 | -| cosine_accuracy@10 | 0.747 | -| cosine_precision@1 | 0.3434 | -| cosine_precision@3 | 0.1988 | -| cosine_precision@5 | 0.1325 | -| cosine_precision@10 | 0.0747 | -| cosine_recall@1 | 0.3434 | -| cosine_recall@3 | 0.5964 | -| cosine_recall@5 | 0.6627 | -| cosine_recall@10 | 0.747 | -| cosine_ndcg@10 | 0.5462 | -| cosine_mrr@10 | 0.4817 | -| **cosine_map@100** | **0.4897** | +| cosine_accuracy@1 | 0.3012 | +| cosine_accuracy@3 | 0.6145 | +| cosine_accuracy@5 | 0.6928 | +| cosine_accuracy@10 | 0.7651 | +| cosine_precision@1 | 0.3012 | +| cosine_precision@3 | 0.2048 | +| cosine_precision@5 | 0.1386 | +| cosine_precision@10 | 0.0765 | +| cosine_recall@1 | 0.3012 | +| cosine_recall@3 | 0.6145 | +| cosine_recall@5 | 0.6928 | +| cosine_recall@10 | 0.7651 | +| cosine_ndcg@10 | 0.5423 | +| cosine_mrr@10 | 0.4697 | +| **cosine_map@100** | **0.4772** | #### Information Retrieval * Dataset: `dim_128` @@ -1593,21 +1629,21 @@ You can finetune this model on your own dataset. 
| Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.2952 | -| cosine_accuracy@3 | 0.5301 | -| cosine_accuracy@5 | 0.6446 | -| cosine_accuracy@10 | 0.7349 | -| cosine_precision@1 | 0.2952 | -| cosine_precision@3 | 0.1767 | -| cosine_precision@5 | 0.1289 | -| cosine_precision@10 | 0.0735 | -| cosine_recall@1 | 0.2952 | -| cosine_recall@3 | 0.5301 | -| cosine_recall@5 | 0.6446 | -| cosine_recall@10 | 0.7349 | -| cosine_ndcg@10 | 0.5127 | -| cosine_mrr@10 | 0.4414 | -| **cosine_map@100** | **0.4487** | +| cosine_accuracy@1 | 0.3434 | +| cosine_accuracy@3 | 0.6325 | +| cosine_accuracy@5 | 0.6687 | +| cosine_accuracy@10 | 0.741 | +| cosine_precision@1 | 0.3434 | +| cosine_precision@3 | 0.2108 | +| cosine_precision@5 | 0.1337 | +| cosine_precision@10 | 0.0741 | +| cosine_recall@1 | 0.3434 | +| cosine_recall@3 | 0.6325 | +| cosine_recall@5 | 0.6687 | +| cosine_recall@10 | 0.741 | +| cosine_ndcg@10 | 0.5526 | +| cosine_mrr@10 | 0.491 | +| **cosine_map@100** | **0.4998** | #### Information Retrieval * Dataset: `dim_64` @@ -1615,21 +1651,21 @@ You can finetune this model on your own dataset. | Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.2771 | -| cosine_accuracy@3 | 0.5181 | -| cosine_accuracy@5 | 0.5542 | -| cosine_accuracy@10 | 0.6627 | -| cosine_precision@1 | 0.2771 | -| cosine_precision@3 | 0.1727 | -| cosine_precision@5 | 0.1108 | -| cosine_precision@10 | 0.0663 | -| cosine_recall@1 | 0.2771 | -| cosine_recall@3 | 0.5181 | -| cosine_recall@5 | 0.5542 | -| cosine_recall@10 | 0.6627 | -| cosine_ndcg@10 | 0.4672 | -| cosine_mrr@10 | 0.4053 | -| **cosine_map@100** | **0.4125** | +| cosine_accuracy@1 | 0.2289 | +| cosine_accuracy@3 | 0.512 | +| cosine_accuracy@5 | 0.6084 | +| cosine_accuracy@10 | 0.7108 | +| cosine_precision@1 | 0.2289 | +| cosine_precision@3 | 0.1707 | +| cosine_precision@5 | 0.1217 | +| cosine_precision@10 | 0.0711 | +| cosine_recall@1 | 0.2289 | +| cosine_recall@3 | 0.512 | +| cosine_recall@5 | 0.6084 | +| cosine_recall@10 | 0.7108 | +| cosine_ndcg@10 | 0.4695 | +| cosine_mrr@10 | 0.3922 | +| **cosine_map@100** | **0.4002** |
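The per-dimension tables above report standard retrieval metrics (Accuracy@k, Precision@k, Recall@k, NDCG@10, MRR@10, MAP@100) computed on embeddings truncated to the listed Matryoshka dimensions. As a rough illustration only, the sketch below shows how such numbers can be produced with sentence-transformers' `InformationRetrievalEvaluator` and its `truncate_dim` option (available in recent releases); the query, corpus, and relevance mappings here are hypothetical placeholders, not the dataset behind this card.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m")

# Hypothetical evaluation data: id -> text for queries and corpus,
# plus the set of relevant corpus ids for each query.
queries = {"q1": "How do I skip Docker builds with a pre-built parent image?"}
corpus = {
    "d1": "To use a static parent image, specify it in the Docker settings ...",
    "d2": "MLflow comes with its own UI that you can use to explore runs ...",
}
relevant_docs = {"q1": {"d1"}}

# Evaluate at a truncated (Matryoshka) embedding dimension, e.g. 128.
evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_128",
    truncate_dim=128,
)
results = evaluator(model)
print(results)  # e.g. {'dim_128_cosine_map@100': ..., 'dim_128_cosine_ndcg@10': ..., ...}
```

Running the same evaluator with `truncate_dim` set to 384, 256, 128, and 64 would yield one metric block per dimension, mirroring the structure of the tables above.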