diff --git "a/README.md" "b/README.md" --- "a/README.md" +++ "b/README.md" @@ -31,632 +31,817 @@ tags: - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss widget: -- source_sentence: What is the RESOURCE NAME for the kubernetes-cluster in the ZenML - documentation? +- source_sentence: How do I register a Discord Alerter in ZenML to send automated + alerts? sentences: - - ' ┃┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + - 'Discord Alerter - ┃ RESOURCE TYPES │ 🌀 kubernetes-cluster ┃ + Sending automated alerts to a Discord channel. - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + The DiscordAlerter enables you to send messages to a dedicated Discord channel + directly from within your ZenML pipelines. - ┃ RESOURCE NAME │ arn:aws:eks:us-east-1:715803424590:cluster/zenhacks-cluster ┃ + The discord integration contains the following two standard steps: - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + discord_alerter_post_step takes a string message, posts it to a Discord channel, + and returns whether the operation was successful. - ┃ SECRET ID │ ┃ + discord_alerter_ask_step also posts a message to a Discord channel, but waits + for user feedback, and only returns True if a user explicitly approved the operation + from within Discord (e.g., by sending "approve" / "reject" to the bot in response). 
- ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + Interacting with Discord from within your pipelines can be very useful in practice: - ┃ SESSION DURATION │ N/A ┃ + The discord_alerter_post_step allows you to get notified immediately when failures + happen (e.g., model performance degradation, data drift, ...), - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + The discord_alerter_ask_step allows you to integrate a human-in-the-loop into + your pipelines before executing critical steps, such as deploying new models. - ┃ EXPIRES IN │ 11h59m57s ┃ + How to use it - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + Requirements - ┃ OWNER │ default ┃ + Before you can use the DiscordAlerter, you first need to install ZenML''s discord + integration: - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + zenml integration install discord -y - ┃ WORKSPACE │ default ┃ + See the Integrations page for more details on ZenML integrations and how to install + and use them. - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + Setting Up a Discord Bot - ┃ SHARED │ ➖ ┃ + In order to use the DiscordAlerter, you first need to have a Discord workspace + set up with a channel that you want your pipelines to post to. This is the + you will need when registering the discord alerter component. - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + Then, you need to create a Discord App with a bot in your server . - ┃ CREATED_AT │ 2023-06-16 10:17:46.931091 ┃ + Note in the bot token copy step, if you don''t find the copy button then click + on reset token to reset the bot and you will get a new token which you can use. + Also, make sure you give necessary permissions to the bot required for sending + and receiving messages. 
- ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + Registering a Discord Alerter in ZenML' + - 'af89af ┃┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ - ┃ UPDATED_AT │ 2023-06-16 10:17:46.931094 ┃ + ┃ NAME │ azure-session-token ┃ - ┗━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ - Configuration' - - 'urns it with the configuration of the cloud stack.Based on the stack info and - pipeline specification, the client builds and pushes an image to the container - registry. The image contains the environment needed to execute the pipeline and - the code of the steps. + ┃ TYPE │ 🇦 azure ┃ - The client creates a run in the orchestrator. For example, in the case of the - Skypilot orchestrator, it creates a virtual machine in the cloud with some commands - to pull and run a Docker image from the specified container registry. + ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ - The orchestrator pulls the appropriate image from the container registry as it''s - executing the pipeline (each step has an image). + ┃ AUTH METHOD │ access-token ┃ - As each pipeline runs, it stores artifacts physically in the artifact store. Of - course, this artifact store needs to be some form of cloud storage. + ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ - As each pipeline runs, it reports status back to the ZenML server and optionally - queries the server for metadata. 
+ ┃ RESOURCE TYPES │ 🇦 azure-generic, 📦 blob-container, 🌀 kubernetes-cluster,
  🐳 docker-registry ┃

- Provisioning and registering a Skypilot orchestrator alongside a container registry

+ ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨

- While there are detailed docs on how to set up a Skypilot orchestrator and a container
  registry on each public cloud, we have put the most relevant details here for
  convenience:

+ ┃ RESOURCE NAME │ ┃

- In order to launch a pipeline on AWS with the SkyPilot orchestrator, the first
  thing that you need to do is to install the AWS and Skypilot integrations:

+ ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨

- zenml integration install aws skypilot_aws -y

+ ┃ SECRET ID │ b34f2e95-ae16-43b6-8ab6-f0ee33dbcbd8 ┃

- Before we start registering any components, there is another step that we have
  to execute. As we explained in the previous section, components such as orchestrators
  and container registries often require you to set up the right permissions. In
  ZenML, this process is simplified with the use of Service Connectors. For this
  example, we need to use the IAM role authentication method of our AWS service
  connector:

+ ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨

- AWS_PROFILE= zenml service-connector register cloud_connector --type
  aws --auto-configure

+ ┃ SESSION DURATION │ N/A ┃

- Once the service connector is set up, we can register a Skypilot orchestrator:

+ ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨

- zenml orchestrator register skypilot_orchestrator -f vm_aws'

- 'pose -f /path/to/docker-compose.yml -p zenml up -dYou need to visit the ZenML
  dashboard at http://localhost:8080 to activate the server by creating an initial
  admin account. 
You can then connect your client to the server with the web login - flow: + ┃ EXPIRES IN │ 42m25s ┃ - zenml connect --url http://localhost:8080 + ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ - Tearing down the installation is as simple as running: + ┃ OWNER │ default ┃ - docker-compose -p zenml down + ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨' + - 'Storing embeddings in a vector database - Database backup and recovery + Store embeddings in a vector database for efficient retrieval. - An automated database backup and recovery feature is enabled by default for all - Docker deployments. The ZenML server will automatically back up the database in-memory - before every database schema migration and restore it if the migration fails. + The process of generating the embeddings doesn''t take too long, especially if + the machine on which the step is running has a GPU, but it''s still not something + we want to do every time we need to retrieve a document. Instead, we can store + the embeddings in a vector database, which allows us to quickly retrieve the most + relevant chunks based on their similarity to the query. - The database backup automatically created by the ZenML server is only temporary - and only used as an immediate recovery in case of database migration failures. - It is not meant to be used as a long-term backup solution. If you need to back - up your database for long-term storage, you should use a dedicated backup solution. + For the purposes of this guide, we''ll use PostgreSQL as our vector database. + This is a popular choice for storing embeddings, as it provides a scalable and + efficient way to store and retrieve high-dimensional vectors. However, you can + use any vector database that supports high-dimensional vectors. If you want to + explore a list of possible options, this is a good website to compare different + options. 
- Several database backup strategies are supported, depending on where and how the - backup is stored. The strategy can be configured by means of the ZENML_STORE_BACKUP_STRATEGY - environment variable: + For more information on how to set up a PostgreSQL database to follow along with + this guide, please see the instructions in the repository which show how to set + up a PostgreSQL database using Supabase. - disabled - no backup is performed + Since PostgreSQL is a well-known and battle-tested database, we can use known + and minimal packages to connect and to interact with it. We can use the psycopg2 + package to connect and then raw SQL statements to interact with the database. - in-memory - the database schema and data are stored in memory. This is the fastest - backup strategy, but the backup is not persisted across container restarts, so - no manual intervention is possible in case the automatic DB recovery fails after - a failed DB migration. Adequate memory resources should be allocated to the ZenML - server container when using this backup strategy with larger databases. This is - the default backup strategy.' -- source_sentence: What are the benefits of deploying ZenML to a production environment? - sentences: - - 'graph that includes custom TRANSFORMER and ROUTER.If you are looking for a more - easy way to deploy your models locally, you can use the MLflow Model Deployer - flavor. + The code for the step is fairly simple: - How to deploy it? + from zenml import step - ZenML provides a Seldon Core flavor build on top of the Seldon Core Integration - to allow you to deploy and use your models in a production-grade environment. 
- In order to use the integration you need to install it on your local machine to - be able to register a Seldon Core Model deployer with ZenML and add it to your - stack: + @step - zenml integration install seldon -y + def index_generator( - To deploy and make use of the Seldon Core integration we need to have the following - prerequisites: + documents: List[Document], - access to a Kubernetes cluster. This can be configured using the kubernetes_context - configuration attribute to point to a local kubectl context or an in-cluster configuration, - but the recommended approach is to use a Service Connector to link the Seldon - Deployer Stack Component to a Kubernetes cluster. + ) -> None: - Seldon Core needs to be preinstalled and running in the target Kubernetes cluster. - Check out the official Seldon Core installation instructions or the EKS installation - example below. + try: - models deployed with Seldon Core need to be stored in some form of persistent - shared storage that is accessible from the Kubernetes cluster where Seldon Core - is installed (e.g. AWS S3, GCS, Azure Blob Storage, etc.). You can use one of - the supported remote artifact store flavors to store your models as part of your - stack. For a smoother experience running Seldon Core with a cloud artifact store, - we also recommend configuring explicit credentials for the artifact store. The - Seldon Core model deployer knows how to automatically convert those credentials - in the format needed by Seldon Core model servers to authenticate to the storage - back-end where models are stored. + conn = get_db_conn() - Since the Seldon Model Deployer is interacting with the Seldon Core model server - deployed on a Kubernetes cluster, you need to provide a set of configuration parameters. - These parameters are:' - - 'S Secrets Manager accounts or regions may be used.Always make sure that the backup - Secrets Store is configured to use a different location than the primary Secrets - Store. 
The location can be different in terms of the Secrets Store back-end type - (e.g. internal database vs. AWS Secrets Manager) or the actual location of the - Secrets Store back-end (e.g. different AWS Secrets Manager account or region, - GCP Secret Manager project or Azure Key Vault''s vault). + with conn.cursor() as cur: - Using the same location for both the primary and backup Secrets Store will not - provide any additional benefits and may even result in unexpected behavior. + # Install pgvector if not already installed - When a backup secrets store is in use, the ZenML Server will always attempt to - read and write secret values from/to the primary Secrets Store first while ensuring - to keep the backup Secrets Store in sync. If the primary Secrets Store is unreachable, - if the secret values are not found there or any otherwise unexpected error occurs, - the ZenML Server falls back to reading and writing from/to the backup Secrets - Store. Only if the backup Secrets Store is also unavailable, the ZenML Server - will return an error. + cur.execute("CREATE EXTENSION IF NOT EXISTS vector") - In addition to the hidden backup operations, users can also explicitly trigger - a backup operation by using the zenml secret backup CLI command. This command - will attempt to read all secrets from the primary Secrets Store and write them - to the backup Secrets Store. Similarly, the zenml secret restore CLI command can - be used to restore secrets from the backup Secrets Store to the primary Secrets - Store. These CLI commands are useful for migrating secrets from one Secrets Store - to another. + conn.commit() - Secrets migration strategy + # Create the embeddings table if it doesn''t exist - Sometimes you may need to change the external provider or location where secrets - values are stored by the Secrets Store. 
The immediate implication of this is that
  the ZenML server will no longer be able to access existing secrets with the new
  configuration until they are also manually copied to the new location. Some examples
  of such changes include:'

- '🤔Deploying ZenML

+ table_create_command = f"""

- Why do we need to deploy ZenML?

+ CREATE TABLE IF NOT EXISTS embeddings (

- Moving your ZenML Server to a production environment offers several benefits over
  staying local:

+ id SERIAL PRIMARY KEY,

- Scalability: Production environments are designed to handle large-scale workloads,
  allowing your models to process more data and deliver faster results.

+ content TEXT,

- Reliability: Production-grade infrastructure ensures high availability and fault
  tolerance, minimizing downtime and ensuring consistent performance.

+ token_count INTEGER,

- Collaboration: A shared production environment enables seamless collaboration
  between team members, making it easier to iterate on models and share insights.

+ embedding VECTOR({EMBEDDING_DIMENSIONALITY}),

- Despite these advantages, transitioning to production can be challenging due to
  the complexities involved in setting up the needed infrastructure.

+ filename TEXT,

- ZenML Server

+ parent_section TEXT,

- When you first get started with ZenML, it relies on the following architecture
  on your machine.

+ url TEXT

- The SQLite database that you can see in this diagram is used to store information
  about pipelines, pipeline runs, stacks, and other configurations. Users can run
  the zenml up command to spin up a local REST server to serve the dashboard. The
  diagram for this looks as follows:

+ );

- In Scenario 2, the zenml up command implicitly connects the client to the server.

+ """

- Currently the ZenML server supports a legacy and a brand-new version of the dashboard. 
- To use the legacy version simply use the following command zenml up --legacy + cur.execute(table_create_command) - In order to move into production, the ZenML server needs to be deployed somewhere - centrally so that the different cloud stack components can read from and write - to the server. Additionally, this also allows all your team members to connect - to it and share stacks and pipelines. + conn.commit() - Deploying a ZenML Server' -- source_sentence: What is the tenant_id value in the configuration section? + register_vector(conn)' +- source_sentence: How can I configure a step or pipeline to use a custom materializer + in ZenML? sentences: - - '─────────────────────────────────────────────────┨┃ OWNER │ default ┃ + - 'e following: - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + Question: What are Plasma Phoenixes?Answer: Plasma Phoenixes are majestic creatures + made of pure energy that soar above the chromatic canyons of Zenml World. They + leave fiery trails behind them, painting the sky with dazzling displays of colors. - ┃ WORKSPACE │ default ┃ + Question: What kinds of creatures live on the prismatic shores of ZenML World? - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + Answer: On the prismatic shores of ZenML World, you can find crystalline crabs + scuttling and burrowing with their transparent exoskeletons, which refract light + into a kaleidoscope of hues. - ┃ SHARED │ ➖ ┃ + Question: What is the capital of Panglossia? - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + Answer: The capital of Panglossia is not mentioned in the provided context. - ┃ CREATED_AT │ 2023-06-20 19:16:26.802374 ┃ + The implementation above is by no means sophisticated or performant, but it''s + simple enough that you can see all the moving parts. 
Our tokenization process + consists of splitting the text into individual words. - ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ + The way we check for similarity between the question / query and the chunks of + text is extremely naive and inefficient. The similarity between the query and + the current chunk is calculated using the Jaccard similarity coefficient. This + coefficient measures the similarity between two sets and is defined as the size + of the intersection divided by the size of the union of the two sets. So we count + the number of words that are common between the query and the chunk and divide + it by the total number of unique words in both the query and the chunk. There + are much better ways of measuring the similarity between two pieces of text, such + as using embeddings or other more sophisticated techniques, but this example is + kept simple for illustrative purposes. - ┃ UPDATED_AT │ 2023-06-20 19:16:26.802378 ┃ + The rest of this guide will showcase a more performant and scalable way of performing + the same task using ZenML. If you ever are unsure why we''re doing something, + feel free to return to this example for the high-level overview. - ┗━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + PreviousRAG with ZenML - Configuration + NextUnderstanding Retrieval-Augmented Generation (RAG) - ┏━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ + Last updated 2 months ago' + - 'ral options are presented. - ┃ PROPERTY │ VALUE ┃ + hyperparameter tuning?Our dedicated documentation guide on implementing this is + the place to learn more. - ┠───────────────┼──────────────────────────────────────┨ + reset things when something goes wrong? - ┃ tenant_id │ a79ff333-8f45-4a74-a42e-68871c17b7fb ┃ + To reset your ZenML client, you can run zenml clean which will wipe your local + metadata database and reset your client. 
Note that this is a destructive action, + so feel free to reach out to us on Slack before doing this if you are unsure. - ┠───────────────┼──────────────────────────────────────┨ + steps that create other steps AKA dynamic pipelines and steps? - ┃ client_id │ 8926254a-8c3f-430a-a2fd-bdab234d491e ┃ + Please read our general information on how to compose steps + pipelines together + to start with. You might also find the code examples in our guide to implementing + hyperparameter tuning which is related to this topic. - ┠───────────────┼──────────────────────────────────────┨ + templates: using starter code with ZenML? - ┃ client_secret │ [HIDDEN] ┃ + Project templates allow you to get going quickly with ZenML. We recommend the + Starter template (starter) for most use cases which gives you a basic scaffold + and structure around which you can write your own code. You can also build templates + for others inside a Git repository and use them with ZenML''s templates functionality. - ┗━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + upgrade my ZenML client and/or server? - Azure Access Token + Upgrading your ZenML client package is as simple as running pip install --upgrade + zenml in your terminal. For upgrading your ZenML server, please refer to the dedicated + documentation section which covers most of the ways you might do this as well + as common troubleshooting steps. - Uses temporary Azure access tokens explicitly configured by the user or auto-configured - from a local environment.' - - ' should pick the one that best fits your use case.If you already have one or - more GCP Service Connectors configured in your ZenML deployment, you can check - which of them can be used to access generic GCP resources like the GCP Image Builder - required for your GCP Image Builder by running e.g.: + use a stack component? 
- zenml service-connector list-resources --resource-type gcp-generic + For information on how to use a specific stack component, please refer to the + component guide which contains all our tips and advice on how to use each integration + and component with ZenML. - Example Command Output + PreviousAPI reference - The following ''gcp-generic'' resources can be accessed by service connectors - configured in your workspace: + NextMigration guide - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓ + Last updated 18 days ago' + - ' configuration documentation for more information.Custom materializers - ┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE │ RESOURCE - TYPE │ RESOURCE NAMES ┃ + Configuring a step/pipeline to use a custom materializer - ┠──────────────────────────────────────┼────────────────┼────────────────┼────────────────┼────────────────┨ + Defining which step uses what materializer - ┃ bfdb657d-d808-47e7-9974-9ba6e4919d83 │ gcp-generic │ 🔵 gcp │ 🔵 gcp-generic - │ zenml-core ┃ + ZenML automatically detects if your materializer is imported in your source code + and registers them for the corresponding data type (defined in ASSOCIATED_TYPES). + Therefore, just having a custom materializer definition in your code is enough + to enable the respective data type to be used in your pipelines. - ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛ + However, it is best practice to explicitly define which materializer to use for + a specific step and not rely on the ASSOCIATED_TYPES to make that connection: - After having set up or decided on a GCP Service Connector to use to authenticate - to GCP, you can register the GCP Image Builder as follows: + class MyObj: - zenml image-builder register \ + ... 
+ 

+ class MyMaterializer(BaseMaterializer):

+ """Materializer to read data to and from MyObj."""

- --flavor=gcp \

+ ASSOCIATED_TYPES = (MyObj,)

- --cloud_builder_image= \

+ ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

- --network= \

+ # Read below to learn how to implement this materializer

- --build_timeout=

+ # You can define it at the decorator level

- # Connect the GCP Image Builder to GCP via a GCP Service Connector

+ @step(output_materializers=MyMaterializer)

- zenml image-builder connect -i

+ def my_first_step() -> MyObj:

- A non-interactive version that connects the GCP Image Builder to a target GCP
  Service Connector:

+ return MyObj()

- zenml image-builder connect --connector 

+ # No need to explicitly specify materializer here:

- Example Command Output

+ # it is coupled with Artifact Version generated by

- $ zenml image-builder connect gcp-image-builder --connector gcp-generic

+ # `my_first_step` already.

- Successfully connected image builder `gcp-image-builder` to the following resources:

+ def my_second_step(a: MyObj):

- ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓'
- - 'gistry or even more than one type of AWS resource:zenml service-connector register
  --type aws -i

+ print(a)

- A non-interactive CLI example that leverages the AWS CLI configuration on your
  local machine to auto-configure an AWS Service Connector targeting an ECR registry
  is:

+ # or you can use the `configure()` method of the step. E.g.:

- zenml service-connector register --type aws --resource-type docker-registry
  --auto-configure

+ my_first_step.configure(output_materializers=MyMaterializer)

- Example Command Output

+ When there are multiple outputs, a dictionary of type {: }
  can be supplied to the decorator or the .configure(...) 
method: - $ zenml service-connector register aws-us-east-1 --type aws --resource-type docker-registry - --auto-configure + class MyObj1: - ⠸ Registering service connector ''aws-us-east-1''... + + ... - Successfully registered service connector `aws-us-east-1` with access to the following - resources: + class MyObj2: - ┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ + ... + + class MyMaterializer1(BaseMaterializer): - ┃ RESOURCE TYPE │ RESOURCE NAMES ┃ + """Materializer to read data to and from MyObj1.""" - ┠────────────────────┼──────────────────────────────────────────────┨ + ASSOCIATED_TYPES = (MyObj1) - ┃ 🐳 docker-registry │ 715803424590.dkr.ecr.us-east-1.amazonaws.com ┃ + ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA - ┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + class MyMaterializer2(BaseMaterializer): - Note: Please remember to grant the entity associated with your AWS credentials - permissions to read and write to one or more ECR repositories as well as to list - accessible ECR repositories. For a full list of permissions required to use an - AWS Service Connector to access an ECR registry, please refer to the AWS Service - Connector ECR registry resource type documentation or read the documentation available - in the interactive CLI commands and dashboard. The AWS Service Connector supports - many different authentication methods with different levels of security and convenience. - You should pick the one that best fits your use case. 
+ """Materializer to read data to and from MyObj2.""" - If you already have one or more AWS Service Connectors configured in your ZenML - deployment, you can check which of them can be used to access the ECR registry - you want to use for your AWS Container Registry by running e.g.: + ASSOCIATED_TYPES = (MyObj2) - zenml service-connector list-resources --connector-type aws --resource-type docker-registry + ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA - Example Command Output' -- source_sentence: How can I customize the Docker settings for individual steps in - a ZenML pipeline? + + # This is where we connect the objects to the materializer + + + @step(output_materializers={"1": MyMaterializer1, "2": MyMaterializer2})' +- source_sentence: How do I use it to start the local Airflow server for ZenML? sentences: - - '🌎Environment Variables + - ' use it + + + To use the Airflow orchestrator, we need:The ZenML airflow integration installed. + If you haven''t done so, runCopyzenml integration install airflow + + + Docker installed and running. - How to control ZenML behavior with environmental variables. + The orchestrator registered and part of our active stack: - There are a few pre-defined environmental variables that can be used to control - the behavior of ZenML. See the list below with default values and options: + zenml orchestrator register \ - Logging verbosity + --flavor=airflow \ - export ZENML_LOGGING_VERBOSITY=INFO + --local=True # set this to `False` if using a remote Airflow deployment - Choose from INFO, WARN, ERROR, CRITICAL, DEBUG. + # Register and activate a stack with the new orchestrator - Disable step logs + zenml stack register -o ... --set - Usually, ZenML stores step logs in the artifact store, but this can sometimes - cause performance bottlenecks, especially if the code utilizes progress bars. 
+ In the local case, we need to reinstall in a certain way for the local Airflow + server: + + + pip install "apache-airflow-providers-docker<3.8.0" "apache-airflow==2.4.0" "pendulum<3.0.0" + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.4.0/constraints-3.9.txt" + + + Please make sure to replace 3.9 with your Python (major) version in the constraints + file URL given above. + + + Once that is installed, we can start the local Airflow server by running the following + command in your terminal. See further below on an alternative way to set up the + Airflow server manually since the zenml stack up command is deprecated. + + + zenml stack up + + + This command will start up an Airflow server on your local machine that''s running + in the same Python environment that you used to provision it. When it is finished, + it will print a username and password which you can use to log in to the Airflow + UI here. + + + As long as you didn''t configure any custom value for the dag_output_dir attribute + of your orchestrator, running a pipeline locally is as simple as calling: + + + python file_that_runs_a_zenml_pipeline.py + + + This call will produce a .zip file containing a representation of your ZenML pipeline + to the Airflow DAGs directory. From there, the local Airflow server will load + it and run your pipeline (It might take a few seconds until the pipeline shows + up in the Airflow UI).' + - 'nentConfig class here. + + + Base Abstraction 3: Flavorfrom zenml.enums import StackComponentType + + + from zenml.stack import Flavor + + + class LocalArtifactStore(BaseArtifactStore): + + + ... + + + class LocalArtifactStoreConfig(BaseArtifactStoreConfig): + + + ... - If you want to configure whether logged output from steps is stored or not, set - the ZENML_DISABLE_STEP_LOGS_STORAGE environment variable to true. Note that this - will mean that logs from your steps will no longer be stored and thus won''t be - visible on the dashboard anymore. 
+ class LocalArtifactStoreFlavor(Flavor): - export ZENML_DISABLE_STEP_LOGS_STORAGE=false + @property - ZenML repository path + def name(self) -> str: - To configure where ZenML will install and look for its repository, set the environment - variable ZENML_REPOSITORY_PATH. + """Returns the name of the flavor.""" - export ZENML_REPOSITORY_PATH=/path/to/somewhere + return "local" - Analytics + @property - Please see our full page on what analytics are tracked and how you can opt out, - but the quick summary is that you can set this to false if you want to opt out - of analytics. + def type(self) -> StackComponentType: - export ZENML_ANALYTICS_OPT_IN=false + """Returns the flavor type.""" - Debug mode + return StackComponentType.ARTIFACT_STORE - Setting to true switches to developer mode: + @property - export ZENML_DEBUG=true + def config_class(self) -> Type[LocalArtifactStoreConfig]: - Active stack + """Config class of this flavor.""" - Setting the ZENML_ACTIVE_STACK_ID to a specific UUID will make the corresponding - stack the active stack: + return LocalArtifactStoreConfig - export ZENML_ACTIVE_STACK_ID= + @property - Prevent pipeline execution + def implementation_class(self) -> Type[LocalArtifactStore]: - When true, this prevents a pipeline from executing: + """Implementation class of this flavor.""" - export ZENML_PREVENT_PIPELINE_EXECUTION=false + return LocalArtifactStore - Disable rich traceback + See the full code of the base Flavor class definition here. - Set to false to disable the rich traceback: + Implementing a Custom Stack Component Flavor - export ZENML_ENABLE_RICH_TRACEBACK=true + Let''s recap what we just learned by reimplementing the S3ArtifactStore from the + aws integration as a custom flavor. - Disable colourful logging + We can start with the configuration class: here we need to define the SUPPORTED_SCHEMES + class variable introduced by the BaseArtifactStore. 
We also define several additional + configuration values that users can use to configure how the artifact store will + authenticate with AWS: + + + from zenml.artifact_stores import BaseArtifactStoreConfig + + + from zenml.utils.secret_utils import SecretField + + + class MyS3ArtifactStoreConfig(BaseArtifactStoreConfig): + + + """Configuration for the S3 Artifact Store.""" + + + SUPPORTED_SCHEMES: ClassVar[Set[str]] = {"s3://"} + + + key: Optional[str] = SecretField(default=None) + + + secret: Optional[str] = SecretField(default=None) + + + token: Optional[str] = SecretField(default=None) + + + client_kwargs: Optional[Dict[str, Any]] = None + + + config_kwargs: Optional[Dict[str, Any]] = None + + + s3_additional_kwargs: Optional[Dict[str, Any]] = None + + + You can pass sensitive configuration values as secrets by defining them as type + SecretField in the configuration class.' + - '─────────────────────────────────────────────────┨┃ UUID │ 2b7773eb-d371-4f24-96f1-fad15e74fd6e ┃ + + + ┠────────────────────┼──────────────────────────────────────────────────────────────────────────────┨ + + + ┃ PATH │ /home/stefan/.config/zenml/local_stores/2b7773eb-d371-4f24-96f1-fad15e74fd6e + ┃ + + + ┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + + + As shown by the PATH value in the zenml artifact-store describe output, the artifacts + are stored inside a folder on your local filesystem. 
+ + + You can create additional instances of local Artifact Stores and use them in your + stacks as you see fit, e.g.: + + + # Register the local artifact store + + + zenml artifact-store register custom_local --flavor local + + + # Register and set a stack with the new artifact store + + + zenml stack register custom_stack -o default -a custom_local --set + + + Same as all other Artifact Store flavors, the local Artifact Store does take in + a path configuration parameter that can be set during registration to point to + a custom path on your machine. However, it is highly recommended that you rely + on the default path value, otherwise, it may lead to unexpected results. Other + local stack components depend on the convention used for the default path to be + able to access the local Artifact Store. + + + For more, up-to-date information on the local Artifact Store implementation and + its configuration, you can have a look at the SDK docs . + + + How do you use it? + + + Aside from the fact that the artifacts are stored locally, using the local Artifact + Store is no different from using any other flavor of Artifact Store. + + + PreviousArtifact Stores + + + NextAmazon Simple Cloud Storage (S3) + + + Last updated 19 days ago' +- source_sentence: How do I configure the evidently_test_step to run an Evidently + test suite with specific column mappings? 
+ sentences: + - ' ┃┗━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + + $ zenml model-deployer models get-url 8cbe671b-9fce-4394-a051-68e001f92765 - If you wish to disable colourful logging, set the following environment variable: + Prediction URL of Served Model 8cbe671b-9fce-4394-a051-68e001f92765 is: - ZENML_LOGGING_COLORS_DISABLED=true' - - 'pd.Series(model.predict(data)) + http://abb84c444c7804aa98fc8c097896479d-377673393.us-east-1.elb.amazonaws.com/seldon/zenml-workloads/zenml-8cbe67 - return predictionsHowever, this approach has the downside that if the step is - cached, then it could lead to unexpected results. You could simply disable the - cache in the above step or the corresponding pipeline. However, one other way - of achieving this would be to resolve the artifact at the pipeline level: + 1b-9fce-4394-a051-68e001f92765/api/v0.1/predictions - from typing_extensions import Annotated + $ zenml model-deployer models delete 8cbe671b-9fce-4394-a051-68e001f92765 - from zenml import get_pipeline_context, pipeline, Model + In Python, you can alternatively discover the prediction URL of a deployed model + by inspecting the metadata of the step that deployed the model: - from zenml.enums import ModelStages + + from zenml.client import Client + + + pipeline_run = Client().get_pipeline_run("") + + + deployer_step = pipeline_run.steps[""] + + + deployed_model_url = deployer_step.run_metadata["deployed_model_url"].value + + + The ZenML integrations that provide Model Deployer stack components also include + standard pipeline steps that can directly be inserted into any pipeline to achieve + a continuous model deployment workflow. 
These steps take care of all the aspects + of continuously deploying models to an external server and saving the Service + configuration into the Artifact Store, where they can be loaded at a later time + and re-create the initial conditions used to serve a particular model. + + + PreviousDevelop a custom experiment tracker + + + NextMLflow + + + Last updated 15 days ago' + - 'Load artifacts into memory + + + Often ZenML pipeline steps consume artifacts produced by one another directly + in the pipeline code, but there are scenarios where you need to pull external + data into your steps. Such external data could be artifacts produced by non-ZenML + codes. For those cases, it is advised to use ExternalArtifact, but what if we + plan to exchange data created with other ZenML pipelines? + + + ZenML pipelines are first compiled and only executed at some later point. During + the compilation phase, all function calls are executed, and this data is fixed + as step input parameters. Given all this, the late materialization of dynamic + objects, like data artifacts, is crucial. Without late materialization, it would + not be possible to pass not-yet-existing artifacts as step inputs, or their metadata, + which is often the case in a multi-pipeline setting. + + + We identify two major use cases for exchanging artifacts between pipelines: + + + You semantically group your data products using ZenML Models + + + You prefer to use ZenML Client to bring all the pieces together + + + We recommend using models to group and access artifacts across pipelines. Find + out how to load an artifact from a ZenML Model here. + + + Use client methods to exchange artifacts + + + If you don''t yet use the Model Control Plane, you can still exchange data between + pipelines with late materialization. 
Let''s rework the do_predictions pipeline + code as follows: + + + from typing import Annotated + + + from zenml import step, pipeline + + + from zenml.client import Client import pandas as pd @@ -671,7 +856,16 @@ widget: def predict( - model: ClassifierMixin, + model1: ClassifierMixin, + + + model2: ClassifierMixin, + + + model1_metric: float, + + + model2_metric: float, data: pd.DataFrame, @@ -680,315 +874,370 @@ widget: ) -> Annotated[pd.Series, "predictions"]: - predictions = pd.Series(model.predict(data)) + # compare which model performs better on the fly + + + if model1_metric < model2_metric: + + + predictions = pd.Series(model1.predict(data)) + + + else: + + + predictions = pd.Series(model2.predict(data)) return predictions - @pipeline( + @step' + - 't ( + + + EvidentlyColumnMapping, + + + evidently_test_step,from zenml.integrations.evidently.tests import EvidentlyTestConfig + + + text_data_test = evidently_test_step.with_options( + + parameters=dict( - model=Model( + column_mapping=EvidentlyColumnMapping( - name="iris_classifier", + target="Rating", - # Using the production stage + numerical_features=["Age", "Positive_Feedback_Count"], - version=ModelStages.PRODUCTION, + + categorical_features=[ + + + "Division_Name", + + + "Department_Name", + + + "Class_Name", + + + ], + + + text_features=["Review_Text", "Title"], ), - def do_predictions(): + tests=[ - # model name and version are derived from pipeline context + EvidentlyTestConfig.test("DataQualityTestPreset"), - model = get_pipeline_context().model + EvidentlyTestConfig.test_generator( - inference_data = load_data() + "TestColumnRegExp", - predict( + columns=["Review_Text", "Title"], - # Here, we load in the `trained_model` from a trainer step + reg_exp=r"[A-Z][A-Za-z0-9 ]*", - model=model.get_model_artifact("trained_model"), + ), - data=inference_data, + ], - if __name__ == "__main__": + # We need to download the NLTK data for the TestColumnRegExp test - do_predictions() + download_nltk_data=True, - 
Ultimately, both approaches are fine. You should decide which one to use based - on your own preferences. + ), - PreviousLoad artifacts into memory + The configuration shown in the example is the equivalent of running the following + Evidently code inside the step: - NextVisualizing artifacts + from evidently.tests import TestColumnRegExp - Last updated 15 days ago' - - 'Docker settings on a step + from evidently.test_preset import DataQualityTestPreset - You have the option to customize the Docker settings at a step level. + from evidently import ColumnMapping - By default every step of a pipeline uses the same Docker image that is defined - at the pipeline level. Sometimes your steps will have special requirements that - make it necessary to define a different Docker image for one or many steps. This - can easily be accomplished by adding the DockerSettings to the step decorator - directly. + from evidently.test_suite import TestSuite - from zenml import step + from evidently.tests.base_test import generate_column_tests - from zenml.config import DockerSettings + import nltk - @step( + nltk.download("words") - settings={ + nltk.download("wordnet") - "docker": DockerSettings( + nltk.download("omw-1.4") - parent_image="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime" + column_mapping = ColumnMapping( - def training(...): + target="Rating", - ... + numerical_features=["Age", "Positive_Feedback_Count"], - Alternatively, this can also be done within the configuration file. 
+ categorical_features=[ - steps: + "Division_Name", - training: + "Department_Name", - settings: + "Class_Name", - docker: + ], - parent_image: pytorch/pytorch:2.2.0-cuda11.8-cudnn8-runtime + text_features=["Review_Text", "Title"], - required_integrations: + test_suite = TestSuite( - gcp + tests=[ - github + DataQualityTestPreset(), - requirements: + generate_column_tests( - zenml # Make sure to include ZenML for other parent images + TestColumnRegExp, - numpy + columns=["Review_Text", "Title"], - PreviousDocker settings on a pipeline + parameters={"reg_exp": r"[A-Z][A-Za-z0-9 ]*"} - NextSpecify pip dependencies and apt packages + # The datasets are those that are passed to the Evidently step - Last updated 19 days ago' -- source_sentence: How do I configure the Kubernetes Service Connector to connect - ZenML to Kubernetes clusters? + # as input artifacts + + + test_suite.run( + + + current_data=current_dataset, + + + reference_data=reference_dataset, + + + column_mapping=column_mapping, + + + Let''s break this down... + + + We configure the evidently_test_step using parameters that you would normally + pass to the Evidently TestSuite object to configure and run an Evidently test + suite . It consists of the following fields:' +- source_sentence: What is the purpose of the CustomContainerRegistryConfig class + in a ZenML workflow? sentences: - - 'Kubernetes Service Connector + - 'e the new flavor in the list of available flavors:zenml container-registry flavor + list + + It is important to draw attention to when and how these base abstractions are + coming into play in a ZenML workflow. - Configuring Kubernetes Service Connectors to connect ZenML to Kubernetes clusters. + The CustomContainerRegistryFlavor class is imported and utilized upon the creation + of the custom flavor through the CLI. - The ZenML Kubernetes service connector facilitates authenticating and connecting - to a Kubernetes cluster. 
The connector can be used to access to any generic Kubernetes - cluster by providing pre-authenticated Kubernetes python clients to Stack Components - that are linked to it and also allows configuring the local Kubernetes CLI (i.e. - kubectl). + The CustomContainerRegistryConfig class is imported when someone tries to register/update + a stack component with this custom flavor. Especially, during the registration + process of the stack component, the config will be used to validate the values + given by the user. As Config object are inherently pydantic objects, you can also + add your own custom validators here. - Prerequisites + The CustomContainerRegistry only comes into play when the component is ultimately + in use. - The Kubernetes Service Connector is part of the Kubernetes ZenML integration. - You can either install the entire integration or use a pypi extra to install it - independently of the integration: + The design behind this interaction lets us separate the configuration of the flavor + from its implementation. This way we can register flavors and components even + when the major dependencies behind their implementation are not installed in our + local setting (assuming the CustomContainerRegistryFlavor and the CustomContainerRegistryConfig + are implemented in a different module/path than the actual CustomContainerRegistry). - pip install "zenml[connectors-kubernetes]" installs only prerequisites for the - Kubernetes Service Connector Type + PreviousGitHub Container Registry - zenml integration install kubernetes installs the entire Kubernetes ZenML integration + NextData Validators - A local Kubernetes CLI (i.e. kubectl ) and setting up local kubectl configuration - contexts is not required to access Kubernetes clusters in your Stack Components - through the Kubernetes Service Connector. + + Last updated 15 days ago' + - "ons. 
Try it out at https://www.zenml.io/live-demo!Automated Deployments: With\ + \ ZenML, you no longer need to upload custom Docker images to the cloud whenever\ + \ you want to deploy a new model to production. Simply define your ML workflow\ + \ as a ZenML pipeline, let ZenML handle the containerization, and have your model\ + \ automatically deployed to a highly scalable Kubernetes deployment service like\ + \ Seldon.Copyfrom zenml.integrations.seldon.steps import seldon_model_deployer_step\n\ + from my_organization.steps import data_loader_step, model_trainer_step\n\n@pipeline\n\ + def my_pipeline():\n data = data_loader_step()\n model = model_trainer_step(data)\n\ + \ seldon_model_deployer_step(model)\n\n\U0001F680 Learn More\n\nReady to manage\ + \ your ML lifecycles end-to-end with ZenML? Here is a collection of pages you\ + \ can take a look at next:\n\nGet started with ZenML and learn how to build your\ + \ first pipeline and stack.\n\nDiscover advanced ZenML features like config management\ + \ and containerization.\n\nExplore ZenML through practical use-case examples.\n\ + \nNextInstallation\n\nLast updated 14 days ago" + - 'our active stack: - $ zenml service-connector list-types --type kubernetes + from zenml.client import Clientexperiment_tracker = Client().active_stack.experiment_tracker - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━┓ + @step(experiment_tracker=experiment_tracker.name) - ┃ NAME │ TYPE │ RESOURCE TYPES │ AUTH - METHODS │ LOCAL │ REMOTE ┃ + def tf_trainer(...): - ┠──────────────────────────────┼───────────────┼───────────────────────┼──────────────┼───────┼────────┨ + ... - ┃ Kubernetes Service Connector │ 🌀 kubernetes │ 🌀 kubernetes-cluster │ password │ - ✅ │ ✅ ┃ + MLflow UI - ┃ │ │ │ token │ │ ┃ + MLflow comes with its own UI that you can use to find further details about your + tracked experiments. 
- ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━━┛ + You can find the URL of the MLflow experiment linked to a specific ZenML run via + the metadata of the step in which the experiment tracker was used: - Resource Types + from zenml.client import Client - The Kubernetes Service Connector only supports authenticating to and granting - access to a generic Kubernetes cluster. This type of resource is identified by - the kubernetes-cluster Resource Type.' - - 'to the container registry. + last_run = client.get_pipeline("").last_run - Authentication MethodsIntegrating and using an Azure Container Registry in your - pipelines is not possible without employing some form of authentication. If you''re - looking for a quick way to get started locally, you can use the Local Authentication - method. However, the recommended way to authenticate to the Azure cloud platform - is through an Azure Service Connector. This is particularly useful if you are - configuring ZenML stacks that combine the Azure Container Registry with other - remote stack components also running in Azure. + trainer_step = last_run.get_step("") - This method uses the Docker client authentication available in the environment - where the ZenML code is running. On your local machine, this is the quickest way - to configure an Azure Container Registry. You don''t need to supply credentials - explicitly when you register the Azure Container Registry, as it leverages the - local credentials and configuration that the Azure CLI and Docker client store - on your local machine. However, you will need to install and set up the Azure - CLI on your machine as a prerequisite, as covered in the Azure CLI documentation, - before you register the Azure Container Registry. 
+ tracking_url = trainer_step.run_metadata["experiment_tracker_url"].value - With the Azure CLI installed and set up with credentials, you need to login to - the container registry so Docker can pull and push images: + print(tracking_url) - # Fill your REGISTRY_NAME in the placeholder in the following command. + This will be the URL of the corresponding experiment in your deployed MLflow instance, + or a link to the corresponding mlflow experiment file if you are using local MLflow. - # You can find the REGISTRY_NAME as part of your registry URI: `.azurecr.io` + If you are using local MLflow, you can use the mlflow ui command to start MLflow + at localhost:5000 where you can then explore the UI in your browser. - az acr login --name= + mlflow ui --backend-store-uri - Stacks using the Azure Container Registry set up with local authentication are - not portable across environments. To make ZenML pipelines fully portable, it is - recommended to use an Azure Service Connector to link your Azure Container Registry - to the remote ACR registry.' - - 'he Post-execution workflow has changed as follows:The get_pipelines and get_pipeline - methods have been moved out of the Repository (i.e. the new Client ) class and - lie directly in the post_execution module now. To use the user has to do: + Additional configuration - from zenml.post_execution import get_pipelines, get_pipeline + For additional configuration of the MLflow experiment tracker, you can pass MLFlowExperimentTrackerSettings + to create nested runs or add additional tags to your MLflow runs: - New methods to directly get a run have been introduced: get_run and get_unlisted_runs - method has been introduced to get unlisted runs. + import mlflow - Usage remains largely similar. Please read the new docs for post-execution to - inform yourself of what further has changed. 
+ from zenml.integrations.mlflow.flavors.mlflow_experiment_tracker_flavor import + MLFlowExperimentTrackerSettings - How to migrate: Replace all post-execution workflows from the paradigm of Repository.get_pipelines - or Repository.get_pipeline_run to the corresponding post_execution methods. + mlflow_settings = MLFlowExperimentTrackerSettings( - 📡Future Changes + nested=True, - While this rehaul is big and will break previous releases, we do have some more - work left to do. However we also expect this to be the last big rehaul of ZenML - before our 1.0.0 release, and no other release will be so hard breaking as this - one. Currently planned future breaking changes are: + tags={"key": "value"} - Following the metadata store, the secrets manager stack component might move out - of the stack. + @step( - ZenML StepContext might be deprecated. + experiment_tracker="", - 🐞 Reporting Bugs + settings={ - While we have tried our best to document everything that has changed, we realize - that mistakes can be made and smaller changes overlooked. If this is the case, - or you encounter a bug at any time, the ZenML core team and community are available - around the clock on the growing Slack community. + "experiment_tracker.mlflow": mlflow_settings - For bug reports, please also consider submitting a GitHub Issue. + def step_one( - Lastly, if the new changes have left you desiring a feature, then consider adding - it to our public feature voting board. Before doing so, do check what is already - on there and consider upvoting the features you desire the most. + data: np.ndarray, - PreviousMigration guide + ) -> np.ndarray: - NextMigration guide 0.23.0 → 0.30.0 + ... - Last updated 12 days ago' + Check out the SDK docs for a full list of available attributes and this docs page + for more information on how to specify settings. 
+ + + PreviousComet + + + NextNeptune + + + Last updated 15 days ago' model-index: - name: zenml/finetuned-snowflake-arctic-embed-m results: @@ -1000,49 +1249,49 @@ model-index: type: dim_384 metrics: - type: cosine_accuracy@1 - value: 0.3614457831325301 + value: 0.3373493975903614 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.6024096385542169 + value: 0.572289156626506 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6987951807228916 + value: 0.6927710843373494 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7831325301204819 + value: 0.7951807228915663 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.3614457831325301 + value: 0.3373493975903614 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.2008032128514056 + value: 0.19076305220883533 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.1397590361445783 + value: 0.13855421686746985 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.07831325301204817 + value: 0.07951807228915661 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.3614457831325301 + value: 0.3373493975903614 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.6024096385542169 + value: 0.572289156626506 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6987951807228916 + value: 0.6927710843373494 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7831325301204819 + value: 0.7951807228915663 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5756072832948543 + value: 0.5579541825293861 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.5091365461847391 + value: 0.48281937272901143 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.5165480061197206 + value: 0.4881409762768584 name: Cosine Map@100 - task: type: information-retrieval @@ -1052,49 +1301,49 @@ model-index: type: dim_256 metrics: - type: cosine_accuracy@1 - value: 0.3674698795180723 + value: 0.3433734939759036 name: Cosine Accuracy@1 - type: cosine_accuracy@3 
- value: 0.6144578313253012 + value: 0.5963855421686747 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6987951807228916 + value: 0.6626506024096386 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7710843373493976 + value: 0.7469879518072289 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.3674698795180723 + value: 0.3433734939759036 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.2048192771084337 + value: 0.19879518072289157 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.1397590361445783 + value: 0.1325301204819277 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.07710843373493974 + value: 0.07469879518072287 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.3674698795180723 + value: 0.3433734939759036 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.6144578313253012 + value: 0.5963855421686747 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6987951807228916 + value: 0.6626506024096386 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7710843373493976 + value: 0.7469879518072289 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5732430988480587 + value: 0.5462463547623214 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.509569229298145 + value: 0.4817340791738385 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.5167702755195493 + value: 0.48971987967160924 name: Cosine Map@100 - task: type: information-retrieval @@ -1107,46 +1356,46 @@ model-index: value: 0.29518072289156627 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.5542168674698795 + value: 0.5301204819277109 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6506024096385542 + value: 0.6445783132530121 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7469879518072289 + value: 0.7349397590361446 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.29518072289156627 name: Cosine Precision@1 - type: cosine_precision@3 - value: 
0.18473895582329317 + value: 0.17670682730923695 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.1301204819277108 + value: 0.1289156626506024 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.07469879518072288 + value: 0.07349397590361444 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.29518072289156627 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.5542168674698795 + value: 0.5301204819277109 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6506024096385542 + value: 0.6445783132530121 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7469879518072289 + value: 0.7349397590361446 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5199227959343978 + value: 0.5127103099003618 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.44722939376553855 + value: 0.44137980493402174 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4541483656933914 + value: 0.4487298407008574 name: Cosine Map@100 - task: type: information-retrieval @@ -1156,49 +1405,49 @@ model-index: type: dim_64 metrics: - type: cosine_accuracy@1 - value: 0.28313253012048195 + value: 0.27710843373493976 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.5180722891566265 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.5843373493975904 + value: 0.5542168674698795 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.6746987951807228 + value: 0.6626506024096386 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.28313253012048195 + value: 0.27710843373493976 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.17269076305220882 + value: 0.17269076305220885 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.11686746987951806 + value: 0.1108433734939759 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.06746987951807228 + value: 0.06626506024096383 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.28313253012048195 + value: 0.27710843373493976 name: 
Cosine Recall@1 - type: cosine_recall@3 value: 0.5180722891566265 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.5843373493975904 + value: 0.5542168674698795 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.6746987951807228 + value: 0.6626506024096386 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.47987356927913916 + value: 0.46724356296794395 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.4177519602218399 + value: 0.4052663033084721 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4261749847732839 + value: 0.41253529663957095 name: Cosine Map@100 --- @@ -1252,9 +1501,9 @@ from sentence_transformers import SentenceTransformer model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m") # Run inference sentences = [ - 'How do I configure the Kubernetes Service Connector to connect ZenML to Kubernetes clusters?', - 'Kubernetes Service Connector\n\nConfiguring Kubernetes Service Connectors to connect ZenML to Kubernetes clusters.\n\nThe ZenML Kubernetes service connector facilitates authenticating and connecting to a Kubernetes cluster. The connector can be used to access to any generic Kubernetes cluster by providing pre-authenticated Kubernetes python clients to Stack Components that are linked to it and also allows configuring the local Kubernetes CLI (i.e. kubectl).\n\nPrerequisites\n\nThe Kubernetes Service Connector is part of the Kubernetes ZenML integration. You can either install the entire integration or use a pypi extra to install it independently of the integration:\n\npip install "zenml[connectors-kubernetes]" installs only prerequisites for the Kubernetes Service Connector Type\n\nzenml integration install kubernetes installs the entire Kubernetes ZenML integration\n\nA local Kubernetes CLI (i.e. 
kubectl ) and setting up local kubectl configuration contexts is not required to access Kubernetes clusters in your Stack Components through the Kubernetes Service Connector.\n\n$ zenml service-connector list-types --type kubernetes\n\n┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━┓\n\n┃ NAME │ TYPE │ RESOURCE TYPES │ AUTH METHODS │ LOCAL │ REMOTE ┃\n\n┠──────────────────────────────┼───────────────┼───────────────────────┼──────────────┼───────┼────────┨\n\n┃ Kubernetes Service Connector │ 🌀 kubernetes │ 🌀 kubernetes-cluster │ password │ ✅ │ ✅ ┃\n\n┃ │ │ │ token │ │ ┃\n\n┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━━┛\n\nResource Types\n\nThe Kubernetes Service Connector only supports authenticating to and granting access to a generic Kubernetes cluster. This type of resource is identified by the kubernetes-cluster Resource Type.', - 'he Post-execution workflow has changed as follows:The get_pipelines and get_pipeline methods have been moved out of the Repository (i.e. the new Client ) class and lie directly in the post_execution module now. To use the user has to do:\n\nfrom zenml.post_execution import get_pipelines, get_pipeline\n\nNew methods to directly get a run have been introduced: get_run and get_unlisted_runs method has been introduced to get unlisted runs.\n\nUsage remains largely similar. Please read the new docs for post-execution to inform yourself of what further has changed.\n\nHow to migrate: Replace all post-execution workflows from the paradigm of Repository.get_pipelines or Repository.get_pipeline_run to the corresponding post_execution methods.\n\n📡Future Changes\n\nWhile this rehaul is big and will break previous releases, we do have some more work left to do. However we also expect this to be the last big rehaul of ZenML before our 1.0.0 release, and no other release will be so hard breaking as this one. 
Currently planned future breaking changes are:\n\nFollowing the metadata store, the secrets manager stack component might move out of the stack.\n\nZenML StepContext might be deprecated.\n\n🐞 Reporting Bugs\n\nWhile we have tried our best to document everything that has changed, we realize that mistakes can be made and smaller changes overlooked. If this is the case, or you encounter a bug at any time, the ZenML core team and community are available around the clock on the growing Slack community.\n\nFor bug reports, please also consider submitting a GitHub Issue.\n\nLastly, if the new changes have left you desiring a feature, then consider adding it to our public feature voting board. Before doing so, do check what is already on there and consider upvoting the features you desire the most.\n\nPreviousMigration guide\n\nNextMigration guide 0.23.0 → 0.30.0\n\nLast updated 12 days ago', + 'What is the purpose of the CustomContainerRegistryConfig class in a ZenML workflow?', + 'e the new flavor in the list of available flavors:zenml container-registry flavor list\n\nIt is important to draw attention to when and how these base abstractions are coming into play in a ZenML workflow.\n\nThe CustomContainerRegistryFlavor class is imported and utilized upon the creation of the custom flavor through the CLI.\n\nThe CustomContainerRegistryConfig class is imported when someone tries to register/update a stack component with this custom flavor. Especially, during the registration process of the stack component, the config will be used to validate the values given by the user. As Config object are inherently pydantic objects, you can also add your own custom validators here.\n\nThe CustomContainerRegistry only comes into play when the component is ultimately in use.\n\nThe design behind this interaction lets us separate the configuration of the flavor from its implementation. 
This way we can register flavors and components even when the major dependencies behind their implementation are not installed in our local setting (assuming the CustomContainerRegistryFlavor and the CustomContainerRegistryConfig are implemented in a different module/path than the actual CustomContainerRegistry).\n\nPreviousGitHub Container Registry\n\nNextData Validators\n\nLast updated 15 days ago', + 'our active stack:\n\nfrom zenml.client import Clientexperiment_tracker = Client().active_stack.experiment_tracker\n\n@step(experiment_tracker=experiment_tracker.name)\n\ndef tf_trainer(...):\n\n...\n\nMLflow UI\n\nMLflow comes with its own UI that you can use to find further details about your tracked experiments.\n\nYou can find the URL of the MLflow experiment linked to a specific ZenML run via the metadata of the step in which the experiment tracker was used:\n\nfrom zenml.client import Client\n\nlast_run = client.get_pipeline("").last_run\n\ntrainer_step = last_run.get_step("")\n\ntracking_url = trainer_step.run_metadata["experiment_tracker_url"].value\n\nprint(tracking_url)\n\nThis will be the URL of the corresponding experiment in your deployed MLflow instance, or a link to the corresponding mlflow experiment file if you are using local MLflow.\n\nIf you are using local MLflow, you can use the mlflow ui command to start MLflow at localhost:5000 where you can then explore the UI in your browser.\n\nmlflow ui --backend-store-uri \n\nAdditional configuration\n\nFor additional configuration of the MLflow experiment tracker, you can pass MLFlowExperimentTrackerSettings to create nested runs or add additional tags to your MLflow runs:\n\nimport mlflow\n\nfrom zenml.integrations.mlflow.flavors.mlflow_experiment_tracker_flavor import MLFlowExperimentTrackerSettings\n\nmlflow_settings = MLFlowExperimentTrackerSettings(\n\nnested=True,\n\ntags={"key": "value"}\n\n@step(\n\nexperiment_tracker="",\n\nsettings={\n\n"experiment_tracker.mlflow": mlflow_settings\n\ndef 
step_one(\n\ndata: np.ndarray,\n\n) -> np.ndarray:\n\n...\n\nCheck out the SDK docs for a full list of available attributes and this docs page for more information on how to specify settings.\n\nPreviousComet\n\nNextNeptune\n\nLast updated 15 days ago', ] embeddings = model.encode(sentences) print(embeddings.shape) @@ -1300,21 +1549,21 @@ You can finetune this model on your own dataset. | Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.3614 | -| cosine_accuracy@3 | 0.6024 | -| cosine_accuracy@5 | 0.6988 | -| cosine_accuracy@10 | 0.7831 | -| cosine_precision@1 | 0.3614 | -| cosine_precision@3 | 0.2008 | -| cosine_precision@5 | 0.1398 | -| cosine_precision@10 | 0.0783 | -| cosine_recall@1 | 0.3614 | -| cosine_recall@3 | 0.6024 | -| cosine_recall@5 | 0.6988 | -| cosine_recall@10 | 0.7831 | -| cosine_ndcg@10 | 0.5756 | -| cosine_mrr@10 | 0.5091 | -| **cosine_map@100** | **0.5165** | +| cosine_accuracy@1 | 0.3373 | +| cosine_accuracy@3 | 0.5723 | +| cosine_accuracy@5 | 0.6928 | +| cosine_accuracy@10 | 0.7952 | +| cosine_precision@1 | 0.3373 | +| cosine_precision@3 | 0.1908 | +| cosine_precision@5 | 0.1386 | +| cosine_precision@10 | 0.0795 | +| cosine_recall@1 | 0.3373 | +| cosine_recall@3 | 0.5723 | +| cosine_recall@5 | 0.6928 | +| cosine_recall@10 | 0.7952 | +| cosine_ndcg@10 | 0.558 | +| cosine_mrr@10 | 0.4828 | +| **cosine_map@100** | **0.4881** | #### Information Retrieval * Dataset: `dim_256` @@ -1322,21 +1571,21 @@ You can finetune this model on your own dataset. 
| Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.3675 | -| cosine_accuracy@3 | 0.6145 | -| cosine_accuracy@5 | 0.6988 | -| cosine_accuracy@10 | 0.7711 | -| cosine_precision@1 | 0.3675 | -| cosine_precision@3 | 0.2048 | -| cosine_precision@5 | 0.1398 | -| cosine_precision@10 | 0.0771 | -| cosine_recall@1 | 0.3675 | -| cosine_recall@3 | 0.6145 | -| cosine_recall@5 | 0.6988 | -| cosine_recall@10 | 0.7711 | -| cosine_ndcg@10 | 0.5732 | -| cosine_mrr@10 | 0.5096 | -| **cosine_map@100** | **0.5168** | +| cosine_accuracy@1 | 0.3434 | +| cosine_accuracy@3 | 0.5964 | +| cosine_accuracy@5 | 0.6627 | +| cosine_accuracy@10 | 0.747 | +| cosine_precision@1 | 0.3434 | +| cosine_precision@3 | 0.1988 | +| cosine_precision@5 | 0.1325 | +| cosine_precision@10 | 0.0747 | +| cosine_recall@1 | 0.3434 | +| cosine_recall@3 | 0.5964 | +| cosine_recall@5 | 0.6627 | +| cosine_recall@10 | 0.747 | +| cosine_ndcg@10 | 0.5462 | +| cosine_mrr@10 | 0.4817 | +| **cosine_map@100** | **0.4897** | #### Information Retrieval * Dataset: `dim_128` @@ -1345,20 +1594,20 @@ You can finetune this model on your own dataset. 
| Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.2952 | -| cosine_accuracy@3 | 0.5542 | -| cosine_accuracy@5 | 0.6506 | -| cosine_accuracy@10 | 0.747 | +| cosine_accuracy@3 | 0.5301 | +| cosine_accuracy@5 | 0.6446 | +| cosine_accuracy@10 | 0.7349 | | cosine_precision@1 | 0.2952 | -| cosine_precision@3 | 0.1847 | -| cosine_precision@5 | 0.1301 | -| cosine_precision@10 | 0.0747 | +| cosine_precision@3 | 0.1767 | +| cosine_precision@5 | 0.1289 | +| cosine_precision@10 | 0.0735 | | cosine_recall@1 | 0.2952 | -| cosine_recall@3 | 0.5542 | -| cosine_recall@5 | 0.6506 | -| cosine_recall@10 | 0.747 | -| cosine_ndcg@10 | 0.5199 | -| cosine_mrr@10 | 0.4472 | -| **cosine_map@100** | **0.4541** | +| cosine_recall@3 | 0.5301 | +| cosine_recall@5 | 0.6446 | +| cosine_recall@10 | 0.7349 | +| cosine_ndcg@10 | 0.5127 | +| cosine_mrr@10 | 0.4414 | +| **cosine_map@100** | **0.4487** | #### Information Retrieval * Dataset: `dim_64` @@ -1366,21 +1615,21 @@ You can finetune this model on your own dataset. | Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.2831 | +| cosine_accuracy@1 | 0.2771 | | cosine_accuracy@3 | 0.5181 | -| cosine_accuracy@5 | 0.5843 | -| cosine_accuracy@10 | 0.6747 | -| cosine_precision@1 | 0.2831 | +| cosine_accuracy@5 | 0.5542 | +| cosine_accuracy@10 | 0.6627 | +| cosine_precision@1 | 0.2771 | | cosine_precision@3 | 0.1727 | -| cosine_precision@5 | 0.1169 | -| cosine_precision@10 | 0.0675 | -| cosine_recall@1 | 0.2831 | +| cosine_precision@5 | 0.1108 | +| cosine_precision@10 | 0.0663 | +| cosine_recall@1 | 0.2771 | | cosine_recall@3 | 0.5181 | -| cosine_recall@5 | 0.5843 | -| cosine_recall@10 | 0.6747 | -| cosine_ndcg@10 | 0.4799 | -| cosine_mrr@10 | 0.4178 | -| **cosine_map@100** | **0.4262** | +| cosine_recall@5 | 0.5542 | +| cosine_recall@10 | 0.6627 | +| cosine_ndcg@10 | 0.4672 | +| cosine_mrr@10 | 0.4053 | +| **cosine_map@100** | **0.4125** |
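The tables above evaluate the same model at several Matryoshka truncation dimensions (dim_512 down to dim_64): retrieval quality degrades gracefully as embeddings are shortened. As a minimal sketch of what truncation means here — assuming, as is standard for MatryoshkaLoss-trained models, that a shorter embedding is obtained by keeping only the first `dim` components and re-normalizing before cosine comparison; the toy vectors below stand in for `model.encode()` output:

```python
import numpy as np

def truncate_and_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka truncation: keep the first `dim` components, then
    re-normalize so cosine similarity reduces to a dot product."""
    cut = emb[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Both inputs are assumed unit-normalized.
    return float(np.dot(a, b))

# Toy 8-dim vectors standing in for real model.encode() embeddings.
rng = np.random.default_rng(0)
query, doc = rng.normal(size=8), rng.normal(size=8)

full = cosine_sim(truncate_and_normalize(query, 8),
                  truncate_and_normalize(doc, 8))
small = cosine_sim(truncate_and_normalize(query, 4),
                   truncate_and_normalize(doc, 4))
print(f"similarity at full dim: {full:.4f}, at half dim: {small:.4f}")
```

This mirrors how the per-dimension metrics above are produced: the same encoder output is scored at each truncation length, trading index size and latency against the accuracy@k / NDCG drop visible in the tables.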