diff --git "a/README.md" "b/README.md" --- "a/README.md" +++ "b/README.md" @@ -31,908 +31,1019 @@ tags: - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss widget: -- source_sentence: How can I deploy the ZenML server in different environments and - manage pipelines with the new commands? +- source_sentence: What is the error message related to the blob-container for the + azure-generic subscription in ZenML? sentences: - - 'ed to update the way they are registered in ZenML.the updated ZenML server provides - a new and improved collaborative experience. When connected to a ZenML server, - you can now share your ZenML Stacks and Stack Components with other users. If - you were previously using the ZenML Profiles or the ZenML server to share your - ZenML Stacks, you should switch to the new ZenML server and Dashboard and update - your existing workflows to reflect the new features. + - '─────────────────────────────────────────────────┨┃ 🇦 azure-generic │ ZenML + Subscription ┃ - ZenML takes over the Metadata Store role + ┠───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┨ - ZenML can now run as a server that can be accessed via a REST API and also comes - with a visual user interface (called the ZenML Dashboard). This server can be - deployed in arbitrary environments (local, on-prem, via Docker, on AWS, GCP, Azure - etc.) and supports user management, workspace scoping, and more. + ┃ 📦 blob-container │ 💥 error: connector authorization failure: the ''access-token'' + authentication method is not supported for blob storage resources ┃ - The release introduces a series of commands to facilitate managing the lifecycle - of the ZenML server and to access the pipeline and pipeline run information: + ┠───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┨ - zenml connect / disconnect / down / up / logs / status can be used to configure - your client to connect to a ZenML server, to start a local ZenML Dashboard or - to deploy a ZenML server to a cloud environment. For more information on how to - use these commands, see the ZenML deployment documentation. + ┃ 🌀 kubernetes-cluster │ demo-zenml-demos/demo-zenml-terraform-cluster ┃ - zenml pipeline list / runs / delete can be used to display information and about - and manage your pipelines and pipeline runs. + ┠───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┨ - In ZenML 0.13.2 and earlier versions, information about pipelines and pipeline - runs used to be stored in a separate stack component called the Metadata Store. - Starting with 0.20.0, the role of the Metadata Store is now taken over by ZenML - itself. This means that the Metadata Store is no longer a separate component in - the ZenML architecture, but rather a part of the ZenML core, located wherever - ZenML is deployed: locally on your machine or running remotely as a server.' 
- - 'ntainer │ service-principal │ │ ┃┃ │ │ - 🌀 kubernetes-cluster │ access-token │ │ ┃ + ┃ 🐳 docker-registry │ demozenmlcontainerregistry.azurecr.io ┃ - ┃ │ │ 🐳 docker-registry │ │ │ ┃ + ┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ - ┠──────────────────────────────┼───────────────┼───────────────────────┼───────────────────┼───────┼────────┨ + zenml service-connector describe azure-session-token - ┃ AWS Service Connector │ 🔶 aws │ 🔶 aws-generic │ implicit │ - ✅ │ ✅ ┃ + Example Command Output - ┃ │ │ 📦 s3-bucket │ secret-key │ │ ┃ + Service connector ''azure-session-token'' of type ''azure'' with id ''94d64103-9902-4aa5-8ce4-877061af89af'' + is owned by user ''default'' and is ''private''. - ┃ │ │ 🌀 kubernetes-cluster │ sts-token │ │ ┃ + ''azure-session-token'' azure Service Connector Details - ┃ │ │ 🐳 docker-registry │ iam-role │ │ ┃ + ┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ │ │ │ session-token │ │ ┃ + ┃ PROPERTY │ VALUE ┃ - ┃ │ │ │ federation-token │ │ ┃ + ┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ - ┠──────────────────────────────┼───────────────┼───────────────────────┼───────────────────┼───────┼────────┨ + ┃ ID │ 94d64103-9902-4aa5-8ce4-877061af89af ┃' + - '🪆Use the Model Control Plane - ┃ GCP Service Connector │ 🔵 gcp │ 🔵 gcp-generic │ implicit │ - ✅ │ ✅ ┃ + A Model is simply an entity that groups pipelines, artifacts, metadata, and other + crucial business data into a unified entity. A ZenML Model is a concept that more + broadly encapsulates your ML products business logic. You may even think of a + ZenML Model as a "project" or a "workspace" - ┃ │ │ 📦 gcs-bucket │ user-account │ │ ┃ + Please note that one of the most common artifacts that is associated with a Model + in ZenML is the so-called technical model, which is the actually model file/files + that holds the weight and parameters of a machine learning training result. However, + this is not the only artifact that is relevant; artifacts such as the training + data and the predictions this model produces in production are also linked inside + a ZenML Model. - ┃ │ │ 🌀 kubernetes-cluster │ service-account │ │ ┃ + Models are first-class citizens in ZenML and as such viewing and using them is + unified and centralized in the ZenML API, client as well as on the ZenML Pro dashboard. - ┃ │ │ 🐳 docker-registry │ oauth2-token │ │ ┃ + A Model captures lineage information and more. Within a Model, different Model + versions can be staged. For example, you can rely on your predictions at a specific + stage, like Production, and decide whether the Model version should be promoted + based on your business rules during training. Plus, accessing data from other + Models and their versions is just as simple. - ┃ │ │ │ impersonation │ │ ┃ + The Model Control Plane is how you manage your models through this unified interface. + It allows you to combine the logic of your pipelines, artifacts and crucial business + data along with the actual ''technical model''. - ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━━┛' - - 'tional) Which Metadata to Extract for the ArtifactOptionally, you can override - the extract_metadata() method to track custom metadata for all artifacts saved - by your materializer. 
Anything you extract here will be displayed in the dashboard - next to your artifacts. + To see an end-to-end example, please refer to the starter guide. - src.zenml.metadata.metadata_types that are displayed in a dedicated way in the - dashboard. See + PreviousDisabling visualizations - src.zenml.metadata.metadata_types.MetadataType for more details. + NextRegistering a Model - By default, this method will only extract the storage size of an artifact, but - you can overwrite it to track anything you wish. E.g., the zenml.materializers.NumpyMaterializer - overwrites this method to track the shape, dtype, and some statistical properties - of each np.ndarray that it saves. + Last updated 12 days ago' + - 'turns: - If you would like to disable artifact metadata extraction altogether, you can - set enable_artifact_metadata at either pipeline or step level via @pipeline(enable_artifact_metadata=False) - or @step(enable_artifact_metadata=False). + The Docker image repo digest or name. - Skipping materialization + """This is a slimmed-down version of the base implementation which aims to highlight + the abstraction layer. In order to see the full implementation and get the complete + docstrings, please check the source code on GitHub . - Skipping materialization might have unintended consequences for downstream tasks - that rely on materialized artifacts. Only skip materialization if there is no - other way to do what you want to do. + Build your own custom image builder - While materializers should in most cases be used to control how artifacts are - returned and consumed from pipeline steps, you might sometimes need to have a - completely unmaterialized artifact in a step, e.g., if you need to know the exact - path to where your artifact is stored. + If you want to create your own custom flavor for an image builder, you can follow + the following steps: - An unmaterialized artifact is a zenml.materializers.UnmaterializedArtifact. Among - others, it has a property uri that points to the unique path in the artifact store - where the artifact is persisted. One can use an unmaterialized artifact by specifying - UnmaterializedArtifact as the type in the step: + Create a class that inherits from the BaseImageBuilder class and implement the + abstract build method. This method should use the given build context and build + a Docker image with it. If additionally a container registry is passed to the + build method, the image builder is also responsible for pushing the image there. - from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact + If you need to provide any configuration, create a class that inherits from the + BaseImageBuilderConfig class and adds your configuration parameters. - from zenml import step + Bring both the implementation and the configuration together by inheriting from + the BaseImageBuilderFlavor class. Make sure that you give a name to the flavor + through its abstract property. - @step + Once you are done with the implementation, you can register it through the CLI. 
+ Please ensure you point to the flavor class via dot notation: + + + zenml image-builder flavor register + + For example, if your flavor class MyImageBuilderFlavor is defined in flavors/my_flavor.py, + you''d register it by doing: - def my_step(my_artifact: UnmaterializedArtifact): # rather than pd.DataFrame + zenml image-builder flavor register flavors.my_flavor.MyImageBuilderFlavor - pass + ZenML resolves the flavor class by taking the path where you initialized zenml + (via zenml init) as the starting point of resolution. Therefore, please ensure + you follow the best practice of initializing zenml at the root of your repository. - Example' -- source_sentence: How is the verification process different for multi-instance and - single-instance Service Connectors? + + If ZenML does not find an initialized ZenML repository in any parent directory, + it will default to the current working directory, but usually it''s better to + not have to rely on this mechanism, and initialize zenml at the root. + + + Afterward, you should see the new flavor in the list of available flavors:' +- source_sentence: Where can I find more information on configuring the Spark step + operator in ZenML? sentences: - - 'Develop a Custom Annotator + - 'upplied a custom value while creating the cluster.Run the following command. + aws eks update-kubeconfig --name --region - Learning how to develop a custom annotator. + Get the name of the deployed cluster. - Before diving into the specifics of this component type, it is beneficial to familiarize - yourself with our general guide to writing custom component flavors in ZenML. - This guide provides an essential understanding of ZenML''s component flavor concepts. + zenml stack recipe output gke-cluster-name\ - Annotators are a stack component that enables the use of data annotation as part - of your ZenML stack and pipelines. You can use the associated CLI command to launch - annotation, configure your datasets and get stats on how many labeled tasks you - have ready for use. + Figure out the region that the cluster is deployed to. By default, the region + is set to europe-west1, which you should use in the next step if you haven''t + supplied a custom value while creating the cluster.\ - Base abstraction in progress! + Figure out the project that the cluster is deployed to. You must have passed in + a project ID while creating a GCP resource for the first time.\ - We are actively working on the base abstraction for the annotators, which will - be available soon. As a result, their extension is not possible at the moment. - If you would like to use an annotator in your stack, please check the list of - already available feature stores down below. + Run the following command. - PreviousProdigy + gcloud container clusters get-credentials --region --project - NextModel Registries + You may already have your kubectl client configured with your cluster. Check by + running kubectl get nodes before proceeding. - Last updated 15 days ago' - - 'ld be accessible to larger audiences. + Get the name of the deployed cluster. - TerminologyAs with any high-level abstraction, some terminology is needed to express - the concepts and operations involved. In spite of the fact that Service Connectors - cover such a large area of application as authentication and authorization for - a variety of resources from a range of different vendors, we managed to keep this - abstraction clean and simple. 
In the following expandable sections, you''ll learn - more about Service Connector Types, Resource Types, Resource Names, and Service - Connectors. + zenml stack recipe output k3d-cluster-name\ - This term is used to represent and identify a particular Service Connector implementation - and answer questions about its capabilities such as "what types of resources does - this Service Connector give me access to", "what authentication methods does it - support" and "what credentials and other information do I need to configure for - it". This is analogous to the role Flavors play for Stack Components in that the - Service Connector Type acts as the template from which one or more Service Connectors - are created. + Set the KUBECONFIG env variable to the kubeconfig file from the cluster. - For example, the built-in AWS Service Connector Type shipped with ZenML supports - a rich variety of authentication methods and provides access to AWS resources - such as S3 buckets, EKS clusters and ECR registries. + export KUBECONFIG=$(k3d kubeconfig get )\ - The zenml service-connector list-types and zenml service-connector describe-type - CLI commands can be used to explore the Service Connector Types available with - your ZenML deployment. Extensive documentation is included covering supported - authentication methods and Resource Types. The following are just some examples: + You can now use the kubectl client to talk to the cluster. - zenml service-connector list-types + Stack Recipe Deploy - Example Command Output + The steps for the stack recipe case should be the same as the ones listed above. + The only difference that you need to take into account is the name of the outputs + that contain your cluster name and the default regions. + + + Each recipe might have its own values and here''s how you can ascertain those + values. + + + For the cluster name, go into the outputs.tf file in the root directory and search + for the output that exposes the cluster name. + + + For the region, check out the variables.tf or the locals.tf file for the default + value assigned to it. + + + PreviousTroubleshoot the deployed server + + + NextCustom secret stores + + Last updated 10 months ago' + - 'ettings to specify AzureML step operator settings.Difference between stack component + settings at registration-time vs real-time - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━��━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━┓ + For stack-component-specific settings, you might be wondering what the difference + is between these and the configuration passed in while doing zenml stack-component + register --config1=configvalue --config2=configvalue, etc. The answer is + that the configuration passed in at registration time is static and fixed throughout + all pipeline runs, while the settings can change. - ┃ NAME │ TYPE │ RESOURCE TYPES │ AUTH - METHODS │ LOCAL │ REMOTE ┃ + A good example of this is the MLflow Experiment Tracker, where configuration which + remains static such as the tracking_url is sent through at registration time, + while runtime configuration such as the experiment_name (which might change every + pipeline run) is sent through as runtime settings. - ┠──────────────────────────────┼───────────────┼───────────────────────┼───────────────────┼───────┼────────┨' - - 'ing resources: + Even though settings can be overridden at runtime, you can also specify default + values for settings while configuring a stack component. 
For example, you could + set a default value for the nested setting of your MLflow experiment tracker: + zenml experiment-tracker register --flavor=mlflow --nested=True - ┏━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓┃ RESOURCE TYPE │ RESOURCE NAMES ┃ + This means that all pipelines that run using this experiment tracker use nested + MLflow runs unless overridden by specifying settings for the pipeline at runtime. - ┠───────────────┼────────────────┨ + Using the right key for Stack-component-specific settings - ┃ 📦 s3-bucket │ s3://zenfiles ┃ + When specifying stack-component-specific settings, a key needs to be passed. This + key should always correspond to the pattern: . - ┗━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛ + For example, the SagemakerStepOperator supports passing in estimator_args. The + way to specify this would be to use the key step_operator.sagemaker - The following might help understand the difference between scopes: + @step(step_operator="nameofstepoperator", settings= {"step_operator.sagemaker": + {"estimator_args": {"instance_type": "m7g.medium"}}}) - the difference between a multi-instance and a multi-type Service Connector is - that the Resource Type scope is locked to a particular value during configuration - for the multi-instance Service Connector + def my_step(): - similarly, the difference between a multi-instance and a multi-type Service Connector - is that the Resource Name (Resource ID) scope is locked to a particular value - during configuration for the single-instance Service Connector + ... - Service Connector Verification + # Using the class - When registering Service Connectors, the authentication configuration and credentials - are automatically verified to ensure that they can indeed be used to gain access - to the target resources: + @step(step_operator="nameofstepoperator", settings= {"step_operator.sagemaker": + SagemakerStepOperatorSettings(instance_type="m7g.medium")}) - for multi-type Service Connectors, this verification means checking that the configured - credentials can be used to authenticate successfully to the remote service, as - well as listing all resources that the credentials have permission to access for - each Resource Type supported by the Service Connector Type. + def my_step(): - for multi-instance Service Connectors, this verification step means listing all - resources that the credentials have permission to access in addition to validating - that the credentials can be used to authenticate to the target service or platform. + ... - for single-instance Service Connectors, the verification step simply checks that - the configured credentials have permission to access the target resource. + or in YAML: - The verification can also be performed later on an already registered Service - Connector. Furthermore, for multi-type and multi-instance Service Connectors, - the verification operation can be scoped to a Resource Type and a Resource Name. + steps: - The following shows how a multi-type, a multi-instance and a single-instance Service - Connector can be verified with multiple scopes after registration.' -- source_sentence: How long did it take to generate 1800+ questions from documentation - chunks using the local model on a GPU-enabled machine? + + my_step:' + - '_operator + + + @step(step_operator=step_operator.name)def step_on_spark(...) -> ...: + + + ... + + + Additional configuration + + + For additional configuration of the Spark step operator, you can pass SparkStepOperatorSettings + when defining or running your pipeline. 
Check out the SDK docs for a full list + of available attributes and this docs page for more information on how to specify + settings. + + + PreviousAzureML + + + NextDevelop a Custom Step Operator + + + Last updated 19 days ago' +- source_sentence: How can I register an Azure Service Connector for an ACR registry + in ZenML using the CLI? sentences: - - 'ns, especially using the basic setup we have here.To give you an indication of - how long this process takes, generating 1800+ questions from an equivalent number - of documentation chunks took a little over 45 minutes using the local model on - a GPU-enabled machine with Ollama. + - 'ure Container Registry to the remote ACR registry.To set up the Azure Container + Registry to authenticate to Azure and access an ACR registry, it is recommended + to leverage the many features provided by the Azure Service Connector such as + auto-configuration, local login, best security practices regarding long-lived + credentials and reusing the same credentials across multiple stack components. - You can view the generated dataset on the Hugging Face Hub here. This dataset - contains the original document chunks, the generated questions, and the URL reference - for the original document. + If you don''t already have an Azure Service Connector configured in your ZenML + deployment, you can register one using the interactive CLI command. You have the + option to configure an Azure Service Connector that can be used to access a ACR + registry or even more than one type of Azure resource: - Once we have the generated questions, we can then pass them to the retrieval component - and check the results. For convenience we load the data from the Hugging Face - Hub and then pass it to the retrieval component for evaluation. We shuffle the - data and select a subset of it to speed up the evaluation process, but for a more - thorough evaluation you could use the entire dataset. (The best practice of keeping - a separate set of data for evaluation purposes is also recommended here, though - we''re not doing that in this example.) + zenml service-connector register --type azure -i - @step + A non-interactive CLI example that uses Azure Service Principal credentials to + configure an Azure Service Connector targeting a single ACR registry is: - def retrieval_evaluation_full( + zenml service-connector register --type azure --auth-method service-principal + --tenant_id= --client_id= --client_secret= + --resource-type docker-registry --resource-id - sample_size: int = 50, + Example Command Output - ) -> Annotated[float, "full_failure_rate_retrieval"]: + $ zenml service-connector register azure-demo --type azure --auth-method service-principal + --tenant_id=a79f3633-8f45-4a74-a42e-68871c17b7fb --client_id=8926254a-8c3f-430a-a2fd-bdab234d491e + --client_secret=AzureSuperSecret --resource-type docker-registry --resource-id + demozenmlcontainerregistry.azurecr.io - dataset = load_dataset("zenml/rag_qa_embedding_questions", split="train") + ⠸ Registering service connector ''azure-demo''... 
- sampled_dataset = dataset.shuffle(seed=42).select(range(sample_size)) + Successfully registered service connector `azure-demo` with access to the following + resources: - total_tests = len(sampled_dataset) + ┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - failures = 0 + ┃ RESOURCE TYPE │ RESOURCE NAMES ┃ - for item in sampled_dataset: + ┠────────────────────┼───────────────────────────────────────┨ - generated_questions = item["generated_questions"] + ┃ 🐳 docker-registry │ demozenmlcontainerregistry.azurecr.io ┃ - question = generated_questions[ + ┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛' + - 'Default Container Registry - ] # Assuming only one question per item + Storing container images locally. - url_ending = item["filename"].split("/")[ + The Default container registry is a container registry flavor that comes built-in + with ZenML and allows container registry URIs of any format. - 1 + When to use it - ] # Extract the URL ending from the filename + You should use the Default container registry if you want to use a local container + registry or when using a remote container registry that is not covered by other + container registry flavors. - _, _, urls = query_similar_docs(question, url_ending) + Local registry URI format - if all(url_ending not in url for url in urls): + To specify a URI for a local container registry, use the following format: - logging.error( + localhost: - f"Failed for question: {question}. Expected URL ending: {url_ending}. Got: {urls}" + # Examples: - failures += 1 + localhost:5000 - logging.info(f"Total tests: {total_tests}. Failures: {failures}") + localhost:8000 - failure_rate = (failures / total_tests) * 100 + localhost:9999 - return round(failure_rate, 2)' - - '😸Set up a project repository + How to use it - Setting your team up for success with a project repository. + To use the Default container registry, we need: - ZenML code typically lives in a git repository. Setting this repository up correctly - can make a huge impact on collaboration and getting the maximum out of your ZenML - deployment. This section walks users through some of the options available to - create a project repository with ZenML. + Docker installed and running. - PreviousFinetuning LLMs with ZenML + The registry URI. If you''re using a local container registry, check out - NextConnect your git repository + the previous section on the URI format. - Last updated 15 days ago' - - 'GCP Service Connector + We can then register the container registry and use it in our active stack: - Configuring GCP Service Connectors to connect ZenML to GCP resources such as GCS - buckets, GKE Kubernetes clusters, and GCR container registries. + zenml container-registry register \ - The ZenML GCP Service Connector facilitates the authentication and access to managed - GCP services and resources. These encompass a range of resources, including GCS - buckets, GCR container repositories, and GKE clusters. The connector provides - support for various authentication methods, including GCP user accounts, service - accounts, short-lived OAuth 2.0 tokens, and implicit authentication. + --flavor=default \ - To ensure heightened security measures, this connector always issues short-lived - OAuth 2.0 tokens to clients instead of long-lived credentials unless explicitly - configured to do otherwise. Furthermore, it includes automatic configuration and - detection of credentials locally configured through the GCP CLI. 
+ --uri= - This connector serves as a general means of accessing any GCP service by issuing - OAuth 2.0 credential objects to clients. Additionally, the connector can handle - specialized authentication for GCS, Docker, and Kubernetes Python clients. It - also allows for the configuration of local Docker and Kubernetes CLIs. + # Add the container registry to the active stack - $ zenml service-connector list-types --type gcp + zenml stack update -c - ┏━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━┓ + You may also need to set up authentication required to log in to the container + registry. - ┃ NAME │ TYPE │ RESOURCE TYPES │ AUTH METHODS │ - LOCAL │ REMOTE ┃ + Authentication Methods - ┠───────────────────────┼────────┼───────────────────────┼──────────────────┼───────┼────────┨ + If you are using a private container registry, you will need to configure some + form of authentication to login to the registry. If you''re looking for a quick + way to get started locally, you can use the Local Authentication method. However, + the recommended way to authenticate to a remote private container registry is + through a Docker Service Connector. - ┃ GCP Service Connector │ 🔵 gcp │ 🔵 gcp-generic │ implicit │ ✅ │ - ✅ ┃ + If your target private container registry comes from a cloud provider like AWS, + GCP or Azure, you should use the container registry flavor targeted at that cloud + provider. For example, if you''re using AWS, you should use the AWS Container + Registry flavor. These cloud provider flavors also use specialized cloud provider + Service Connectors to authenticate to the container registry.' + - 'egister gcp-demo-multi --type gcp --auto-configureExample Command Output - ┃ │ │ 📦 gcs-bucket │ user-account │ │ ┃ + ```text - ┃ │ │ 🌀 kubernetes-cluster │ service-account │ │ ┃ + Successfully registered service connector `gcp-demo-multi` with access to the + following resources: - ┃ │ │ 🐳 docker-registry │ external-account │ │ ┃ + ┏━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ │ │ │ oauth2-token │ │ ┃' -- source_sentence: How can I load and render reports in a Jupyter notebook using ZenML? - sentences: - - '❗Alerters + ┃ RESOURCE TYPE │ RESOURCE NAMES ┃ + + + ┠───────────────────────┼─────────────────────────────────────────────────┨ + + + ┃ 🔵 gcp-generic │ zenml-core ┃ + + + ┠───────────────────────┼─────────────────────────────────────────────────┨ - Sending automated alerts to chat services. + ┃ 📦 gcs-bucket │ gs://zenml-bucket-sl ┃ - Alerters allow you to send messages to chat services (like Slack, Discord, Mattermost, - etc.) from within your pipelines. This is useful to immediately get notified when - failures happen, for general monitoring/reporting, and also for building human-in-the-loop - ML. + ┃ │ gs://zenml-core.appspot.com ┃ - Alerter Flavors + ┃ │ gs://zenml-core_cloudbuild ┃ - Currently, the SlackAlerter and DiscordAlerter are the available alerter integrations. - However, it is straightforward to extend ZenML and build an alerter for other - chat services. 
+ ┃ │ gs://zenml-datasets ┃ - Alerter Flavor Integration Notes Slack slack slack Interacts with a Slack channel - Discord discord discord Interacts with a Discord channel Custom Implementation - custom Extend the alerter abstraction and provide your own implementation + ┠───────────────────────┼─────────────────────────────────────────────────┨ - If you would like to see the available flavors of alerters in your terminal, you - can use the following command: + ┃ 🌀 kubernetes-cluster │ zenml-test-cluster ┃ - zenml alerter flavor list + ┠───────────────────────┼─────────────────────────────────────────────────┨ - How to use Alerters with ZenML + ┃ 🐳 docker-registry │ gcr.io/zenml-core ┃ - Each alerter integration comes with specific standard steps that you can use out - of the box. + ┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ - However, you first need to register an alerter component in your terminal: + ``` - zenml alerter register ... + **NOTE**: from this point forward, we don''t need the local GCP CLI credentials + or the local GCP CLI at all. The steps that follow can be run on any machine regardless + of whether it has been configured and authorized to access the GCP project. - Then you can add it to your stack using + 4. find out which GCS buckets, GCR registries, and GKE Kubernetes clusters we + can gain access to. We''ll use this information to configure the Stack Components + in our minimal GCP stack: a GCS Artifact Store, a Kubernetes Orchestrator, and + a GCP Container Registry. - zenml stack register ... -al + ```sh - Afterward, you can import the alerter standard steps provided by the respective - integration and directly use them in your pipelines. + zenml service-connector list-resources --resource-type gcs-bucket - PreviousDevelop a Custom Step Operator + ``` - NextDiscord Alerter + Example Command Output + + + ```text + + + The following ''gcs-bucket'' resources can be accessed by service connectors configured + in your workspace:' +- source_sentence: What resources does the `gcp-demo-multi` service connector have + access to after registration? + sentences: + - 'Find out which configuration was used for a run + + + Sometimes you might want to extract the used configuration from a pipeline that + has already run. You can do this simply by loading the pipeline run and accessing + its config attribute. + + + from zenml.client import Client + + + pipeline_run = Client().get_pipeline_run("") + + + configuration = pipeline_run.config + + + PreviousConfiguration hierarchy + + + NextAutogenerate a template yaml file Last updated 15 days ago' - - 'ry_similar_docs( + - 'onfig class and add your configuration parameters.Bring both the implementation + and the configuration together by inheriting from the BaseModelDeployerFlavor + class. Make sure that you give a name to the flavor through its abstract property. + + Create a service class that inherits from the BaseService class and implements + the abstract methods. This class will be used to represent the deployed model + server in ZenML. - question: str, + Once you are done with the implementation, you can register it through the CLI. 
+ Please ensure you point to the flavor class via dot notation: - url_ending: str,use_reranking: bool = False, + zenml model-deployer flavor register - returned_sample_size: int = 5, + For example, if your flavor class MyModelDeployerFlavor is defined in flavors/my_flavor.py, + you''d register it by doing: - ) -> Tuple[str, str, List[str]]: + zenml model-deployer flavor register flavors.my_flavor.MyModelDeployerFlavor - """Query similar documents for a given question and URL ending.""" + ZenML resolves the flavor class by taking the path where you initialized zenml + (via zenml init) as the starting point of resolution. Therefore, please ensure + you follow the best practice of initializing zenml at the root of your repository. - embedded_question = get_embeddings(question) + If ZenML does not find an initialized ZenML repository in any parent directory, + it will default to the current working directory, but usually, it''s better to + not have to rely on this mechanism and initialize zenml at the root. - db_conn = get_db_conn() + Afterward, you should see the new flavor in the list of available flavors: - num_docs = 20 if use_reranking else returned_sample_size + zenml model-deployer flavor list - # get (content, url) tuples for the top n similar documents + It is important to draw attention to when and how these base abstractions are + coming into play in a ZenML workflow. - top_similar_docs = get_topn_similar_docs( + The CustomModelDeployerFlavor class is imported and utilized upon the creation + of the custom flavor through the CLI. - embedded_question, db_conn, n=num_docs, include_metadata=True + The CustomModelDeployerConfig class is imported when someone tries to register/update + a stack component with this custom flavor. Especially, during the registration + process of the stack component, the config will be used to validate the values + given by the user. As Config objects are inherently pydantic objects, you can + also add your own custom validators here.' + - 'egister gcp-demo-multi --type gcp --auto-configureExample Command Output - if use_reranking: + ```text - reranked_docs_and_urls = rerank_documents(question, top_similar_docs)[ + Successfully registered service connector `gcp-demo-multi` with access to the + following resources: - :returned_sample_size + ┏━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - urls = [doc[1] for doc in reranked_docs_and_urls] + + ┃ RESOURCE TYPE │ RESOURCE NAMES ┃ - else: + ┠───────────────────────┼─────────────────────────────────────────────────┨ - urls = [doc[1] for doc in top_similar_docs] # Unpacking URLs + ┃ 🔵 gcp-generic │ zenml-core ┃ - return (question, url_ending, urls) + ┠───────────────────────┼─────────────────────────────────────────────────┨ - We get the embeddings for the question being passed into the function and connect - to our PostgreSQL database. If we''re using reranking, we get the top 20 documents - similar to our query and rerank them using the rerank_documents helper function. - We then extract the URLs from the reranked documents and return them. Note that - we only return 5 URLs, but in the case of reranking we get a larger number of - documents and URLs back from the database to pass to our reranker, but in the - end we always choose the top five reranked documents to return. + ┃ 📦 gcs-bucket │ gs://zenml-bucket-sl ┃ - Now that we''ve added reranking to our pipeline, we can evaluate the performance - of our reranker and see how it affects the quality of the retrieved documents. 
+ ┃ │ gs://zenml-core.appspot.com ┃ - Code Example + ┃ │ gs://zenml-core_cloudbuild ┃ - To explore the full code, visit the Complete Guide repository and for this section, - particularly the eval_retrieval.py file. + ┃ │ gs://zenml-datasets ┃ - PreviousUnderstanding reranking + ┠───────────────────────┼─────────────────────────────────────────────────┨ - NextEvaluating reranking performance + ┃ 🌀 kubernetes-cluster │ zenml-test-cluster ┃ - Last updated 1 month ago' - - 'n the respective artifact in the pipeline run DAG.Alternatively, if you are running - inside a Jupyter notebook, you can load and render the reports using the artifact.visualize() - method, e.g.: + ┠───────────────────────┼─────────────────────────────────────────────────┨ - from zenml.client import Client + ┃ 🐳 docker-registry │ gcr.io/zenml-core ┃ - def visualize_results(pipeline_name: str, step_name: str) -> None: + ┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ - pipeline = Client().get_pipeline(pipeline=pipeline_name) + ``` - evidently_step = pipeline.last_run.steps[step_name] + **NOTE**: from this point forward, we don''t need the local GCP CLI credentials + or the local GCP CLI at all. The steps that follow can be run on any machine regardless + of whether it has been configured and authorized to access the GCP project. - evidently_step.visualize() + 4. find out which GCS buckets, GCR registries, and GKE Kubernetes clusters we + can gain access to. We''ll use this information to configure the Stack Components + in our minimal GCP stack: a GCS Artifact Store, a Kubernetes Orchestrator, and + a GCP Container Registry. - if __name__ == "__main__": + ```sh - visualize_results("text_data_report_pipeline", "text_report") + zenml service-connector list-resources --resource-type gcs-bucket - visualize_results("text_data_test_pipeline", "text_test") + ``` - PreviousDeepchecks + Example Command Output - NextWhylogs + ```text - Last updated 19 days ago' -- source_sentence: How do you deploy the Comet Experiment Tracker flavor provided - by ZenML integration? + The following ''gcs-bucket'' resources can be accessed by service connectors configured + in your workspace:' +- source_sentence: What is the result of executing a Deepchecks test suite in ZenML? sentences: - - 'Comet + - 'urns: - Logging and visualizing experiments with Comet. + Deepchecks test suite execution result - The Comet Experiment Tracker is an Experiment Tracker flavor provided with the - Comet ZenML integration that uses the Comet experiment tracking platform to log - and visualize information from your pipeline steps (e.g., models, parameters, - metrics). + """# validation pre-processing (e.g. dataset preparation) can take place here - When would you want to use it? + data_validator = DeepchecksDataValidator.get_active_data_validator() - Comet is a popular platform that you would normally use in the iterative ML experimentation - phase to track and visualize experiment results. That doesn''t mean that it cannot - be repurposed to track and visualize the results produced by your automated pipeline - runs, as you make the transition towards a more production-oriented workflow. + suite = data_validator.data_validation( - You should use the Comet Experiment Tracker: + dataset=dataset, - if you have already been using Comet to track experiment results for your project - and would like to continue doing so as you are incorporating MLOps workflows and - best practices in your project through ZenML. 
+ check_list=[ - if you are looking for a more visually interactive way of navigating the results - produced from your ZenML pipeline runs (e.g., models, metrics, datasets) + DeepchecksDataIntegrityCheck.TABULAR_OUTLIER_SAMPLE_DETECTION, - if you would like to connect ZenML to Comet to share the artifacts and metrics - logged by your pipelines with your team, organization, or external stakeholders + DeepchecksDataIntegrityCheck.TABULAR_STRING_LENGTH_OUT_OF_BOUNDS, - You should consider one of the other Experiment Tracker flavors if you have never - worked with Comet before and would rather use another experiment tracking tool - that you are more familiar with. + ], - How do you deploy it? + # validation post-processing (e.g. interpret results, take actions) can happen + here - The Comet Experiment Tracker flavor is provided by the Comet ZenML integration. - You need to install it on your local machine to be able to register a Comet Experiment - Tracker and add it to your stack: + return suite - zenml integration install comet -y + The arguments that the Deepchecks Data Validator methods can take in are the same + as those used for the Deepchecks standard steps. - The Comet Experiment Tracker needs to be configured with the credentials required - to connect to the Comet platform using one of the available authentication methods. + Have a look at the complete list of methods and parameters available in the DeepchecksDataValidator + API in the SDK docs. - Authentication Methods + Call Deepchecks directly - You need to configure the following credentials for authentication to the Comet - platform:' - - 'guration set up by the GCP CLI on your local host.The following is an example - of lifting GCP user credentials granting access to the same set of GCP resources - and services that the local GCP CLI is allowed to access. The GCP CLI should already - be configured with valid credentials (i.e. by running gcloud auth application-default - login). 
In this case, the GCP user account authentication method is automatically - detected: + You can use the Deepchecks library directly in your custom pipeline steps, and + only leverage ZenML''s capability of serializing, versioning and storing the SuiteResult + objects in its Artifact Store, e.g.: - zenml service-connector register gcp-auto --type gcp --auto-configure + import pandas as pd - Example Command Output + import deepchecks.tabular.checks as tabular_checks - Successfully registered service connector `gcp-auto` with access to the following - resources: + from deepchecks.core.suite import SuiteResult - ┏━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ + from deepchecks.tabular import Suite - ┃ RESOURCE TYPE │ RESOURCE NAMES ┃ + from deepchecks.tabular import Dataset - ┠───────────────────────┼─────────────────────────────────────────────────┨ + from zenml import step - ┃ 🔵 gcp-generic │ zenml-core ┃ + @step - ┠───────────────────────┼─────────────────────────────────────────────────┨ + def data_integrity_check( - ┃ 📦 gcs-bucket │ gs://zenml-bucket-sl ┃ + dataset: pd.DataFrame, - ┃ │ gs://zenml-core.appspot.com ┃ + ) -> SuiteResult: - ┃ │ gs://zenml-core_cloudbuild ┃ + """Custom data integrity check step with Deepchecks - ┃ │ gs://zenml-datasets ┃ + Args: - ┃ │ gs://zenml-internal-artifact-store ┃ + dataset: a Pandas DataFrame - ┃ │ gs://zenml-kubeflow-artifact-store ┃ + Returns: - ┃ │ gs://zenml-project-time-series-bucket ┃ + Deepchecks test suite execution result - ┠───────────────────────┼─────────────────────────────────────────────────┨ + """ - ┃ 🌀 kubernetes-cluster │ zenml-test-cluster ┃ + # validation pre-processing (e.g. dataset preparation) can take place here - ┠───────────────────────┼─────────────────────────────────────────────────┨ + train_dataset = Dataset( - ┃ 🐳 docker-registry │ gcr.io/zenml-core ┃ + dataset, - ┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + label=''class'', + + cat_features=[''country'', ''state''] - zenml service-connector describe gcp-auto + suite = Suite(name="custom") - Example Command Output' - - 'er Image Builder stack component, or the Vertex AIOrchestrator and Step Operator. - It should be accompanied by a matching set of + check = tabular_checks.OutlierSampleDetection( - GCP permissions that allow access to the set of remote resources required by the + nearest_neighbors_percent=0.01, - client and Stack Component. + extent_parameter=3, - The resource name represents the GCP project that the connector is authorized - to + check.add_condition_outlier_ratio_less_or_equal( - access. + max_outliers_ratio=0.007, - 📦 GCP GCS bucket (resource type: gcs-bucket) + outlier_score_threshold=0.5, - Authentication methods: implicit, user-account, service-account, oauth2-token, + suite.add(check) - impersonation + check = tabular_checks.StringLengthOutOfBounds( - Supports resource instances: True + num_percentiles=1000, - Authentication methods: + min_unique_values=3, - 🔒 implicit + check.add_condition_number_of_outliers_less_or_equal( - 🔒 user-account + max_outliers=3,' + - 'ervice-principal - 🔒 service-account + ``` - 🔒 oauth2-token + Example Command Output - 🔒 impersonation + ```Successfully connected orchestrator `aks-demo-cluster` to the following resources: - Allows Stack Components to connect to GCS buckets. 
When used by Stack + ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - Components, they are provided a pre-configured GCS Python client instance. + ┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE + │ RESOURCE TYPE │ RESOURCE NAMES ┃ - The configured credentials must have at least the following GCP permissions + ┠──────────────────────────────────────┼─────────────────────────┼────────────────┼───────────────────────┼───────────────────────────────────────────────┨ - associated with the GCS buckets that it can access: + ┃ f2316191-d20b-4348-a68b-f5e347862196 │ azure-service-principal │ 🇦 azure │ + 🌀 kubernetes-cluster │ demo-zenml-demos/demo-zenml-terraform-cluster ┃ - storage.buckets.list + ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ - storage.buckets.get + ``` - storage.objects.create + Register and connect an Azure Container Registry Stack Component to an ACR container + registry:Copyzenml container-registry register acr-demo-registry --flavor azure + --uri=demozenmlcontainerregistry.azurecr.io - storage.objects.delete + Example Command Output - storage.objects.get + ``` - storage.objects.list + Successfully registered container_registry `acr-demo-registry`. - storage.objects.update + ``` - For example, the GCP Storage Admin role includes all of the required + ```sh - permissions, but it also includes additional permissions that are not required + zenml container-registry connect acr-demo-registry --connector azure-service-principal - by the connector. + ``` - If set, the resource name must identify a GCS bucket using one of the following + Example Command Output - formats: + ``` - GCS bucket URI: gs://{bucket-name} + Successfully connected container registry `acr-demo-registry` to the following + resources: - GCS bucket name: {bucket-name} + ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - [...] + ┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE + │ RESOURCE TYPE │ RESOURCE NAMES ┃ - ──────────────────────────────────────────────────────────────────────────────── + ┠──────────────────────────────────────┼─────────────────────────┼────────────────┼────────────────────┼───────────────────────────────────────┨ - Please select a resource type or leave it empty to create a connector that can - be used to access any of the supported resource types (gcp-generic, gcs-bucket, - kubernetes-cluster, docker-registry). []: gcs-bucket + ┃ f2316191-d20b-4348-a68b-f5e347862196 │ azure-service-principal │ 🇦 azure │ + 🐳 docker-registry │ demozenmlcontainerregistry.azurecr.io ┃' + - 'r │ zenhacks-cluster ┃┠───────────────────────┼──────────────────────────────────────────────┨ - Would you like to attempt auto-configuration to extract the authentication configuration - from your local environment ? [y/N]: y + ┃ 🐳 docker-registry │ 715803424590.dkr.ecr.us-east-1.amazonaws.com ┃ - Service connector auto-configured successfully with the following configuration: + ┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ - Service connector ''gcp-interactive'' of type ''gcp'' is ''private''. 
+ The Service Connector configuration shows long-lived credentials were lifted from + the local environment and the AWS Session Token authentication method was configured: + + + zenml service-connector describe aws-session-token + + + Example Command Output - ''gcp-interactive'' gcp Service + Service connector ''aws-session-token'' of type ''aws'' with id ''3ae3e595-5cbc-446e-be64-e54e854e0e3f'' + is owned by user ''default'' and is ''private''. - Connector Details + ''aws-session-token'' aws Service Connector Details - ┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━┓' + + ┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ + + + ┃ PROPERTY │ VALUE ┃ + + + ┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ + + + ┃ ID │ c0f8e857-47f9-418b-a60f-c3b03023da54 ┃ + + + ┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ + + + ┃ NAME │ aws-session-token ┃ + + + ┠──────────────────┼────────────────────────────────────────────────────────���────────────────┨ + + + ┃ TYPE │ 🔶 aws ┃ + + + ┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ + + + ┃ AUTH METHOD │ session-token ┃ + + + ┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ + + + ┃ RESOURCE TYPES │ 🔶 aws-generic, 📦 s3-bucket, 🌀 kubernetes-cluster, 🐳 docker-registry + ┃ + + + ┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨' model-index: - name: zenml/finetuned-snowflake-arctic-embed-m results: @@ -944,49 +1055,49 @@ model-index: type: dim_384 metrics: - type: cosine_accuracy@1 - value: 0.28313253012048195 + value: 0.3614457831325301 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.572289156626506 + value: 0.6987951807228916 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6807228915662651 + value: 0.7530120481927711 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.8012048192771084 + value: 0.8554216867469879 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.28313253012048195 + value: 0.3614457831325301 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.19076305220883527 + value: 0.23293172690763048 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.13614457831325297 + value: 0.15060240963855417 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.08012048192771083 + value: 0.08554216867469877 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.28313253012048195 + value: 0.3614457831325301 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.572289156626506 + value: 0.6987951807228916 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6807228915662651 + value: 0.7530120481927711 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.8012048192771084 + value: 0.8554216867469879 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5407472416922913 + value: 0.6194049451779184 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.45774765729585015 + value: 0.5427878179384205 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.46523155503040436 + value: 0.5472907234693755 name: Cosine Map@100 - task: type: information-retrieval @@ -996,49 +1107,49 @@ model-index: type: dim_256 metrics: - type: cosine_accuracy@1 - value: 0.29518072289156627 + value: 0.3433734939759036 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.6024096385542169 + value: 0.6807228915662651 name: Cosine 
Accuracy@3 - type: cosine_accuracy@5 - value: 0.6807228915662651 + value: 0.7650602409638554 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7951807228915663 + value: 0.8373493975903614 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.29518072289156627 + value: 0.3433734939759036 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.2008032128514056 + value: 0.2269076305220883 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.13614457831325297 + value: 0.15301204819277103 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.0795180722891566 + value: 0.08373493975903612 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.29518072289156627 + value: 0.3433734939759036 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.6024096385542169 + value: 0.6807228915662651 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6807228915662651 + value: 0.7650602409638554 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7951807228915663 + value: 0.8373493975903614 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5458001676537428 + value: 0.602546157610675 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.46605230445591894 + value: 0.525891661885638 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4728738562350596 + value: 0.5310273317942533 name: Cosine Map@100 - task: type: information-retrieval @@ -1048,49 +1159,49 @@ model-index: type: dim_128 metrics: - type: cosine_accuracy@1 - value: 0.2469879518072289 + value: 0.3132530120481928 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.5843373493975904 + value: 0.6265060240963856 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6265060240963856 + value: 0.7168674698795181 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7409638554216867 + value: 0.7891566265060241 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.2469879518072289 + value: 0.3132530120481928 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.19477911646586343 + value: 0.20883534136546178 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.12530120481927706 + value: 0.1433734939759036 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.07409638554216866 + value: 0.0789156626506024 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.2469879518072289 + value: 0.3132530120481928 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.5843373493975904 + value: 0.6265060240963856 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6265060240963856 + value: 0.7168674698795181 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7409638554216867 + value: 0.7891566265060241 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.4994853551416632 + value: 0.5630057581169484 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.421791929623255 + value: 0.4893144004589788 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4323899020969096 + value: 0.4960510164414996 name: Cosine Map@100 - task: type: information-retrieval @@ -1100,49 +1211,49 @@ model-index: type: dim_64 metrics: - type: cosine_accuracy@1 - value: 0.23493975903614459 + value: 0.25903614457831325 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.5 + value: 0.5120481927710844 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.5783132530120482 + value: 0.6325301204819277 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.6927710843373494 + value: 0.7168674698795181 name: Cosine Accuracy@10 - type: 
cosine_precision@1 - value: 0.23493975903614459 + value: 0.25903614457831325 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.16666666666666666 + value: 0.17068273092369476 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.11566265060240961 + value: 0.12650602409638553 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.06927710843373491 + value: 0.07168674698795179 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.23493975903614459 + value: 0.25903614457831325 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.5 + value: 0.5120481927710844 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.5783132530120482 + value: 0.6325301204819277 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.6927710843373494 + value: 0.7168674698795181 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.4607453075643617 + value: 0.48618223058871674 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.38742589405240024 + value: 0.41233027347485207 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.3969546791348258 + value: 0.42094598177412385 name: Cosine Map@100 --- @@ -1196,9 +1307,9 @@ from sentence_transformers import SentenceTransformer model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m") # Run inference sentences = [ - 'How do you deploy the Comet Experiment Tracker flavor provided by ZenML integration?', - "Comet\n\nLogging and visualizing experiments with Comet.\n\nThe Comet Experiment Tracker is an Experiment Tracker flavor provided with the Comet ZenML integration that uses the Comet experiment tracking platform to log and visualize information from your pipeline steps (e.g., models, parameters, metrics).\n\nWhen would you want to use it?\n\nComet is a popular platform that you would normally use in the iterative ML experimentation phase to track and visualize experiment results. That doesn't mean that it cannot be repurposed to track and visualize the results produced by your automated pipeline runs, as you make the transition towards a more production-oriented workflow.\n\nYou should use the Comet Experiment Tracker:\n\nif you have already been using Comet to track experiment results for your project and would like to continue doing so as you are incorporating MLOps workflows and best practices in your project through ZenML.\n\nif you are looking for a more visually interactive way of navigating the results produced from your ZenML pipeline runs (e.g., models, metrics, datasets)\n\nif you would like to connect ZenML to Comet to share the artifacts and metrics logged by your pipelines with your team, organization, or external stakeholders\n\nYou should consider one of the other Experiment Tracker flavors if you have never worked with Comet before and would rather use another experiment tracking tool that you are more familiar with.\n\nHow do you deploy it?\n\nThe Comet Experiment Tracker flavor is provided by the Comet ZenML integration. You need to install it on your local machine to be able to register a Comet Experiment Tracker and add it to your stack:\n\nzenml integration install comet -y\n\nThe Comet Experiment Tracker needs to be configured with the credentials required to connect to the Comet platform using one of the available authentication methods.\n\nAuthentication Methods\n\nYou need to configure the following credentials for authentication to the Comet platform:", - "er Image Builder stack component, or the Vertex AIOrchestrator and Step Operator. 
-    "er Image Builder stack component, or the Vertex AIOrchestrator and Step Operator. It should be accompanied by a matching set of\n\nGCP permissions that allow access to the set of remote resources required by the\n\nclient and Stack Component.\n\nThe resource name represents the GCP project that the connector is authorized to\n\naccess.\n\n📦 GCP GCS bucket (resource type: gcs-bucket)\n\nAuthentication methods: implicit, user-account, service-account, oauth2-token,\n\nimpersonation\n\nSupports resource instances: True\n\nAuthentication methods:\n\n🔒 implicit\n\n🔒 user-account\n\n🔒 service-account\n\n🔒 oauth2-token\n\n🔒 impersonation\n\nAllows Stack Components to connect to GCS buckets. When used by Stack\n\nComponents, they are provided a pre-configured GCS Python client instance.\n\nThe configured credentials must have at least the following GCP permissions\n\nassociated with the GCS buckets that it can access:\n\nstorage.buckets.list\n\nstorage.buckets.get\n\nstorage.objects.create\n\nstorage.objects.delete\n\nstorage.objects.get\n\nstorage.objects.list\n\nstorage.objects.update\n\nFor example, the GCP Storage Admin role includes all of the required\n\npermissions, but it also includes additional permissions that are not required\n\nby the connector.\n\nIf set, the resource name must identify a GCS bucket using one of the following\n\nformats:\n\nGCS bucket URI: gs://{bucket-name}\n\nGCS bucket name: {bucket-name}\n\n[...]\n\n────────────────────────────────────────────────────────────────────────────────\n\nPlease select a resource type or leave it empty to create a connector that can be used to access any of the supported resource types (gcp-generic, gcs-bucket, kubernetes-cluster, docker-registry). []: gcs-bucket\n\nWould you like to attempt auto-configuration to extract the authentication configuration from your local environment ? [y/N]: y\n\nService connector auto-configured successfully with the following configuration:\n\nService connector 'gcp-interactive' of type 'gcp' is 'private'.\n\n'gcp-interactive' gcp Service\n\nConnector Details\n\n┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━┓",
+    'What is the result of executing a Deepchecks test suite in ZenML?',
+    'urns:\n\nDeepchecks test suite execution result\n\n"""# validation pre-processing (e.g. dataset preparation) can take place here\n\ndata_validator = DeepchecksDataValidator.get_active_data_validator()\n\nsuite = data_validator.data_validation(\n\ndataset=dataset,\n\ncheck_list=[\n\nDeepchecksDataIntegrityCheck.TABULAR_OUTLIER_SAMPLE_DETECTION,\n\nDeepchecksDataIntegrityCheck.TABULAR_STRING_LENGTH_OUT_OF_BOUNDS,\n\n],\n\n# validation post-processing (e.g. interpret results, take actions) can happen here\n\nreturn suite\n\nThe arguments that the Deepchecks Data Validator methods can take in are the same as those used for the Deepchecks standard steps.\n\nHave a look at the complete list of methods and parameters available in the DeepchecksDataValidator API in the SDK docs.\n\nCall Deepchecks directly\n\nYou can use the Deepchecks library directly in your custom pipeline steps, and only leverage ZenML\'s capability of serializing, versioning and storing the SuiteResult objects in its Artifact Store, e.g.:\n\nimport pandas as pd\n\nimport deepchecks.tabular.checks as tabular_checks\n\nfrom deepchecks.core.suite import SuiteResult\n\nfrom deepchecks.tabular import Suite\n\nfrom deepchecks.tabular import Dataset\n\nfrom zenml import step\n\n@step\n\ndef data_integrity_check(\n\ndataset: pd.DataFrame,\n\n) -> SuiteResult:\n\n"""Custom data integrity check step with Deepchecks\n\nArgs:\n\ndataset: a Pandas DataFrame\n\nReturns:\n\nDeepchecks test suite execution result\n\n"""\n\n# validation pre-processing (e.g. dataset preparation) can take place here\n\ntrain_dataset = Dataset(\n\ndataset,\n\nlabel=\'class\',\n\ncat_features=[\'country\', \'state\']\n\nsuite = Suite(name="custom")\n\ncheck = tabular_checks.OutlierSampleDetection(\n\nnearest_neighbors_percent=0.01,\n\nextent_parameter=3,\n\ncheck.add_condition_outlier_ratio_less_or_equal(\n\nmax_outliers_ratio=0.007,\n\noutlier_score_threshold=0.5,\n\nsuite.add(check)\n\ncheck = tabular_checks.StringLengthOutOfBounds(\n\nnum_percentiles=1000,\n\nmin_unique_values=3,\n\ncheck.add_condition_number_of_outliers_less_or_equal(\n\nmax_outliers=3,',
+    "r │ zenhacks-cluster ┃┠───────────────────────┼──────────────────────────────────────────────┨\n\n┃ 🐳 docker-registry │ 715803424590.dkr.ecr.us-east-1.amazonaws.com ┃\n\n┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛\n\nThe Service Connector configuration shows long-lived credentials were lifted from the local environment and the AWS Session Token authentication method was configured:\n\nzenml service-connector describe aws-session-token\n\nExample Command Output\n\nService connector 'aws-session-token' of type 'aws' with id '3ae3e595-5cbc-446e-be64-e54e854e0e3f' is owned by user 'default' and is 'private'.\n\n'aws-session-token' aws Service Connector Details\n\n┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n\n┃ PROPERTY │ VALUE ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ ID │ c0f8e857-47f9-418b-a60f-c3b03023da54 ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ NAME │ aws-session-token ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ TYPE │ 🔶 aws ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ AUTH METHOD │ session-token ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ RESOURCE TYPES │ 🔶 aws-generic, 📦 s3-bucket, 🌀 kubernetes-cluster, 🐳 docker-registry ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨",
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
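Editor's aside, not part of the diff above: the updated snippet stops at `print(embeddings.shape)`. Treating the first sentence as the query and the other two as candidate document chunks, the embeddings would typically be scored with cosine similarity. A minimal sketch, using the long-standing `sentence_transformers.util.cos_sim` helper; the shortened chunk texts are placeholders:

```python
# Sketch only -- not taken from the diffed model card.
# sentences[0] is the query; sentences[1:] are candidate document chunks.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m")

sentences = [
    "What is the result of executing a Deepchecks test suite in ZenML?",  # query
    "Deepchecks test suite execution result ...",   # placeholder for the first doc chunk
    "zenml service-connector describe aws-session-token ...",  # placeholder for the second
]
embeddings = model.encode(sentences)

# Cosine similarity between the query embedding and each document embedding;
# the more relevant chunk should receive the higher score.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # tensor of shape (1, 2)
```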
@@ -1244,43 +1355,43 @@ You can finetune this model on your own dataset.
 
 | Metric              | Value      |
 |:--------------------|:-----------|
-| cosine_accuracy@1   | 0.2831     |
-| cosine_accuracy@3   | 0.5723     |
-| cosine_accuracy@5   | 0.6807     |
-| cosine_accuracy@10  | 0.8012     |
-| cosine_precision@1  | 0.2831     |
-| cosine_precision@3  | 0.1908     |
-| cosine_precision@5  | 0.1361     |
-| cosine_precision@10 | 0.0801     |
-| cosine_recall@1     | 0.2831     |
-| cosine_recall@3     | 0.5723     |
-| cosine_recall@5     | 0.6807     |
-| cosine_recall@10    | 0.8012     |
-| cosine_ndcg@10      | 0.5407     |
-| cosine_mrr@10       | 0.4577     |
-| **cosine_map@100**  | **0.4652** |
+| cosine_accuracy@1   | 0.3614     |
+| cosine_accuracy@3   | 0.6988     |
+| cosine_accuracy@5   | 0.753      |
+| cosine_accuracy@10  | 0.8554     |
+| cosine_precision@1  | 0.3614     |
+| cosine_precision@3  | 0.2329     |
+| cosine_precision@5  | 0.1506     |
+| cosine_precision@10 | 0.0855     |
+| cosine_recall@1     | 0.3614     |
+| cosine_recall@3     | 0.6988     |
+| cosine_recall@5     | 0.753      |
+| cosine_recall@10    | 0.8554     |
+| cosine_ndcg@10      | 0.6194     |
+| cosine_mrr@10       | 0.5428     |
+| **cosine_map@100**  | **0.5473** |
 
 #### Information Retrieval
 
 * Dataset: `dim_256`
 * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
 
-| Metric              | Value      |
-|:--------------------|:-----------|
-| cosine_accuracy@1   | 0.2952     |
-| cosine_accuracy@3   | 0.6024     |
-| cosine_accuracy@5   | 0.6807     |
-| cosine_accuracy@10  | 0.7952     |
-| cosine_precision@1  | 0.2952     |
-| cosine_precision@3  | 0.2008     |
-| cosine_precision@5  | 0.1361     |
-| cosine_precision@10 | 0.0795     |
-| cosine_recall@1     | 0.2952     |
-| cosine_recall@3     | 0.6024     |
-| cosine_recall@5     | 0.6807     |
-| cosine_recall@10    | 0.7952     |
-| cosine_ndcg@10      | 0.5458     |
-| cosine_mrr@10       | 0.4661     |
-| **cosine_map@100**  | **0.4729** |
+| Metric              | Value     |
+|:--------------------|:----------|
+| cosine_accuracy@1   | 0.3434    |
+| cosine_accuracy@3   | 0.6807    |
+| cosine_accuracy@5   | 0.7651    |
+| cosine_accuracy@10  | 0.8373    |
+| cosine_precision@1  | 0.3434    |
+| cosine_precision@3  | 0.2269    |
+| cosine_precision@5  | 0.153     |
+| cosine_precision@10 | 0.0837    |
+| cosine_recall@1     | 0.3434    |
+| cosine_recall@3     | 0.6807    |
+| cosine_recall@5     | 0.7651    |
+| cosine_recall@10    | 0.8373    |
+| cosine_ndcg@10      | 0.6025    |
+| cosine_mrr@10       | 0.5259    |
+| **cosine_map@100**  | **0.531** |
 
 #### Information Retrieval
 
 * Dataset: `dim_128`
@@ -1288,43 +1399,43 @@ You can finetune this model on your own dataset.
 | Metric              | Value      |
 |:--------------------|:-----------|
-| cosine_accuracy@1   | 0.247      |
-| cosine_accuracy@3   | 0.5843     |
-| cosine_accuracy@5   | 0.6265     |
-| cosine_accuracy@10  | 0.741      |
-| cosine_precision@1  | 0.247      |
-| cosine_precision@3  | 0.1948     |
-| cosine_precision@5  | 0.1253     |
-| cosine_precision@10 | 0.0741     |
-| cosine_recall@1     | 0.247      |
-| cosine_recall@3     | 0.5843     |
-| cosine_recall@5     | 0.6265     |
-| cosine_recall@10    | 0.741      |
-| cosine_ndcg@10      | 0.4995     |
-| cosine_mrr@10       | 0.4218     |
-| **cosine_map@100**  | **0.4324** |
+| cosine_accuracy@1   | 0.3133     |
+| cosine_accuracy@3   | 0.6265     |
+| cosine_accuracy@5   | 0.7169     |
+| cosine_accuracy@10  | 0.7892     |
+| cosine_precision@1  | 0.3133     |
+| cosine_precision@3  | 0.2088     |
+| cosine_precision@5  | 0.1434     |
+| cosine_precision@10 | 0.0789     |
+| cosine_recall@1     | 0.3133     |
+| cosine_recall@3     | 0.6265     |
+| cosine_recall@5     | 0.7169     |
+| cosine_recall@10    | 0.7892     |
+| cosine_ndcg@10      | 0.563      |
+| cosine_mrr@10       | 0.4893     |
+| **cosine_map@100**  | **0.4961** |
 
 #### Information Retrieval
 
 * Dataset: `dim_64`
 * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
 
-| Metric              | Value     |
-|:--------------------|:----------|
-| cosine_accuracy@1   | 0.2349    |
-| cosine_accuracy@3   | 0.5       |
-| cosine_accuracy@5   | 0.5783    |
-| cosine_accuracy@10  | 0.6928    |
-| cosine_precision@1  | 0.2349    |
-| cosine_precision@3  | 0.1667    |
-| cosine_precision@5  | 0.1157    |
-| cosine_precision@10 | 0.0693    |
-| cosine_recall@1     | 0.2349    |
-| cosine_recall@3     | 0.5       |
-| cosine_recall@5     | 0.5783    |
-| cosine_recall@10    | 0.6928    |
-| cosine_ndcg@10      | 0.4607    |
-| cosine_mrr@10       | 0.3874    |
-| **cosine_map@100**  | **0.397** |
+| Metric              | Value      |
+|:--------------------|:-----------|
+| cosine_accuracy@1   | 0.259      |
+| cosine_accuracy@3   | 0.512      |
+| cosine_accuracy@5   | 0.6325     |
+| cosine_accuracy@10  | 0.7169     |
+| cosine_precision@1  | 0.259      |
+| cosine_precision@3  | 0.1707     |
+| cosine_precision@5  | 0.1265     |
+| cosine_precision@10 | 0.0717     |
+| cosine_recall@1     | 0.259      |
+| cosine_recall@3     | 0.512      |
+| cosine_recall@5     | 0.6325     |
+| cosine_recall@10    | 0.7169     |
+| cosine_ndcg@10      | 0.4862     |
+| cosine_mrr@10       | 0.4123     |
+| **cosine_map@100**  | **0.4209** |
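Editor's aside, not part of the diff above: the `dim_256`/`dim_128`/`dim_64` tables report the same model evaluated at several Matryoshka truncation dimensions. A minimal, hypothetical sketch of such an evaluation loop, assuming sentence-transformers v3+ (where `InformationRetrievalEvaluator` accepts `truncate_dim`) and placeholder query/corpus data; this is not this model card's actual evaluation code:

```python
# Hypothetical sketch of a per-dimension Matryoshka evaluation.
# queries/corpus/relevant_docs below are placeholders, not the real eval set.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import (
    InformationRetrievalEvaluator,
    SequentialEvaluator,
)

model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m")

queries = {"q1": "What is the result of executing a Deepchecks test suite in ZenML?"}
corpus = {"d1": "Deepchecks test suite execution result ..."}  # placeholder chunk
relevant_docs = {"q1": {"d1"}}  # q1 should retrieve d1

# One evaluator per Matryoshka dimension: embeddings are truncated to
# `truncate_dim` before retrieval, which yields per-dimension metric tables
# like the ones above.
evaluators = [
    InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        name=f"dim_{dim}",
        truncate_dim=dim,
    )
    for dim in (256, 128, 64)
]
results = SequentialEvaluator(evaluators)(model)
print(results)  # e.g. {"dim_256_cosine_map@100": ..., ...}
```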