Interface InputDataConfigOrBuilder

  • All Superinterfaces:
    com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder
  • All Known Implementing Classes:
    InputDataConfig, InputDataConfig.Builder

    public interface InputDataConfigOrBuilder
    extends com.google.protobuf.MessageOrBuilder
    • Method Detail

      • hasFractionSplit

        boolean hasFractionSplit()
         Split based on fractions defining the size of each set.
         
        .google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;
        Returns:
        Whether the fractionSplit field is set.
      • getFractionSplit

        FractionSplit getFractionSplit()
         Split based on fractions defining the size of each set.
         
        .google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;
        Returns:
        The fractionSplit.
      • getFractionSplitOrBuilder

        FractionSplitOrBuilder getFractionSplitOrBuilder()
         Split based on fractions defining the size of each set.
         
        .google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;
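A minimal sketch of setting and reading the `fraction_split` field, assuming the google-cloud-aiplatform Java client is on the classpath; the dataset ID is a placeholder:

```java
import com.google.cloud.aiplatform.v1.FractionSplit;
import com.google.cloud.aiplatform.v1.InputDataConfig;

public class FractionSplitExample {
    // Builds an InputDataConfig with an 80/10/10 fraction split and reads it
    // back through the accessors declared on InputDataConfigOrBuilder.
    public static InputDataConfig build() {
        FractionSplit split = FractionSplit.newBuilder()
            .setTrainingFraction(0.8)
            .setValidationFraction(0.1)
            .setTestFraction(0.1)
            .build();
        return InputDataConfig.newBuilder()
            .setDatasetId("1234567890") // hypothetical Dataset ID
            .setFractionSplit(split)
            .build();
    }

    public static void main(String[] args) {
        InputDataConfig config = build();
        // hasFractionSplit() reports whether the fraction_split field is set.
        System.out.println(config.hasFractionSplit());                       // true
        System.out.println(config.getFractionSplit().getTrainingFraction()); // 0.8
    }
}
```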
      • hasFilterSplit

        boolean hasFilterSplit()
         Split based on the provided filters for each set.
         
        .google.cloud.aiplatform.v1.FilterSplit filter_split = 3;
        Returns:
        Whether the filterSplit field is set.
      • getFilterSplit

        FilterSplit getFilterSplit()
         Split based on the provided filters for each set.
         
        .google.cloud.aiplatform.v1.FilterSplit filter_split = 3;
        Returns:
        The filterSplit.
      • getFilterSplitOrBuilder

        FilterSplitOrBuilder getFilterSplitOrBuilder()
         Split based on the provided filters for each set.
         
        .google.cloud.aiplatform.v1.FilterSplit filter_split = 3;
      • hasPredefinedSplit

        boolean hasPredefinedSplit()
         Supported only for tabular Datasets.
        
         Split based on a predefined key.
         
        .google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;
        Returns:
        Whether the predefinedSplit field is set.
      • getPredefinedSplit

        PredefinedSplit getPredefinedSplit()
         Supported only for tabular Datasets.
        
         Split based on a predefined key.
         
        .google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;
        Returns:
        The predefinedSplit.
      • getPredefinedSplitOrBuilder

        PredefinedSplitOrBuilder getPredefinedSplitOrBuilder()
         Supported only for tabular Datasets.
        
         Split based on a predefined key.
         
        .google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;
      • hasTimestampSplit

        boolean hasTimestampSplit()
         Supported only for tabular Datasets.
        
         Split based on the timestamp of the input data pieces.
         
        .google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;
        Returns:
        Whether the timestampSplit field is set.
      • getTimestampSplit

        TimestampSplit getTimestampSplit()
         Supported only for tabular Datasets.
        
         Split based on the timestamp of the input data pieces.
         
        .google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;
        Returns:
        The timestampSplit.
      • getTimestampSplitOrBuilder

        TimestampSplitOrBuilder getTimestampSplitOrBuilder()
         Supported only for tabular Datasets.
        
         Split based on the timestamp of the input data pieces.
         
        .google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;
      • hasStratifiedSplit

        boolean hasStratifiedSplit()
         Supported only for tabular Datasets.
        
         Split based on the distribution of the specified column.
         
        .google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;
        Returns:
        Whether the stratifiedSplit field is set.
      • getStratifiedSplit

        StratifiedSplit getStratifiedSplit()
         Supported only for tabular Datasets.
        
         Split based on the distribution of the specified column.
         
        .google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;
        Returns:
        The stratifiedSplit.
      • getStratifiedSplitOrBuilder

        StratifiedSplitOrBuilder getStratifiedSplitOrBuilder()
         Supported only for tabular Datasets.
        
         Split based on the distribution of the specified column.
         
        .google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;
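In the underlying proto these split fields belong to a single oneof, so at most one of the `has*` accessors returns true. A sketch of dispatching on whichever split is set (assumes the google-cloud-aiplatform client; the `event_time` column name is a placeholder):

```java
import com.google.cloud.aiplatform.v1.InputDataConfig;
import com.google.cloud.aiplatform.v1.InputDataConfigOrBuilder;
import com.google.cloud.aiplatform.v1.TimestampSplit;

public class SplitDispatchExample {
    // Returns a short label for whichever split field is set on the config.
    public static String describeSplit(InputDataConfigOrBuilder config) {
        if (config.hasFractionSplit())   return "fraction";
        if (config.hasFilterSplit())     return "filter";
        if (config.hasPredefinedSplit()) return "predefined";
        if (config.hasTimestampSplit())  return "timestamp";
        if (config.hasStratifiedSplit()) return "stratified";
        return "default"; // no explicit split configured
    }

    public static void main(String[] args) {
        InputDataConfig config = InputDataConfig.newBuilder()
            .setTimestampSplit(TimestampSplit.newBuilder()
                .setTrainingFraction(0.7)
                .setValidationFraction(0.15)
                .setTestFraction(0.15)
                .setKey("event_time") // hypothetical timestamp column
                .build())
            .build();
        System.out.println(describeSplit(config)); // timestamp
    }
}
```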
      • hasGcsDestination

        boolean hasGcsDestination()
          The Cloud Storage location where the training data is to be
          written. In the given directory a new directory is created with
         name:
         `dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>`
         where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
         All training input data is written into that directory.
        
          The Vertex AI environment variables that point to Cloud Storage
          data URIs use the Cloud Storage wildcard
          format to support sharded data, e.g. "gs://.../training-*.jsonl"
        
         * AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
         * AIP_TRAINING_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
        
         * AIP_VALIDATION_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
        
         * AIP_TEST_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
         
        .google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;
        Returns:
        Whether the gcsDestination field is set.
      • getGcsDestination

        GcsDestination getGcsDestination()
          The Cloud Storage location where the training data is to be
          written. In the given directory a new directory is created with
         name:
         `dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>`
         where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
         All training input data is written into that directory.
        
          The Vertex AI environment variables that point to Cloud Storage
          data URIs use the Cloud Storage wildcard
          format to support sharded data, e.g. "gs://.../training-*.jsonl"
        
         * AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
         * AIP_TRAINING_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
        
         * AIP_VALIDATION_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
        
         * AIP_TEST_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
         
        .google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;
        Returns:
        The gcsDestination.
      • getGcsDestinationOrBuilder

        GcsDestinationOrBuilder getGcsDestinationOrBuilder()
          The Cloud Storage location where the training data is to be
          written. In the given directory a new directory is created with
         name:
         `dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>`
         where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
         All training input data is written into that directory.
        
          The Vertex AI environment variables that point to Cloud Storage
          data URIs use the Cloud Storage wildcard
          format to support sharded data, e.g. "gs://.../training-*.jsonl"
        
         * AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
         * AIP_TRAINING_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
        
         * AIP_VALIDATION_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
        
         * AIP_TEST_DATA_URI =
         "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
         
        .google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;
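A sketch of pointing the training-data export at a Cloud Storage prefix (assumes the google-cloud-aiplatform client; the bucket name and dataset ID are placeholders):

```java
import com.google.cloud.aiplatform.v1.GcsDestination;
import com.google.cloud.aiplatform.v1.InputDataConfig;

public class GcsDestinationExample {
    // Vertex AI creates the dataset-<id>-<annotation-type>-<timestamp>
    // directory under this prefix and exposes the exported shards via the
    // AIP_*_DATA_URI environment variables described above.
    public static InputDataConfig build() {
        return InputDataConfig.newBuilder()
            .setDatasetId("1234567890") // hypothetical Dataset ID
            .setGcsDestination(GcsDestination.newBuilder()
                .setOutputUriPrefix("gs://my-bucket/training-exports/") // hypothetical bucket
                .build())
            .build();
    }
}
```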
      • hasBigqueryDestination

        boolean hasBigqueryDestination()
          Only applicable to custom training with a tabular Dataset that has a
          BigQuery source.
        
          The BigQuery project location where the training data is to be written.
          In the given project a new dataset is created with name
         `dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>`
         where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
         input data is written into that dataset. In the dataset three
         tables are created, `training`, `validation` and `test`.
        
         * AIP_DATA_FORMAT = "bigquery".
         * AIP_TRAINING_DATA_URI  =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
        
         * AIP_VALIDATION_DATA_URI =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
        
         * AIP_TEST_DATA_URI =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
         
        .google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;
        Returns:
        Whether the bigqueryDestination field is set.
      • getBigqueryDestination

        BigQueryDestination getBigqueryDestination()
          Only applicable to custom training with a tabular Dataset that has a
          BigQuery source.
        
          The BigQuery project location where the training data is to be written.
          In the given project a new dataset is created with name
         `dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>`
         where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
         input data is written into that dataset. In the dataset three
         tables are created, `training`, `validation` and `test`.
        
         * AIP_DATA_FORMAT = "bigquery".
         * AIP_TRAINING_DATA_URI  =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
        
         * AIP_VALIDATION_DATA_URI =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
        
         * AIP_TEST_DATA_URI =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
         
        .google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;
        Returns:
        The bigqueryDestination.
      • getBigqueryDestinationOrBuilder

        BigQueryDestinationOrBuilder getBigqueryDestinationOrBuilder()
          Only applicable to custom training with a tabular Dataset that has a
          BigQuery source.
        
          The BigQuery project location where the training data is to be written.
          In the given project a new dataset is created with name
         `dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>`
         where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
         input data is written into that dataset. In the dataset three
         tables are created, `training`, `validation` and `test`.
        
         * AIP_DATA_FORMAT = "bigquery".
         * AIP_TRAINING_DATA_URI  =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
        
         * AIP_VALIDATION_DATA_URI =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
        
         * AIP_TEST_DATA_URI =
         "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
         
        .google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;
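A sketch of directing the export to a BigQuery project instead (assumes the google-cloud-aiplatform client; the project URI and dataset ID are placeholders):

```java
import com.google.cloud.aiplatform.v1.BigQueryDestination;
import com.google.cloud.aiplatform.v1.InputDataConfig;

public class BigQueryDestinationExample {
    // Vertex AI creates the dataset_<id>_<annotation-type>_<timestamp>
    // dataset in this project, containing `training`, `validation`,
    // and `test` tables.
    public static InputDataConfig build() {
        return InputDataConfig.newBuilder()
            .setDatasetId("1234567890") // hypothetical Dataset ID
            .setBigqueryDestination(BigQueryDestination.newBuilder()
                .setOutputUri("bq://my-project") // hypothetical project URI
                .build())
            .build();
    }
}
```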
      • getDatasetId

        String getDatasetId()
          Required. The ID of the Dataset, in the same Project and Location, whose
          data will be used to train the Model. The Dataset must use a schema
          compatible with the Model being trained; what is compatible should be
          described in the used TrainingPipeline's [training_task_definition]
          [google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition].
          For tabular Datasets, all of their data is exported to training, to pick
          and choose from.
         
        string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
        Returns:
        The datasetId.
      • getDatasetIdBytes

        com.google.protobuf.ByteString getDatasetIdBytes()
          Required. The ID of the Dataset, in the same Project and Location, whose
          data will be used to train the Model. The Dataset must use a schema
          compatible with the Model being trained; what is compatible should be
          described in the used TrainingPipeline's [training_task_definition]
          [google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition].
          For tabular Datasets, all of their data is exported to training, to pick
          and choose from.
         
        string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
        Returns:
        The bytes for datasetId.
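For a protobuf string field, the `Bytes` accessor returns the same value as a UTF-8 `ByteString`. A small sketch (assumes the google-cloud-aiplatform client; the ID is a placeholder):

```java
import com.google.cloud.aiplatform.v1.InputDataConfig;
import com.google.protobuf.ByteString;

public class DatasetIdExample {
    public static void main(String[] args) {
        // dataset_id is REQUIRED; an unset protobuf string field reads as "".
        InputDataConfig config = InputDataConfig.newBuilder()
            .setDatasetId("1234567890") // hypothetical Dataset ID
            .build();
        String id = config.getDatasetId();
        ByteString idBytes = config.getDatasetIdBytes(); // UTF-8 bytes of the same value
        System.out.println(id.equals(idBytes.toStringUtf8())); // true
    }
}
```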
      • getAnnotationsFilter

        String getAnnotationsFilter()
         Applicable only to Datasets that have DataItems and Annotations.
        
          A filter on Annotations of the Dataset. Only Annotations that both
          match this filter and belong to DataItems not ignored by the split method
          are used in the training, validation, or test role, respectively,
          depending on the role of the DataItem they are on (for auto-assigned
          DataItems that role is decided by Vertex AI). A filter with the same
          syntax as the one used in
          [ListAnnotations][google.cloud.aiplatform.v1.DatasetService.ListAnnotations]
          may be used, but note that here it filters across all Annotations of the
          Dataset, not just within a single DataItem.
         
        string annotations_filter = 6;
        Returns:
        The annotationsFilter.
      • getAnnotationsFilterBytes

        com.google.protobuf.ByteString getAnnotationsFilterBytes()
         Applicable only to Datasets that have DataItems and Annotations.
        
          A filter on Annotations of the Dataset. Only Annotations that both
          match this filter and belong to DataItems not ignored by the split method
          are used in the training, validation, or test role, respectively,
          depending on the role of the DataItem they are on (for auto-assigned
          DataItems that role is decided by Vertex AI). A filter with the same
          syntax as the one used in
          [ListAnnotations][google.cloud.aiplatform.v1.DatasetService.ListAnnotations]
          may be used, but note that here it filters across all Annotations of the
          Dataset, not just within a single DataItem.
         
        string annotations_filter = 6;
        Returns:
        The bytes for annotationsFilter.
      • getAnnotationSchemaUri

        String getAnnotationSchemaUri()
         Applicable only to custom training with Datasets that have DataItems and
         Annotations.
        
         Cloud Storage URI that points to a YAML file describing the annotation
         schema. The schema is defined as an OpenAPI 3.0.2 [Schema
         Object](https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.2.md#schemaObject).
         The schema files that can be used here are found in
          gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the
         chosen schema must be consistent with
         [metadata][google.cloud.aiplatform.v1.Dataset.metadata_schema_uri] of the
         Dataset specified by
         [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id].
        
          Only Annotations that both match this schema and belong to DataItems not
          ignored by the split method are used in the training, validation, or
          test role, respectively, depending on the role of the DataItem they are on.
        
         When used in conjunction with
         [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter],
         the Annotations used for training are filtered by both
         [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter]
         and
         [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri].
         
        string annotation_schema_uri = 9;
        Returns:
        The annotationSchemaUri.
      • getAnnotationSchemaUriBytes

        com.google.protobuf.ByteString getAnnotationSchemaUriBytes()
         Applicable only to custom training with Datasets that have DataItems and
         Annotations.
        
         Cloud Storage URI that points to a YAML file describing the annotation
         schema. The schema is defined as an OpenAPI 3.0.2 [Schema
         Object](https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.2.md#schemaObject).
         The schema files that can be used here are found in
          gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the
         chosen schema must be consistent with
         [metadata][google.cloud.aiplatform.v1.Dataset.metadata_schema_uri] of the
         Dataset specified by
         [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id].
        
          Only Annotations that both match this schema and belong to DataItems not
          ignored by the split method are used in the training, validation, or
          test role, respectively, depending on the role of the DataItem they are on.
        
         When used in conjunction with
         [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter],
         the Annotations used for training are filtered by both
         [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter]
         and
         [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri].
         
        string annotation_schema_uri = 9;
        Returns:
        The bytes for annotationSchemaUri.
      • getSavedQueryId

        String getSavedQueryId()
         Only applicable to Datasets that have SavedQueries.
        
         The ID of a SavedQuery (annotation set) under the Dataset specified by
         [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id] used
         for filtering Annotations for training.
        
          Only Annotations associated with this SavedQuery are used for
          training. When used in conjunction with
         [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter],
         the Annotations used for training are filtered by both
         [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id]
         and
         [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter].
        
         Only one of
         [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id]
         and
         [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri]
          should be specified, as both of them represent the same thing: the problem type.
         
        string saved_query_id = 7;
        Returns:
        The savedQueryId.
      • getSavedQueryIdBytes

        com.google.protobuf.ByteString getSavedQueryIdBytes()
         Only applicable to Datasets that have SavedQueries.
        
         The ID of a SavedQuery (annotation set) under the Dataset specified by
         [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id] used
         for filtering Annotations for training.
        
          Only Annotations associated with this SavedQuery are used for
          training. When used in conjunction with
         [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter],
         the Annotations used for training are filtered by both
         [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id]
         and
         [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter].
        
         Only one of
         [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id]
         and
         [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri]
          should be specified, as both of them represent the same thing: the problem type.
         
        string saved_query_id = 7;
        Returns:
        The bytes for savedQueryId.
      • getPersistMlUseAssignment

        boolean getPersistMlUseAssignment()
         Whether to persist the ML use assignment to data item system labels.
         
        bool persist_ml_use_assignment = 11;
        Returns:
        The persistMlUseAssignment.
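A sketch combining the annotation-related fields with the persist flag (assumes the google-cloud-aiplatform client; the dataset ID, saved query ID, and filter expression are all placeholders, and only one of `saved_query_id` / `annotation_schema_uri` is set, per the note above):

```java
import com.google.cloud.aiplatform.v1.InputDataConfig;

public class AnnotationConfigExample {
    public static InputDataConfig build() {
        return InputDataConfig.newBuilder()
            .setDatasetId("1234567890")                   // hypothetical Dataset ID
            .setSavedQueryId("987654321")                 // hypothetical annotation set
            .setAnnotationsFilter("labels.flagged=false") // hypothetical filter expression
            .setPersistMlUseAssignment(true)              // record train/val/test assignment
            .build();                                     // as data item system labels
    }
}
```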