Package com.google.cloud.aiplatform.v1

Interface InputDataConfigOrBuilder
-
- All Superinterfaces:
  com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder
- All Known Implementing Classes:
  InputDataConfig, InputDataConfig.Builder

public interface InputDataConfigOrBuilder extends com.google.protobuf.MessageOrBuilder
-
-
Method Summary
-
String getAnnotationSchemaUri()
  Applicable only to custom training with Datasets that have DataItems and Annotations.
com.google.protobuf.ByteString getAnnotationSchemaUriBytes()
  Applicable only to custom training with Datasets that have DataItems and Annotations.
String getAnnotationsFilter()
  Applicable only to Datasets that have DataItems and Annotations.
com.google.protobuf.ByteString getAnnotationsFilterBytes()
  Applicable only to Datasets that have DataItems and Annotations.
BigQueryDestination getBigqueryDestination()
  Only applicable to custom training with a tabular Dataset that has a BigQuery source.
BigQueryDestinationOrBuilder getBigqueryDestinationOrBuilder()
  Only applicable to custom training with a tabular Dataset that has a BigQuery source.
String getDatasetId()
  Required.
com.google.protobuf.ByteString getDatasetIdBytes()
  Required.
InputDataConfig.DestinationCase getDestinationCase()
FilterSplit getFilterSplit()
  Split based on the provided filters for each set.
FilterSplitOrBuilder getFilterSplitOrBuilder()
  Split based on the provided filters for each set.
FractionSplit getFractionSplit()
  Split based on fractions defining the size of each set.
FractionSplitOrBuilder getFractionSplitOrBuilder()
  Split based on fractions defining the size of each set.
GcsDestination getGcsDestination()
  The Cloud Storage location to which the training data is written.
GcsDestinationOrBuilder getGcsDestinationOrBuilder()
  The Cloud Storage location to which the training data is written.
boolean getPersistMlUseAssignment()
  Whether to persist the ML use assignment to data item system labels.
PredefinedSplit getPredefinedSplit()
  Supported only for tabular Datasets.
PredefinedSplitOrBuilder getPredefinedSplitOrBuilder()
  Supported only for tabular Datasets.
String getSavedQueryId()
  Only applicable to Datasets that have SavedQueries.
com.google.protobuf.ByteString getSavedQueryIdBytes()
  Only applicable to Datasets that have SavedQueries.
InputDataConfig.SplitCase getSplitCase()
StratifiedSplit getStratifiedSplit()
  Supported only for tabular Datasets.
StratifiedSplitOrBuilder getStratifiedSplitOrBuilder()
  Supported only for tabular Datasets.
TimestampSplit getTimestampSplit()
  Supported only for tabular Datasets.
TimestampSplitOrBuilder getTimestampSplitOrBuilder()
  Supported only for tabular Datasets.
boolean hasBigqueryDestination()
  Only applicable to custom training with a tabular Dataset that has a BigQuery source.
boolean hasFilterSplit()
  Split based on the provided filters for each set.
boolean hasFractionSplit()
  Split based on fractions defining the size of each set.
boolean hasGcsDestination()
  The Cloud Storage location to which the training data is written.
boolean hasPredefinedSplit()
  Supported only for tabular Datasets.
boolean hasStratifiedSplit()
  Supported only for tabular Datasets.
boolean hasTimestampSplit()
  Supported only for tabular Datasets.
-
Methods inherited from interface com.google.protobuf.MessageOrBuilder
findInitializationErrors, getAllFields, getDefaultInstanceForType, getDescriptorForType, getField, getInitializationErrorString, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof
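As a concrete illustration of how these accessors pair with the generated builder, the sketch below constructs an InputDataConfig with an 80/10/10 fraction split and reads it back through this interface. It assumes the google-cloud-aiplatform dependency is on the classpath; the Dataset ID and fractions are hypothetical values chosen for illustration.

```java
import com.google.cloud.aiplatform.v1.FractionSplit;
import com.google.cloud.aiplatform.v1.InputDataConfig;
import com.google.cloud.aiplatform.v1.InputDataConfigOrBuilder;

public class InputDataConfigExample {
    public static void main(String[] args) {
        // Build an InputDataConfig: dataset_id is required; fraction_split
        // is one member of the `split` oneof.
        InputDataConfig config =
            InputDataConfig.newBuilder()
                .setDatasetId("1234567890") // hypothetical Dataset ID
                .setFractionSplit(
                    FractionSplit.newBuilder()
                        .setTrainingFraction(0.8)
                        .setValidationFraction(0.1)
                        .setTestFraction(0.1))
                .build();

        // Both InputDataConfig and InputDataConfig.Builder implement
        // InputDataConfigOrBuilder, so either can be read through it.
        InputDataConfigOrBuilder view = config;
        if (view.hasFractionSplit()) {
            System.out.println(view.getFractionSplit().getTrainingFraction());
        }
    }
}
```

Because this interface is read-only, it is a convenient parameter type for code that should inspect a config without caring whether it was given a built message or an in-progress builder.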
-
-
-
-
Method Detail
-
hasFractionSplit
boolean hasFractionSplit()
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;
- Returns:
- Whether the fractionSplit field is set.
-
getFractionSplit
FractionSplit getFractionSplit()
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;
- Returns:
- The fractionSplit.
-
getFractionSplitOrBuilder
FractionSplitOrBuilder getFractionSplitOrBuilder()
Split based on fractions defining the size of each set.
.google.cloud.aiplatform.v1.FractionSplit fraction_split = 2;
-
hasFilterSplit
boolean hasFilterSplit()
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1.FilterSplit filter_split = 3;
- Returns:
- Whether the filterSplit field is set.
-
getFilterSplit
FilterSplit getFilterSplit()
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1.FilterSplit filter_split = 3;
- Returns:
- The filterSplit.
-
getFilterSplitOrBuilder
FilterSplitOrBuilder getFilterSplitOrBuilder()
Split based on the provided filters for each set.
.google.cloud.aiplatform.v1.FilterSplit filter_split = 3;
-
hasPredefinedSplit
boolean hasPredefinedSplit()
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;
- Returns:
- Whether the predefinedSplit field is set.
-
getPredefinedSplit
PredefinedSplit getPredefinedSplit()
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;
- Returns:
- The predefinedSplit.
-
getPredefinedSplitOrBuilder
PredefinedSplitOrBuilder getPredefinedSplitOrBuilder()
Supported only for tabular Datasets. Split based on a predefined key.
.google.cloud.aiplatform.v1.PredefinedSplit predefined_split = 4;
-
hasTimestampSplit
boolean hasTimestampSplit()
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;
- Returns:
- Whether the timestampSplit field is set.
-
getTimestampSplit
TimestampSplit getTimestampSplit()
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;
- Returns:
- The timestampSplit.
-
getTimestampSplitOrBuilder
TimestampSplitOrBuilder getTimestampSplitOrBuilder()
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.
.google.cloud.aiplatform.v1.TimestampSplit timestamp_split = 5;
-
hasStratifiedSplit
boolean hasStratifiedSplit()
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;
- Returns:
- Whether the stratifiedSplit field is set.
-
getStratifiedSplit
StratifiedSplit getStratifiedSplit()
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;
- Returns:
- The stratifiedSplit.
-
getStratifiedSplitOrBuilder
StratifiedSplitOrBuilder getStratifiedSplitOrBuilder()
Supported only for tabular Datasets. Split based on the distribution of the specified column.
.google.cloud.aiplatform.v1.StratifiedSplit stratified_split = 12;
-
hasGcsDestination
boolean hasGcsDestination()
The Cloud Storage location to which the training data is written. In the given directory a new directory is created with the name `dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>`, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory. The Vertex AI environment variables representing Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl":
* AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
* AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
* AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
* AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;
- Returns:
- Whether the gcsDestination field is set.
-
getGcsDestination
GcsDestination getGcsDestination()
The Cloud Storage location to which the training data is written. In the given directory a new directory is created with the name `dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>`, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory. The Vertex AI environment variables representing Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl":
* AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
* AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
* AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
* AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;
- Returns:
- The gcsDestination.
-
getGcsDestinationOrBuilder
GcsDestinationOrBuilder getGcsDestinationOrBuilder()
The Cloud Storage location to which the training data is written. In the given directory a new directory is created with the name `dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call>`, where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory. The Vertex AI environment variables representing Cloud Storage data URIs use the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl":
* AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
* AIP_TRAINING_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/training-*.${AIP_DATA_FORMAT}"
* AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/validation-*.${AIP_DATA_FORMAT}"
* AIP_TEST_DATA_URI = "gcs_destination/dataset-<dataset-id>-<annotation-type>-<time>/test-*.${AIP_DATA_FORMAT}"
.google.cloud.aiplatform.v1.GcsDestination gcs_destination = 8;
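Inside the training container, the AIP_* variables described above are read with ordinary environment-variable lookups. A minimal sketch; the fallback values are hypothetical and exist only so the snippet also runs outside a real training job:

```java
// Reads the Vertex AI data-URI environment variables, falling back to
// placeholder values when they are not set (e.g. during local testing).
public class DataUris {
    // Return the environment variable's value, or `fallback` if unset/empty.
    static String envOr(String name, String fallback) {
        String v = System.getenv(name);
        return (v == null || v.isEmpty()) ? fallback : v;
    }

    public static void main(String[] args) {
        String format   = envOr("AIP_DATA_FORMAT", "jsonl"); // "jsonl" or "csv"
        String training = envOr("AIP_TRAINING_DATA_URI",
                "gs://example-bucket/dataset-1-cls-2024/training-*." + format);
        // The URI is a Cloud Storage wildcard pattern covering sharded files.
        System.out.println("training data: " + training);
    }
}
```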
-
hasBigqueryDestination
boolean hasBigqueryDestination()
Only applicable to custom training with a tabular Dataset that has a BigQuery source. The BigQuery project location to which the training data is written. In the given project a new dataset is created with the name `dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>`, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset three tables are created: `training`, `validation` and `test`.
* AIP_DATA_FORMAT = "bigquery"
* AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
* AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
* AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;
- Returns:
- Whether the bigqueryDestination field is set.
-
getBigqueryDestination
BigQueryDestination getBigqueryDestination()
Only applicable to custom training with a tabular Dataset that has a BigQuery source. The BigQuery project location to which the training data is written. In the given project a new dataset is created with the name `dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>`, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset three tables are created: `training`, `validation` and `test`.
* AIP_DATA_FORMAT = "bigquery"
* AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
* AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
* AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;
- Returns:
- The bigqueryDestination.
-
getBigqueryDestinationOrBuilder
BigQueryDestinationOrBuilder getBigqueryDestinationOrBuilder()
Only applicable to custom training with a tabular Dataset that has a BigQuery source. The BigQuery project location to which the training data is written. In the given project a new dataset is created with the name `dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>`, where the timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset three tables are created: `training`, `validation` and `test`.
* AIP_DATA_FORMAT = "bigquery"
* AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.training"
* AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.validation"
* AIP_TEST_DATA_URI = "bigquery_destination.dataset_<dataset-id>_<annotation-type>_<time>.test"
.google.cloud.aiplatform.v1.BigQueryDestination bigquery_destination = 10;
-
getDatasetId
String getDatasetId()
Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible is described by the used TrainingPipeline's [training_task_definition] [google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition]. For tabular Datasets, all of their data is exported for training, to pick and choose from.
string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
- Returns:
- The datasetId.
-
getDatasetIdBytes
com.google.protobuf.ByteString getDatasetIdBytes()
Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible is described by the used TrainingPipeline's [training_task_definition] [google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition]. For tabular Datasets, all of their data is exported for training, to pick and choose from.
string dataset_id = 1 [(.google.api.field_behavior) = REQUIRED];
- Returns:
- The bytes for datasetId.
-
getAnnotationsFilter
String getAnnotationsFilter()
Applicable only to Datasets that have DataItems and Annotations. A filter on the Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role respectively, depending on the role of the DataItem they are on (for auto-assigned DataItems that role is decided by Vertex AI). A filter with the same syntax as the one used in [ListAnnotations][google.cloud.aiplatform.v1.DatasetService.ListAnnotations] may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
string annotations_filter = 6;
- Returns:
- The annotationsFilter.
-
getAnnotationsFilterBytes
com.google.protobuf.ByteString getAnnotationsFilterBytes()
Applicable only to Datasets that have DataItems and Annotations. A filter on the Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role respectively, depending on the role of the DataItem they are on (for auto-assigned DataItems that role is decided by Vertex AI). A filter with the same syntax as the one used in [ListAnnotations][google.cloud.aiplatform.v1.DatasetService.ListAnnotations] may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.
string annotations_filter = 6;
- Returns:
- The bytes for annotationsFilter.
-
getAnnotationSchemaUri
String getAnnotationSchemaUri()
Applicable only to custom training with Datasets that have DataItems and Annotations. A Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 [Schema Object](https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.2.md#schemaObject). The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with the [metadata][google.cloud.aiplatform.v1.Dataset.metadata_schema_uri] of the Dataset specified by [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id]. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role respectively, depending on the role of the DataItem they are on. When used in conjunction with [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter], the Annotations used for training are filtered by both [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter] and [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri].
string annotation_schema_uri = 9;
- Returns:
- The annotationSchemaUri.
-
getAnnotationSchemaUriBytes
com.google.protobuf.ByteString getAnnotationSchemaUriBytes()
Applicable only to custom training with Datasets that have DataItems and Annotations. A Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 [Schema Object](https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.2.md#schemaObject). The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/; note that the chosen schema must be consistent with the [metadata][google.cloud.aiplatform.v1.Dataset.metadata_schema_uri] of the Dataset specified by [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id]. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role respectively, depending on the role of the DataItem they are on. When used in conjunction with [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter], the Annotations used for training are filtered by both [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter] and [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri].
string annotation_schema_uri = 9;
- Returns:
- The bytes for annotationSchemaUri.
-
getSavedQueryId
String getSavedQueryId()
Only applicable to Datasets that have SavedQueries. The ID of a SavedQuery (annotation set) under the Dataset specified by [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id], used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter], the Annotations used for training are filtered by both [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id] and [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter]. Only one of [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id] and [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri] should be specified, as both represent the same thing: the problem type.
string saved_query_id = 7;
- Returns:
- The savedQueryId.
-
getSavedQueryIdBytes
com.google.protobuf.ByteString getSavedQueryIdBytes()
Only applicable to Datasets that have SavedQueries. The ID of a SavedQuery (annotation set) under the Dataset specified by [dataset_id][google.cloud.aiplatform.v1.InputDataConfig.dataset_id], used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter], the Annotations used for training are filtered by both [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id] and [annotations_filter][google.cloud.aiplatform.v1.InputDataConfig.annotations_filter]. Only one of [saved_query_id][google.cloud.aiplatform.v1.InputDataConfig.saved_query_id] and [annotation_schema_uri][google.cloud.aiplatform.v1.InputDataConfig.annotation_schema_uri] should be specified, as both represent the same thing: the problem type.
string saved_query_id = 7;
- Returns:
- The bytes for savedQueryId.
-
getPersistMlUseAssignment
boolean getPersistMlUseAssignment()
Whether to persist the ML use assignment to data item system labels.
bool persist_ml_use_assignment = 11;
- Returns:
- The persistMlUseAssignment.
-
getSplitCase
InputDataConfig.SplitCase getSplitCase()
-
getDestinationCase
InputDataConfig.DestinationCase getDestinationCase()
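`split` and `destination` are proto3 oneofs, so at most one member of each is set at a time; the generated case enums returned by getSplitCase() and getDestinationCase() let callers branch once instead of probing every has* method. A minimal sketch, assuming the google-cloud-aiplatform dependency and the standard generated enum constant names:

```java
import com.google.cloud.aiplatform.v1.InputDataConfigOrBuilder;

public class SplitCaseExample {
    // Describe which member of the `split` oneof is populated.
    static String describeSplit(InputDataConfigOrBuilder config) {
        switch (config.getSplitCase()) {
            case FRACTION_SPLIT:
                return "fraction split, training fraction = "
                        + config.getFractionSplit().getTrainingFraction();
            case FILTER_SPLIT:
                return "filter split";
            case PREDEFINED_SPLIT:
                return "predefined split on key " + config.getPredefinedSplit().getKey();
            case TIMESTAMP_SPLIT:
                return "timestamp split";
            case STRATIFIED_SPLIT:
                return "stratified split";
            case SPLIT_NOT_SET:
            default:
                return "no split specified";
        }
    }
}
```

The SPLIT_NOT_SET branch is worth handling explicitly: a config with no split set is valid, and the chosen split then falls to the service's default behavior rather than any of the messages above.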
-
-