Vertex AI model families evolve rapidly. New model versions (for example, transitions within the Gemini family) frequently improve efficiency, quality, and capability. Workloads that continue to use older or deprecated models may consume more tokens, produce lower-quality results, or experience higher latency than necessary. Because generative workloads often scale quickly, even small efficiency gaps between model generations can materially increase token consumption and cost. Teams that do not actively track model updates, or that choose a model once and never revisit the choice, often miss opportunities to improve performance per dollar by upgrading to the most current supported model.
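One practical way to make model upgrades a one-line change is to resolve the model ID from a single configuration point rather than hard-coding it at every call site. The sketch below illustrates that pattern; the environment variable name and default model ID are hypothetical conventions, not part of any official Vertex AI API.

```python
import os

# Assumed default; revisit periodically as newer supported models ship.
DEFAULT_MODEL_ID = "gemini-1.5-flash"

def resolve_model_id() -> str:
    """Return the model ID to use, preferring an explicit override.

    The GENAI_MODEL_ID variable name is illustrative: the point is that
    upgrading to a newer model version becomes a configuration change
    instead of an edit scattered across the codebase.
    """
    return os.environ.get("GENAI_MODEL_ID", DEFAULT_MODEL_ID)
```

With this pattern, a rollout to a newer model generation can be tested and reverted by flipping one environment variable, which makes it easier to revisit model choices regularly.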
Vertex AI generative AI usage is billed per input and output token. While newer model versions may have similar per-token pricing, they often produce more accurate outputs, need fewer tokens to achieve the same result, and offer better latency and throughput. Continuing to run older models can therefore raise overall cost and degrade output quality.
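The cost effect of token efficiency can be made concrete with simple arithmetic. The sketch below uses entirely hypothetical numbers (the per-token price and the 20% token reduction are placeholders, not real Vertex AI pricing) to show how a newer model that needs fewer tokens per request lowers monthly spend even at identical per-token rates.

```python
def monthly_token_cost(requests_per_month: int,
                       tokens_per_request: int,
                       price_per_million_tokens: float) -> float:
    """Estimate monthly spend for a given token volume."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens

# Placeholder price, not a real rate card value.
PRICE_USD_PER_M_TOKENS = 0.50

# Assume the newer model needs 20% fewer tokens for equivalent output.
legacy_cost = monthly_token_cost(1_000_000, 1_000, PRICE_USD_PER_M_TOKENS)
current_cost = monthly_token_cost(1_000_000, 800, PRICE_USD_PER_M_TOKENS)

print(legacy_cost)   # 500.0
print(current_cost)  # 400.0
```

At one million requests per month, the assumed 20% token reduction translates directly into a 20% lower bill, before counting any latency or quality gains.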