create_training_job fails in region eu-central-1 with Invalid DNS suffix 'amazonaws.com' for region 'us-east-1' in training image #9

@peter-vandenabeele-axa

Description

When following the tutorial for the built-in model and deploying in eu-central-1 (Frankfurt), the Lambda function that logs to
/aws/lambda/MLOps-BIA-TrainModel-pva fails with:

...
[INFO]Container Path 811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:1
...
An error occurred (ValidationException) when calling the CreateTrainingJob operation: Invalid DNS suffix 'amazonaws.com' for region 'us-east-1' in training image. Please provide the valid <region>.<dns-suffix>: 'eu-central-1.amazonaws.com'

I presume this is caused by an incorrect value in the AlgoECR environment variable, which the function reads with
ecr_path = os.environ['AlgoECR']

at line
https://github.com/aws-samples/amazon-sagemaker-devops-with-ml/blob/abac90b15b438f00c0deab4470cf162410c5d600/1-Built-In-Algorithm/lambda-code/MLOps-BIA-TrainModel.py#L70

To confirm this, I forced ecr_path to the correct path for eu-central-1 by editing the Lambda function as shown below, and with that change it works:

        #Get ECR information for BIA
        algo_version = user_param['Algorithm']
        
        #ecr_path = os.environ['AlgoECR']
        # HARD CODE OVERRIDE by peter_v
        ecr_path = '813361260812.dkr.ecr.eu-central-1.amazonaws.com'
        
        container_path = ecr_path + '/' + algo_version
        print('[INFO]Container Path', container_path)

I got that specific value from this page

https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html

for the XGBoost algorithm in eu-central-1.
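
Rather than hard-coding one registry, the path could be derived from the region the function runs in. Here is a minimal sketch, assuming the Lambda runtime's AWS_REGION variable is available and using only the two registry account IDs that appear in this issue. `XGBOOST_REGISTRY_ACCOUNTS` and `build_container_path` are hypothetical names, not part of the sample repository, and any other region would have to be added from the registry-paths page.

```python
import os

# Registry accounts for the built-in XGBoost image.
# Only the two values seen in this issue are listed (assumption: incomplete map);
# add further regions from the SageMaker registry-paths documentation.
XGBOOST_REGISTRY_ACCOUNTS = {
    "us-east-1": "811284229777",
    "eu-central-1": "813361260812",
}


def build_container_path(algo_version, region=None):
    """Return '<account>.dkr.ecr.<region>.amazonaws.com/<algo_version>'."""
    # Fall back to the region the Lambda function itself is running in.
    region = region or os.environ["AWS_REGION"]
    try:
        account = XGBOOST_REGISTRY_ACCOUNTS[region]
    except KeyError:
        raise ValueError(f"No registry account known for region {region!r}")
    ecr_path = f"{account}.dkr.ecr.{region}.amazonaws.com"
    return ecr_path + "/" + algo_version


# Example with an explicit region, mirroring the log line in the function.
print("[INFO]Container Path", build_container_path("xgboost:1", region="eu-central-1"))
```

With something like this, the function would no longer depend on AlgoECR being set per region, at the cost of maintaining the lookup table.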

Maybe there is a way to set the environment variable AlgoECR value correctly, but I did not see that immediately in the tutorial README ...
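
Until the variable is set correctly, a guard in the function could catch the mismatch before CreateTrainingJob is called and report it more directly. This is a sketch, assuming AlgoECR holds a registry host of the form `<account>.dkr.ecr.<region>.amazonaws.com`. `ecr_region` and `check_ecr_region` are hypothetical helpers, not existing code in the repository.

```python
import re

# Matches '<12-digit account>.dkr.ecr.<region>.amazonaws.com' at the start of the string.
_ECR_HOST = re.compile(r"^\d{12}\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com")


def ecr_region(ecr_path):
    """Extract the region embedded in an ECR registry path."""
    match = _ECR_HOST.match(ecr_path)
    if match is None:
        raise ValueError(f"Not an ECR registry path: {ecr_path!r}")
    return match.group(1)


def check_ecr_region(ecr_path, expected_region):
    """Raise ValueError if the registry region differs from the expected region."""
    actual = ecr_region(ecr_path)
    if actual != expected_region:
        raise ValueError(
            f"AlgoECR points at region {actual!r}, "
            f"but the function runs in {expected_region!r}"
        )


# Passes silently: registry and expected region agree.
check_ecr_region("813361260812.dkr.ecr.eu-central-1.amazonaws.com", "eu-central-1")
```

In the Lambda function this could be called as `check_ecr_region(os.environ['AlgoECR'], os.environ['AWS_REGION'])` right after reading the variable.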
