Error on GCP Deploy #40

Hello everyone!

First, I would like to thank you for the amazing work you've been doing.

I need your help! I'm new to GCP and I'm trying to deploy the rag-stack falcon7b model to it.

I'm getting an error when running deploy-gcp.sh. Here is the full trace:

guilhermedomingues@cloudshell:~/rag-stack/scripts/gcp (llama-rag-test)$ sh deploy-gcp.sh
    ____              _____ __             __  
   / __ \____ _____ _/ ___// /_____ ______/ /__
  / /_/ / __ `/ __ `/\__ \/ __/ __ `/ ___/ //_/
 / _, _/ /_/ / /_/ /___/ / /_/ /_/ / /__/ ,<   
/_/ |_|\__,_/\__, //____/\__/\__,_/\___/_/|_|  
            /____/   
_______________________________________________

Enter your GCP project ID: llama-rag-test
(https://cloud.google.com/iam/docs/keys-create-delete#creating) Enter the path to your GCP service account key file: llama-rag-test-f40c5f7db02f.json
Enter the GCP region (default: us-west1): us-central1-c
Enter your Huggingface API Token: MY_HUGGING_API
Model to deploy (llama2-7b or falcon7b): falcon7b

Initializing the backend...
Initializing modules...

Initializing provider plugins...
- Reusing previous version of hashicorp/kubernetes from the dependency lock file
- Reusing previous version of hashicorp/google from the dependency lock file
- Using previously-installed hashicorp/kubernetes v2.23.0
- Using previously-installed hashicorp/google v4.51.0

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Success! The configuration is valid.

module.gke-cluster.google_container_cluster.gpu_cluster: Refreshing state... [id=projects/llama-rag-test/locations/us-central1-c/clusters/gpu-cluster]
module.gke-cluster.google_container_node_pool.primary_preemptible_nodes: Refreshing state... [id=projects/llama-rag-test/locations/us-central1-c/clusters/gpu-cluster/nodePools/gpu-node-pool]
data.google_client_config.default: Reading...
data.google_container_cluster.default: Reading...
data.google_client_config.default: Read complete after 0s [id=projects/llama-rag-test/regions/us-central1-c/zones/]
data.google_container_cluster.default: Read complete after 0s [id=projects/llama-rag-test/locations/us-central1-c/clusters/gpu-cluster]
kubernetes_service.falcon7b_service[0]: Refreshing state... [id=default/falcon7b-service]
kubernetes_deployment.falcon7b[0]: Refreshing state... [id=default/falcon7b]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_cloud_run_service.qdrant will be created
  + resource "google_cloud_run_service" "qdrant" {
      + autogenerate_revision_name = false
      + id                         = (known after apply)
      + location                   = "us-central1-c"
      + name                       = "qdrant"
      + project                    = (known after apply)
      + status                     = (known after apply)

      + template {
          + spec {
              + container_concurrency = (known after apply)
              + service_account_name  = (known after apply)
              + serving_state         = (known after apply)
              + timeout_seconds       = (known after apply)

              + containers {
                  + image = "qdrant/qdrant:v1.3.0"

                  + ports {
                      + container_port = 6333
                      + name           = (known after apply)
                    }
                }
            }
        }

      + traffic {
          + latest_revision = true
          + percent         = 100
          + url             = (known after apply)
        }
    }

  # google_cloud_run_service.ragstack-server will be created
  + resource "google_cloud_run_service" "ragstack-server" {
      + autogenerate_revision_name = false
      + id                         = (known after apply)
      + location                   = "us-central1-c"
      + name                       = "ragstack-server"
      + project                    = (known after apply)
      + status                     = (known after apply)

      + template {
          + spec {
              + container_concurrency = (known after apply)
              + service_account_name  = (known after apply)
              + serving_state         = (known after apply)
              + timeout_seconds       = (known after apply)

              + containers {
                  + image = "jfan001/ragstack-server:latest"

                  + env {
                      + name  = "LLM_URL"
                      + value = "http://35.193.123.142"
                    }
                  + env {
                      + name  = "QDRANT_PORT"
                      + value = "443"
                    }
                  + env {
                      + name  = "QDRANT_URL"
                      + value = (known after apply)
                    }

                  + resources {
                      + limits = {
                          + "memory" = "2Gi"
                        }
                    }
                }
            }
        }

      + traffic {
          + latest_revision = true
          + percent         = 100
          + url             = (known after apply)
        }
    }

  # google_cloud_run_service_iam_member.public will be created
  + resource "google_cloud_run_service_iam_member" "public" {
      + etag     = (known after apply)
      + id       = (known after apply)
      + location = "us-central1-c"
      + member   = "allUsers"
      + project  = (known after apply)
      + role     = "roles/run.invoker"
      + service  = "qdrant"
    }

Plan: 3 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_cloud_run_service.qdrant: Creating...
╷
│ Error: Error creating Service: googleapi: got HTTP response code 404 with body: <!DOCTYPE html>
│ <html lang=en>
│   <meta charset=utf-8>
│   <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
│   <title>Error 404 (Not Found)!!1</title>
│   <style>
│     *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
│   </style>
│   <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
│   <p><b>404.</b> <ins>That’s an error.</ins>
│   <p>The requested URL <code>/apis/serving.knative.dev/v1/namespaces/llama-rag-test/services</code> was not found on this server.  <ins>That’s all we know.</ins>
│ 
│ 
│   with google_cloud_run_service.qdrant,
│   on main.tf line 195, in resource "google_cloud_run_service" "qdrant":
│  195: resource "google_cloud_run_service" "qdrant" {
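
One thing I noticed while reading the trace: the script asked for a GCP region (default: us-west1) and I entered us-central1-c, which has the shape of a zone name, and the plan then shows location = "us-central1-c" for the Cloud Run services. Cloud Run services are regional, so this may be related to the 404, though I have not confirmed it. Below is a minimal sketch of how such an input could be checked before it reaches Terraform. The helper name to_region and the regex are my own assumptions for illustration and are not part of deploy-gcp.sh; the pattern is only a heuristic based on the usual "<region>-<letter>" zone naming.

```python
import re

# Heuristic only (an assumption, not an official rule): GCP zone names
# usually look like "<region>-<letter>", e.g. "us-central1-c".
_ZONE_RE = re.compile(r"^(?P<region>[a-z]+-[a-z]+\d+)-[a-z]$")


def to_region(location: str) -> str:
    """Return the region part if `location` looks like a zone, else return it unchanged."""
    cleaned = location.strip()
    match = _ZONE_RE.match(cleaned)
    return match.group("region") if match else cleaned


if __name__ == "__main__":
    # Hypothetical usage: normalize the value typed at the region prompt.
    for value in ("us-central1-c", "us-west1"):
        print(value, "->", to_region(value))
```

If this guess is right, answering the region prompt with us-central1 instead of us-central1-c would be the thing to try.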

Can you help me with this?

Thank you again!

Have a nice weekend! :)
