Hello everyone!
First, I would like to thank you for the amazing work you've been doing.
I need your help! I'm new to GCP and I'm trying to deploy the rag-stack falcon7b model to it.
I'm getting an error when running deploy-gcp.sh. Here is the full trace:
guilhermedomingues@cloudshell:~/rag-stack/scripts/gcp (llama-rag-test)$ sh deploy-gcp.sh
[ASCII-art banner printed by deploy-gcp.sh]
Enter your GCP project ID: llama-rag-test
(https://cloud.google.com/iam/docs/keys-create-delete#creating) Enter the path to your GCP service account key file: llama-rag-test-f40c5f7db02f.json
Enter the GCP region (default: us-west1): us-central1-c
Enter your Huggingface API Token: MY_HUGGING_API
Model to deploy (llama2-7b or falcon7b): falcon7b
Initializing the backend...
Initializing modules...
Initializing provider plugins...
- Reusing previous version of hashicorp/kubernetes from the dependency lock file
- Reusing previous version of hashicorp/google from the dependency lock file
- Using previously-installed hashicorp/kubernetes v2.23.0
- Using previously-installed hashicorp/google v4.51.0
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Success! The configuration is valid.
module.gke-cluster.google_container_cluster.gpu_cluster: Refreshing state... [id=projects/llama-rag-test/locations/us-central1-c/clusters/gpu-cluster]
module.gke-cluster.google_container_node_pool.primary_preemptible_nodes: Refreshing state... [id=projects/llama-rag-test/locations/us-central1-c/clusters/gpu-cluster/nodePools/gpu-node-pool]
data.google_client_config.default: Reading...
data.google_container_cluster.default: Reading...
data.google_client_config.default: Read complete after 0s [id=projects/llama-rag-test/regions/us-central1-c/zones/]
data.google_container_cluster.default: Read complete after 0s [id=projects/llama-rag-test/locations/us-central1-c/clusters/gpu-cluster]
kubernetes_service.falcon7b_service[0]: Refreshing state... [id=default/falcon7b-service]
kubernetes_deployment.falcon7b[0]: Refreshing state... [id=default/falcon7b]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
Terraform will perform the following actions:
  # google_cloud_run_service.qdrant will be created
  + resource "google_cloud_run_service" "qdrant" {
      + autogenerate_revision_name = false
      + id                         = (known after apply)
      + location                   = "us-central1-c"
      + name                       = "qdrant"
      + project                    = (known after apply)
      + status                     = (known after apply)
      + template {
          + spec {
              + container_concurrency = (known after apply)
              + service_account_name  = (known after apply)
              + serving_state         = (known after apply)
              + timeout_seconds       = (known after apply)
              + containers {
                  + image = "qdrant/qdrant:v1.3.0"
                  + ports {
                      + container_port = 6333
                      + name           = (known after apply)
                    }
                }
            }
        }
      + traffic {
          + latest_revision = true
          + percent         = 100
          + url             = (known after apply)
        }
    }
  # google_cloud_run_service.ragstack-server will be created
  + resource "google_cloud_run_service" "ragstack-server" {
      + autogenerate_revision_name = false
      + id                         = (known after apply)
      + location                   = "us-central1-c"
      + name                       = "ragstack-server"
      + project                    = (known after apply)
      + status                     = (known after apply)
      + template {
          + spec {
              + container_concurrency = (known after apply)
              + service_account_name  = (known after apply)
              + serving_state         = (known after apply)
              + timeout_seconds       = (known after apply)
              + containers {
                  + image = "jfan001/ragstack-server:latest"
                  + env {
                      + name  = "LLM_URL"
                      + value = "http://35.193.123.142"
                    }
                  + env {
                      + name  = "QDRANT_PORT"
                      + value = "443"
                    }
                  + env {
                      + name  = "QDRANT_URL"
                      + value = (known after apply)
                    }
                  + resources {
                      + limits = {
                          + "memory" = "2Gi"
                        }
                    }
                }
            }
        }
      + traffic {
          + latest_revision = true
          + percent         = 100
          + url             = (known after apply)
        }
    }
  # google_cloud_run_service_iam_member.public will be created
  + resource "google_cloud_run_service_iam_member" "public" {
      + etag     = (known after apply)
      + id       = (known after apply)
      + location = "us-central1-c"
      + member   = "allUsers"
      + project  = (known after apply)
      + role     = "roles/run.invoker"
      + service  = "qdrant"
    }
Plan: 3 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
google_cloud_run_service.qdrant: Creating...
╷
│ Error: Error creating Service: googleapi: got HTTP response code 404 with body:
│ <title>Error 404 (Not Found)!!1</title>
│ 404. That’s an error.
│ The requested URL /apis/serving.knative.dev/v1/namespaces/llama-rag-test/services was not found on this server. That’s all we know.
│
│   with google_cloud_run_service.qdrant,
│   on main.tf line 195, in resource "google_cloud_run_service" "qdrant":
│  195: resource "google_cloud_run_service" "qdrant" {
╵
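One detail in the trace that may be worth checking: the region prompt was answered with us-central1-c, which looks like a zone name rather than a region, and the Cloud Run resources in the plan use it as their location. Whether that explains the 404 is not confirmed here. The Python sketch below only illustrates the check; the function names and the naming heuristic are assumptions made for illustration and are not part of rag-stack or any GCP library.

```python
import re

# Assumption: region names look like "us-central1" and zone names add a
# one-letter suffix, e.g. "us-central1-c". This is a naming heuristic,
# not a lookup against the real list of GCP locations.
ZONE_PATTERN = re.compile(r"^(?P<region>[a-z]+-[a-z]+\d+)-[a-z]$")
REGION_PATTERN = re.compile(r"^[a-z]+-[a-z]+\d+$")


def classify_location(location: str) -> str:
    """Return "zone", "region", or "unknown" for a GCP location string."""
    if ZONE_PATTERN.match(location):
        return "zone"
    if REGION_PATTERN.match(location):
        return "region"
    return "unknown"


def region_of(location: str) -> str:
    """Strip a trailing zone letter if present; otherwise return the input unchanged."""
    match = ZONE_PATTERN.match(location)
    return match.group("region") if match else location


if __name__ == "__main__":
    # Hypothetical inputs: the value typed at the prompt and the script's default.
    for value in ("us-central1-c", "us-west1"):
        print(value, "->", classify_location(value), "/ region:", region_of(value))
```

If the value is classified as a zone, re-running the script with the value returned by region_of might be a reasonable first experiment.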
Can you help me with this?
Thank you again!
Have a nice weekend! :)