Commit 130ebf6

MAJOR: DEV-15132 integrate with config module (#19)
1 parent 9fc86cd commit 130ebf6

9 files changed (+70, -438 lines)


CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 # GCP Tamr VM module
 
+## v2.0.0 - July 18th 2022
+* Remove overlap with config module
+
 ## v1.0.0 - June 1st 2022
 * Set minimum terraform to 1.0.0 and minimum google provider to 4.6.0

README.md

Lines changed: 2 additions & 46 deletions
@@ -64,77 +64,33 @@ This modules creates:
 
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
-| tamr\_bigtable\_cluster\_id | Bigtable cluster ID | `string` | n/a | yes |
-| tamr\_bigtable\_instance\_id | Bigtable instance ID | `string` | n/a | yes |
-| tamr\_bigtable\_max\_nodes | Max number of nodes to scale up to | `string` | n/a | yes |
-| tamr\_bigtable\_min\_nodes | Min number of nodes to scale down to | `string` | n/a | yes |
-| tamr\_cloud\_sql\_location | location for cloud sql instance. NOTE: this is either a region or a zone. | `string` | n/a | yes |
-| tamr\_cloud\_sql\_name | name of cloud sql instance | `string` | n/a | yes |
-| tamr\_dataproc\_bucket | GCS bucket to use for the tamr dataproc cluster | `string` | n/a | yes |
-| tamr\_dataproc\_region | Region the dataproc uses. | `string` | n/a | yes |
+| tamr\_config\_file | full tamr config file | `string` | n/a | yes |
 | tamr\_filesystem\_bucket | GCS bucket to use for the tamr default file system | `string` | n/a | yes |
 | tamr\_instance\_image | Image to use for boot disk | `string` | n/a | yes |
 | tamr\_instance\_project | The project to launch the tamr VM instance in. | `string` | n/a | yes |
 | tamr\_instance\_service\_account | email of service account to attach to the tamr instance | `string` | n/a | yes |
 | tamr\_instance\_subnet | subnetwork to attach instance too | `string` | n/a | yes |
 | tamr\_instance\_zone | zone to deploy tamr vm | `string` | n/a | yes |
-| tamr\_sql\_password | password for the cloud sql user | `string` | n/a | yes |
 | tamr\_zip\_uri | gcs location to download tamr zip from | `string` | n/a | yes |
 | labels | labels to attach to created resources | `map(string)` | `{}` | no |
-| tamr\_bigtable\_project\_id | The google project that the bigtable instance lives in. If not set will use the tamr\_instance\_project as the default value. | `string` | `""` | no |
-| tamr\_cloud\_sql\_project | project containing cloudsql instance. If not set will use the tamr\_instance\_project as the default value. | `string` | `""` | no |
-| tamr\_config | Override generated tamr configuration. The tamr configuration is specified using a yaml file, in the format that is documented (https://docs.tamr.com/previous/docs/configuration-configuring-unify#section-setting-configuration-variables) for configuring “many variables” at once. | `string` | `""` | no |
-| tamr\_dataproc\_cluster\_config | If you do not want to use the default dataproc configuration template, pass in a complete dataproc configuration file to variable.<br>If you are passing in a dataproc configure it should not be left padded, we will handle that inside of our template. It is expected to<br>a yaml document of a dataproc cluster config<br>Refrence spec is https://cloud.google.com/dataproc/docs/reference/rest/v1/ClusterConfig | `string` | `""` | no |
-| tamr\_dataproc\_cluster\_enable\_stackdriver\_logging | Enabled stackdriver logging on dataproc clusters. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `bool` | `true` | no |
-| tamr\_dataproc\_cluster\_master\_disk\_size | Size of disk to use on dataproc master disk This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `number` | `1000` | no |
-| tamr\_dataproc\_cluster\_master\_instance\_type | Instance type to use as dataproc master This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `string` | `"n1-highmem-4"` | no |
-| tamr\_dataproc\_cluster\_service\_account | Service account to attach to dataproc workers. If not set will use the tamr\_instance\_service\_account as the default value. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `string` | `""` | no |
-| tamr\_dataproc\_cluster\_subnetwork\_uri | Subnetwork URI for dataproc to use. If not set will use the tamr\_instance\_subnet as the default value. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `string` | `""` | no |
-| tamr\_dataproc\_cluster\_version | Version of dataproc to use. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `string` | `"1.4"` | no |
-| tamr\_dataproc\_cluster\_worker\_machine\_type | machine type of default worker pool. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `string` | `"n1-standard-16"` | no |
-| tamr\_dataproc\_cluster\_worker\_num\_instances | Number of default workers to use. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `number` | `4` | no |
-| tamr\_dataproc\_cluster\_worker\_num\_local\_ssds | Number of localssds to attach to each worker node. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `number` | `2` | no |
-| tamr\_dataproc\_cluster\_worker\_preemptible\_machine\_type | machine type of preemptible worker pool. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `string` | `"n1-standard-16"` | no |
-| tamr\_dataproc\_cluster\_worker\_preemptible\_num\_instances | Number of preemptible workers to use. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `number` | `0` | no |
-| tamr\_dataproc\_cluster\_worker\_preemptible\_num\_local\_ssds | Number of localssds to attach to each preemptible worker node. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `number` | `2` | no |
-| tamr\_dataproc\_cluster\_zone | Zone to launch dataproc cluster into. If not set will use the tamr\_instance\_zone as the default value. This only used if using the built in tamr\_dataproc\_cluster\_config configuration | `string` | `""` | no |
-| tamr\_dataproc\_project\_id | Project for the dataproc cluster. If not set will use the tamr\_instance\_project as the default value. | `string` | `""` | no |
-| tamr\_es\_apihost | The hostname and port of the REST API endpoint of the Elasticsearch cluster to use. If unset will use < ip of vm>:9200 | `string` | `""` | no |
-| tamr\_es\_enabled | Whether Tamr will index user data in Elasticsearch or not. Elasticsearch is used to power Tamr's interactive data UI, so when this is set to false Tamr will run 'headless,' that is, without its core UI capabilities. It can be useful to disable Elasticsearch in production settings where the models are trained on a separate instance and the goal is to maximize pipeline throughput. | `bool` | `true` | no |
-| tamr\_es\_number\_of\_shards | The number of shards to set when creating the Tamr index in Elasticsearch. Default value is the number of cores on the local host machine, so this should be overridden when using a remote Elasticsearch cluster. Note: this value is only applied when the index is created. | `number` | `1` | no |
-| tamr\_es\_password | Password to use to authenticate to Elasticsearch, using basic authentication. Not required unless the Elasticsearch cluster you're using has security and authentication enabled. The value passed in may be encrypted. | `string` | `""` | no |
-| tamr\_es\_socket\_timeout | Defines the socket timeout for Elasticsearch clients, in milliseconds. This is the timeout for waiting for data or, put differently, a maximum period of inactivity between two consecutive data packets. A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined (system default). The default value is 900000, i.e., fifteen minutes. | `number` | `900000` | no |
-| tamr\_es\_ssl\_enabled | Whether to connect to Elasticsearch over https or not. Default is false (http). | `bool` | `false` | no |
-| tamr\_es\_user | Username to use to authenticate to Elasticsearch. Not required unless the Elasticsearch cluster you're using has security and authentication enabled. | `string` | `""` | no |
 | tamr\_external\_ip | Create and attach an external ip to tamr VM | `bool` | `false` | no |
-| tamr\_hbase\_namespace | HBase namespace to user, for bigtable this will be the table prefix. | `string` | `"ns0"` | no |
 | tamr\_instance\_deletion\_protection | Enabled deletion protection for the tamr VM | `bool` | `true` | no |
 | tamr\_instance\_disk\_size | size of the boot disk | `number` | `100` | no |
 | tamr\_instance\_disk\_type | boot disk type | `string` | `"pd-ssd"` | no |
 | tamr\_instance\_install\_directory | directory to install tamr into | `string` | `"/data/tamr"` | no |
 | tamr\_instance\_machine\_type | machine type to use for tamr vm | `string` | `"n1-highmem-8"` | no |
 | tamr\_instance\_name | Name of the VM running tamr | `string` | `"tamr"` | no |
 | tamr\_instance\_tags | list of network tags to attach to instance | `list(string)` | `[]` | no |
-| tamr\_json\_logging | Toggle json formatting for tamr logs. | `bool` | `false` | no |
-| tamr\_license\_key | Set a tamr license key | `string` | `""` | no |
-| tamr\_spark\_driver\_memory | Amount of memory spark should allocate to spark driver | `string` | `"12G"` | no |
-| tamr\_spark\_executor\_cores | Amount of cores spark should allocate to each spark executor | `number` | `5` | no |
-| tamr\_spark\_executor\_instances | number of spark executor instances | `number` | `12` | no |
-| tamr\_spark\_executor\_memory | Amount of memory spark should allocate to each spark executor | `string` | `"13G"` | no |
-| tamr\_spark\_properties\_override | json blob of spark properties to override, if not set will use a default set of properties that should work for most use cases | `string` | `""` | no |
-| tamr\_sql\_user | username for the cloud sql user | `string` | `"tamr"` | no |
 
 ## Outputs
 
 | Name | Description |
 |------|-------------|
-| tamr\_config\_file | full tamr config file |
 | tamr\_instance\_internal\_ip | internal ip of tamr vm |
 | tamr\_instance\_name | name of the tamr vm |
 | tamr\_instance\_self\_link | full self link of created tamr vm |
 | tamr\_instance\_zone | zone of the tamr vm |
-| tmpl\_dataproc\_config | dataproc config |
-| tmpl\_statup\_script | rendered metadata startup script |
+| tmpl\_startup\_script | rendered metadata startup script |
 
 <!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
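
With the v2 interface, a module call needs only the instance settings plus a pre-built config file. A minimal sketch (module name, `source` path, and all values are illustrative placeholders; image and zip defaults are taken from the example's variables):

```hcl
module "tamr" {
  source = "../.." # placeholder: path to a local checkout of this module

  # Required instance settings (values are illustrative).
  tamr_instance_project         = "my-gcp-project"
  tamr_instance_zone            = "us-east1-b"
  tamr_instance_image           = "tamr-private-images/bionic-base-1644877703"
  tamr_instance_subnet          = "projects/my-gcp-project/regions/us-east1/subnetworks/my-subnet"
  tamr_instance_service_account = "tamr@my-gcp-project.iam.gserviceaccount.com"
  tamr_zip_uri                  = "gs://tamr-releases/v2022.005.0/unify.zip"
  tamr_filesystem_bucket        = "my-tamr-filesystem-bucket"

  # The full Tamr config now comes from the caller
  # (e.g. rendered by the config module or read from a file).
  tamr_config_file = file("config.yaml")
}
```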

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.4.1
+2.0.0

examples/basic/config.yaml

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# currently single node
+TAMR_LICENSE_KEY: placeholder

examples/basic/main.tf

Lines changed: 16 additions & 28 deletions
@@ -1,35 +1,23 @@
-locals {
-  gcp_project    = "tamr-deployment"
-  default_region = "us-east1"
-  default_zone   = "us-east1-b"
+data "google_compute_subnetwork" "project_subnet" {
+  name    = var.subnet_name
+  region  = var.region_id
+  project = "tamr-networking"
 }
 
 module "sample" {
   source = "../../"
-  # bigtable config
-  tamr_bigtable_project_id  = local.gcp_project
-  tamr_bigtable_instance_id = "tamr-bigtable-instance"
-  tamr_bigtable_cluster_id  = "TAMR_BIGTABLE_CLUSTER_ID"
-  tamr_bigtable_min_nodes   = 1
-  tamr_bigtable_max_nodes   = 10
-  # dataproc
-  tamr_dataproc_project_id = local.gcp_project
-  tamr_dataproc_bucket     = "tamr_dataproc_home"
-  tamr_dataproc_region     = local.default_region
-  # dataproc_cluster_config
-  tamr_dataproc_cluster_subnetwork_uri  = "projects/${local.gcp_project}/regions/${local.default_region}/subnetworks/default"
-  tamr_dataproc_cluster_service_account = "tamr-instance@${local.gcp_project}.iam.gserviceaccount.com"
-  tamr_dataproc_cluster_zone            = local.default_zone
-  # cloud sql
-  tamr_cloud_sql_project  = local.gcp_project
-  tamr_cloud_sql_location = local.default_region
-  tamr_cloud_sql_name     = "tamr-db"
-  tamr_sql_user           = "tamr"
-  tamr_sql_password       = "super_secure_password" # tfsec:ignore:GEN003
+
+  tamr_instance_project         = var.project_id
+  tamr_instance_name            = var.instance_id
+  tamr_instance_zone            = var.zone_id
+  tamr_instance_image           = var.instance_image
+  tamr_instance_subnet          = replace(data.google_compute_subnetwork.project_subnet.self_link, "https://www.googleapis.com/compute/v1/", "")
+  tamr_zip_uri                  = var.zip_url
+  tamr_instance_disk_size       = 600
+  tamr_instance_service_account = var.service_account
   # filesystem
-  tamr_filesystem_bucket = "tamr_application_home"
-}
+  tamr_filesystem_bucket = var.filesystem_bucket
 
-output "tamr_config" {
-  value = module.sample.tamr_config_file
+  tamr_config_file = file("config.yaml")
 }
+
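
The `replace()` call in the example strips the compute API prefix from the data source's `self_link`, leaving the relative resource path that the module's `tamr_instance_subnet` input expects. A standalone sketch of the same transformation (the subnet name here is hypothetical):

```hcl
locals {
  # Hypothetical self_link as returned by the google_compute_subnetwork data source.
  subnet_self_link = "https://www.googleapis.com/compute/v1/projects/tamr-networking/regions/us-east1/subnetworks/example-subnet"

  # Stripping the API prefix yields the relative path form:
  #   projects/tamr-networking/regions/us-east1/subnetworks/example-subnet
  subnet_relative_path = replace(local.subnet_self_link, "https://www.googleapis.com/compute/v1/", "")
}
```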

examples/basic/variables.tf

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+variable "project_id" {
+  type = string
+}
+
+variable "instance_id" {
+  default = "tamr-vm-example"
+  type    = string
+}
+
+variable "region_id" {
+  default = "us-east1"
+  type    = string
+}
+
+variable "zone_id" {
+  default = "us-east1-b"
+  type    = string
+}
+
+variable "service_account" {
+  default = ""
+  type    = string
+}
+
+variable "filesystem_bucket" {
+  type = string
+}
+
+variable "instance_image" {
+  default = "tamr-private-images/bionic-base-1644877703"
+  type    = string
+}
+
+variable "zip_url" {
+  default = "gs://tamr-releases/v2022.005.0/unify.zip"
+  type    = string
+}
+
+variable "subnet_name" {
+  type = string
+}

main.tf

Lines changed: 2 additions & 82 deletions
@@ -1,89 +1,9 @@
 locals {
-  tamr_dataproc_cluster_zone            = var.tamr_dataproc_cluster_zone == "" ? var.tamr_instance_zone : var.tamr_dataproc_cluster_zone
-  tamr_dataproc_cluster_subnetwork_uri  = var.tamr_dataproc_cluster_subnetwork_uri == "" ? var.tamr_instance_subnet : var.tamr_dataproc_cluster_subnetwork_uri
-  tamr_dataproc_cluster_service_account = var.tamr_dataproc_cluster_service_account == "" ? var.tamr_instance_service_account : var.tamr_dataproc_cluster_service_account
-
-  tamr_es_apihost = var.tamr_es_apihost == "" ? "${google_compute_address.tamr_ip.address}:9200" : var.tamr_es_apihost
-  remote_es       = var.tamr_es_apihost == "" ? false : true
-
-  tamr_bigtable_project_id = var.tamr_bigtable_project_id == "" ? var.tamr_instance_project : var.tamr_bigtable_project_id
-  tamr_cloud_sql_project   = var.tamr_cloud_sql_project == "" ? var.tamr_instance_project : var.tamr_cloud_sql_project
-  tamr_dataproc_project_id = var.tamr_dataproc_project_id == "" ? var.tamr_instance_project : var.tamr_dataproc_project_id
-
-  dataproc_config  = var.tamr_dataproc_cluster_config == "" ? local.default_dataproc : var.tamr_dataproc_cluster_config
-  tamr_config      = var.tamr_config == "" ? local.default_tamr_config : var.tamr_config
-  external_ip      = var.tamr_external_ip == true ? 1 : 0
-  spark_properties = var.tamr_spark_properties_override == "" ? file("${path.module}/files/spark_properties.json") : var.tamr_spark_properties_override
-
-  default_dataproc = templatefile("${path.module}/templates/dataproc.yaml.tmpl", {
-    subnetwork_uri       = local.tamr_dataproc_cluster_subnetwork_uri
-    service_account      = local.tamr_dataproc_cluster_service_account
-    zone                 = local.tamr_dataproc_cluster_zone
-    region               = var.tamr_dataproc_region
-    stackdriver_logging  = var.tamr_dataproc_cluster_enable_stackdriver_logging
-    version              = var.tamr_dataproc_cluster_version
-    tamr_dataproc_bucket = var.tamr_dataproc_bucket
-
-    master_instance_type = var.tamr_dataproc_cluster_master_instance_type
-    master_disk_size     = var.tamr_dataproc_cluster_master_disk_size
-
-    worker_machine_type   = var.tamr_dataproc_cluster_worker_machine_type
-    worker_num_instances  = var.tamr_dataproc_cluster_worker_num_instances
-    worker_num_local_ssds = var.tamr_dataproc_cluster_worker_num_local_ssds
-
-    worker_preemptible_machine_type   = var.tamr_dataproc_cluster_worker_preemptible_machine_type
-    worker_preemptible_num_instances  = var.tamr_dataproc_cluster_worker_preemptible_num_instances
-    worker_preemptible_num_local_ssds = var.tamr_dataproc_cluster_worker_preemptible_num_local_ssds
-  })
-
-
-  default_tamr_config = templatefile("${path.module}/templates/tamr_config.yaml.tmpl", {
-    # Bigtable
-    tamr_hbase_namespace      = var.tamr_hbase_namespace
-    tamr_bigtable_project_id  = local.tamr_bigtable_project_id
-    tamr_bigtable_instance_id = var.tamr_bigtable_instance_id
-    tamr_bigtable_cluster_id  = var.tamr_bigtable_cluster_id
-    tamr_bigtable_min_nodes   = var.tamr_bigtable_min_nodes
-    tamr_bigtable_max_nodes   = var.tamr_bigtable_max_nodes
-    # dataproc
-    tamr_dataproc_project_id = local.tamr_dataproc_project_id
-    tamr_dataproc_region     = var.tamr_dataproc_region
-    # NOTE: indent does not indent the first line of a variable, so we prefix it
-    # with a new file
-    tamr_dataproc_cluster_config = indent(2, "\n${local.dataproc_config}")
-    tamr_dataproc_bucket         = var.tamr_dataproc_bucket
-    # spark
-    tamr_spark_driver_memory      = var.tamr_spark_driver_memory
-    tamr_spark_executor_memory    = var.tamr_spark_executor_memory
-    tamr_spark_executor_cores     = var.tamr_spark_executor_cores
-    tamr_spark_executor_instances = var.tamr_spark_executor_instances
-    # ditto, comment about indent() above
-    tamr_spark_properties_override = indent(4, "\n${local.spark_properties}")
-    # sql
-    tamr_cloud_sql_project  = local.tamr_cloud_sql_project
-    tamr_cloud_sql_location = var.tamr_cloud_sql_location
-    tamr_cloud_sql_name     = var.tamr_cloud_sql_name
-    tamr_sql_user           = var.tamr_sql_user
-    tamr_sql_password       = var.tamr_sql_password
-    # elastic
-    remote_es                = local.remote_es
-    tamr_es_enabled          = var.tamr_es_enabled
-    tamr_es_apihost          = local.tamr_es_apihost
-    tamr_es_user             = var.tamr_es_user
-    tamr_es_password         = var.tamr_es_password
-    tamr_es_ssl_enabled      = var.tamr_es_ssl_enabled
-    tamr_es_number_of_shards = var.tamr_es_number_of_shards
-    tamr_es_socket_timeout   = var.tamr_es_socket_timeout
-    # file system
-    tamr_filesystem_bucket = var.tamr_filesystem_bucket
-    # miscellaneous
-    tamr_license_key  = var.tamr_license_key
-    tamr_json_logging = var.tamr_json_logging
-  })
+  external_ip = var.tamr_external_ip == true ? 1 : 0
 
   startup_script = templatefile("${path.module}/templates/startup_script.sh.tmpl", {
     tamr_zip_uri        = var.tamr_zip_uri
-    tamr_config         = local.tamr_config
+    tamr_config         = var.tamr_config_file
     tamr_home_directory = var.tamr_instance_install_directory
   })
 

outputs.tf

Lines changed: 2 additions & 14 deletions
@@ -5,7 +5,7 @@ output "tamr_instance_self_link" {
 }
 
 output "tamr_instance_internal_ip" {
-  value       = google_compute_instance.tamr.network_interface.0.network_ip
+  value       = google_compute_address.tamr_ip.address
   description = "internal ip of tamr vm"
 }
 
@@ -19,19 +19,7 @@ output "tamr_instance_zone" {
   description = "zone of the tamr vm"
 }
 
-# config files
-# NOTE: these are very useful for debugging
-output "tamr_config_file" {
-  value       = local.tamr_config
-  description = "full tamr config file"
-}
-
-output "tmpl_dataproc_config" {
-  value       = local.default_dataproc
-  description = "dataproc config"
-}
-
-output "tmpl_statup_script" {
+output "tmpl_startup_script" {
   value       = local.startup_script
   description = "rendered metadata startup script"
 }
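
Since the module no longer renders the Tamr or dataproc configs, the debugging outputs for them are gone; the remaining startup-script output can still be surfaced from a wrapping configuration for inspection. A sketch (the `module.tamr` name is illustrative):

```hcl
output "rendered_startup_script" {
  # Pass-through of the module's rendered metadata startup script,
  # useful for debugging what the VM will run on boot.
  value       = module.tamr.tmpl_startup_script
  description = "rendered metadata startup script"
}
```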
