Description
Our development and test environments use MinIO as an S3-compatible object store for Iceberg, which forces us to install heavy Amazon S3 libraries for the Hive Metastore (HMS) even though our production environment uses GCS. This adds unnecessary dependencies, increases build size, and creates a divergence between our development and production environments.
We could replace MinIO with a lightweight, GCS-compatible emulator to better align our development environment with production. The proposed replacement is fsouza/fake-gcs-server, which integrates easily into our existing containerized setup.
The new Trino native GCS file system (`fs.native-gcs.enabled=true`) provides no mechanism to disable authentication, making it impossible to use with local, unauthenticated GCS emulators like fake-gcs-server.
Even when `gcs.endpoint` is configured to point to a local emulator, the underlying Google Cloud client library still attempts a real OAuth2 token exchange with the public Google endpoint (https://oauth2.googleapis.com/token). This fails: a dummy credential file must be provided to avoid an "ADC not found" error, but that dummy credential is invalid for the real authentication service.
There should be a way to use the recommended native client for local testing without it attempting to contact external authentication services.
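The behavior is reproducible outside Trino with just the Google auth library. A minimal sketch, assuming the dummy key file described in the configuration below (class name and file path are illustrative):

```java
import java.io.FileInputStream;
import java.io.IOException;

import com.google.auth.oauth2.ServiceAccountCredentials;

public class DummyKeyTokenExchange
{
    public static void main(String[] args)
            throws IOException
    {
        // The credential's token URI defaults to
        // https://oauth2.googleapis.com/token and is not affected by
        // gcs.endpoint, so the exchange always targets the real service.
        ServiceAccountCredentials credentials = ServiceAccountCredentials.fromStream(
                new FileInputStream("/etc/trino/dummy-credentials.json"));

        // Fails with "400 invalid_grant" because the key is fake.
        credentials.refreshAccessToken();
    }
}
```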
Environment
- Trino Version: latest
- Connector: Iceberg (iceberg.catalog.type=hive_metastore)
- Storage Emulator: fsouza/fake-gcs-server
- Metastore: Hive Metastore 4.1
- Setup: Docker Compose
Configuration
- `docker-compose.yml` template:

```yaml
services:
  gcs-emulator:
    image: fsouza/fake-gcs-server
    container_name: gcs-emulator
    ports:
      - "4443:4443"
    command: -scheme http -public-host gcs-emulator:4443

  mysql-db:
    image: mysql:8.0
    container_name: mysql-db
    environment:
      - MYSQL_ROOT_PASSWORD=secret
      - MYSQL_DATABASE=metastore
      - MYSQL_USER=hive
      - MYSQL_PASSWORD=hive

  hive-metastore:
    image: apache/hive:4.0.0
    container_name: hive-metastore
    depends_on: [mysql-db]
    ports: ["9083:9083"]
    volumes:
      - ./hive/conf/metastore-site.xml:/opt/hive/conf/metastore-site.xml

  trino:
    image: trinodb/trino:latest
    container_name: trino
    ports: ["8080:8080"]
    volumes:
      - ./trino/catalog:/etc/trino/catalog
```
- `metastore-site.xml`:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-db:3306/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>fs.gs.project.id</name>
    <value>dummy</value>
  </property>
  <property>
    <name>fs.gs.auth.type</name>
    <value>NONE</value>
    <!-- or UNAUTHENTICATED, see https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v3.1.8/gcs/CONFIGURATION.md#user-credentials -->
  </property>
  <property>
    <name>google.cloud.storage.api.endpoint</name>
    <value>http://gcs-emulator:4443</value>
  </property>
</configuration>
```
- `iceberg.properties` (the problematic configuration):

```properties
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://hive-metastore:9083

# Configuration for Trino's native GCS client, see:
# https://trino.io/docs/current/object-storage/file-system-gcs.html#general-configuration
fs.native-gcs.enabled=true
gcs.endpoint=http://gcs-emulator:4443

# A dummy key file is required to prevent "ADC not found" errors,
# but this key triggers a real auth attempt.
# The content of the file is a syntactically valid but fake key.
gcs.json-key-file-path=/etc/trino/dummy-credentials.json
```

(Note: a dummy credentials file has to be created and mounted for this step, but even with it, the process fails; a sketch of such a file follows these steps.)

```sh
# Generate the key
openssl genrsa -out private_key.pem 2048
openssl pkcs8 -topk8 -inform PEM -outform PEM -nocrypt -in private_key.pem
# Then copy the output into the JSON file,
# replacing each newline with \n
```
- Start the services and prepare the emulator:

```sh
# Start all containers
docker-compose up -d

# Create a bucket in the emulator
curl -X POST -H "Content-Type: application/json" \
  --data '{"name": "test-bucket"}' \
  http://localhost:4443/storage/v1/b
```
- Run the failing SQL command via the Trino CLI or any client:

```sql
CREATE SCHEMA iceberg.test_schema
WITH (location = 'gs://test-bucket/test_schema');
```
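For completeness, a minimal sketch of what the mounted `/etc/trino/dummy-credentials.json` might look like; all field values are placeholders, and `private_key` must hold the PKCS#8 output generated above with newlines escaped as `\n`:

```json
{
  "type": "service_account",
  "project_id": "dummy",
  "private_key_id": "dummy",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "dummy@dummy.iam.gserviceaccount.com",
  "client_id": "000000000000000000000",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```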
Actual Result
The query fails with an authentication error against the real Google OAuth2 endpoint:

```
Query failed: GCS service error listing files: gs://test-bucket/test_schema.db/
...
Caused by: com.google.cloud.storage.StorageException: Error getting access token for service account: 400 Bad Request
POST https://oauth2.googleapis.com/token
{"error":"invalid_grant","error_description":"Invalid grant: account not found"}
...
```
Analysis and Suggested Solution
The root cause is that the native GCS client's authentication logic is tightly coupled to the official Google authentication libraries, which do not honor the `gcs.endpoint` property for authentication calls. The client sees a service account key and immediately attempts to exchange it for a real token.
A new configuration property is needed to explicitly disable this behavior for local testing. I suggest adding a property such as `gcs.authentication-type` with a supported value of `NONE`.
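The underlying google-cloud-storage library already supports exactly this via `NoCredentials`, so the new property could plausibly map onto it. A minimal sketch of the idea, reusing the emulator endpoint from above (where this would be wired up inside Trino's `GcsFileSystemModule` or its storage factory is my assumption, not the actual implementation):

```java
import com.google.cloud.NoCredentials;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class UnauthenticatedGcsSketch
{
    public static void main(String[] args)
    {
        // What gcs.authentication-type=NONE could translate to internally:
        // point the client at the emulator and skip the token exchange
        // entirely by supplying NoCredentials.
        Storage storage = StorageOptions.newBuilder()
                .setHost("http://gcs-emulator:4443") // from gcs.endpoint
                .setProjectId("dummy")
                .setCredentials(NoCredentials.getInstance())
                .build()
                .getService();

        // Sanity check against fake-gcs-server: list the buckets created above.
        storage.list().iterateAll().forEach(bucket -> System.out.println(bucket.getName()));
    }
}
```

This mirrors the pattern the Google client libraries themselves use when tests are pointed at local emulators.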
Refs:
- https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v3.1.8/gcs/CONFIGURATION.md#authentication
- https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v3.1.8/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystemConfiguration.java
- https://github.com/trinodb/trino/blob/master/lib/trino-filesystem-gcs/src/main/java/io/trino/filesystem/gcs/GcsFileSystemModule.java