feat:Add hadoop catalog mode(include s3,s3a(minio) and hdfs ) #1313

Open
wants to merge 26 commits into
base: main
Conversation

awol2005ex

feat:Add hadoop catalog mode(include s3,s3a(minio) and hdfs )

Contributor

@liurenjie1024 liurenjie1024 left a comment

Hi @awol2005ex, thanks for the contribution. The community direction is to deprecate pure file system catalogs (like the hadoop catalog). If you need to query Iceberg tables on file systems without relying on a catalog, you could use StaticTable.

@awol2005ex
Author

I have many Iceberg tables in a Hadoop catalog on HDFS, and I don't want to migrate them. StaticTable is very difficult to use and lacks practical utility.

@liurenjie1024
Contributor

Hi @awol2005ex, the Catalog trait is an open trait, so you can maintain a customized implementation outside of the community if you need one. But the community will deprecate file system based catalogs.

@awol2005ex
Author

My main goal is to query Iceberg tables using iceberg-python, but this library depends on iceberg-rust. However, since iceberg-rust doesn't support HDFS or Hadoop catalog, it's essentially unusable for my needs.

@liurenjie1024
Contributor

I don't recall iceberg-python having migrated all of its catalog implementations to iceberg-rust. cc @Fokko @kevinjqliu

@awol2005ex
Author

iceberg-python is just a wrapper for pyiceberg-core, and pyiceberg-core is located within the iceberg-rust repository.

@Xuanwo
Member

Xuanwo commented May 13, 2025

Hi, this understanding is incorrect.

While pyiceberg did attempt to reuse some iceberg-rust code, everything is entirely optional. As of now, pyiceberg's catalog implementations are written purely in Python.

3 participants