Commit 38429e7

minor improvements to the readme
1 parent 7741854 commit 38429e7

1 file changed: +10 -5 lines changed

README.md

Lines changed: 10 additions & 5 deletions
@@ -1,16 +1,21 @@
-# pyspark_huggingface
+<p align="center">
+<img alt="Hugging Face x Spark" src="https://pbs.twimg.com/media/FvN1b_2XwAAWI1H?format=jpg&name=large" width="352" style="max-width: 100%;">
+<br/>
+<br/>
+</p>
 
 <p align="center">
 <a href="https://github.com/huggingface/pyspark_huggingface/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/pyspark_huggingface.svg"></a>
 <a href="https://huggingface.co/datasets/"><img alt="Number of datasets" src="https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/datasets&color=brightgreen"></a>
 </p>
 
+# Spark Data Source for Hugging Face Datasets
+
 A Spark Data Source for accessing [🤗 Hugging Face Datasets](https://huggingface.co/datasets):
 
-- Stream datasets directly from Hugging Face to your Spark application
-- Select subsets and splits
-- Apply projection and predicate filters for Parquet datasets
-- Push Spark DataFrames as Parquet files the Hugging Face Dataset Hub
+- Stream datasets from Hugging Face as Spark DataFrames
+- Select subsets and splits, apply projection and predicate filters
+- Save Spark DataFrames as Parquet files to Hugging Face
 - Fully distributed
 - Authentication via `huggingface-cli login` or tokens
 - Compatible with Spark 4 (with auto-import)
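
The feature list in the updated README (splits, subsets, projection and predicate filters) could translate into reader options roughly as sketched below. This is an illustration only, not taken from the commit: the `"huggingface"` format name and the option keys `split`, `config`, `columns`, and `filters` are assumptions about the data source's interface, and `build_read_options` and `read_hf_dataset` are hypothetical helpers introduced here.

```python
import json
from typing import Any, Dict, Optional, Sequence, Tuple


def build_read_options(
    split: Optional[str] = None,
    config: Optional[str] = None,
    columns: Optional[Sequence[str]] = None,
    filters: Optional[Sequence[Tuple[str, str, Any]]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: turn selections into string-valued reader options.

    The option names are assumptions, not confirmed by the diff above.
    """
    opts: Dict[str, str] = {}
    if split is not None:
        opts["split"] = split  # e.g. "train"
    if config is not None:
        opts["config"] = config  # subset (configuration) name
    if columns is not None:
        # Projection: assumed to be passed as a JSON list of column names.
        opts["columns"] = json.dumps(list(columns))
    if filters is not None:
        # Predicates: assumed to be passed as a Python-literal list of tuples.
        opts["filters"] = repr(list(filters))
    return opts


def read_hf_dataset(spark: Any, repo_id: str, **selection: Any) -> Any:
    """Hypothetical wrapper: load a Hub dataset through an existing SparkSession."""
    reader = spark.read.format("huggingface")  # assumed format name
    return reader.options(**build_read_options(**selection)).load(repo_id)
```

With a live `SparkSession`, a call such as `read_hf_dataset(spark, "<user>/<dataset>", split="train", columns=["text"])` would then return a DataFrame, provided the option names match what the data source actually accepts.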
