
Commit 62f9df2

Updated the Arrow streaming documentation to describe incremental execution, remove the note block, and highlight lazy batch retrieval when using __arrow_c_stream__
1 parent e9994cf

1 file changed

docs/source/user-guide/io/arrow.rst

Lines changed: 4 additions & 10 deletions
@@ -60,16 +60,10 @@ Exporting from DataFusion
 DataFusion DataFrames implement ``__arrow_c_stream__`` PyCapsule interface, so any
 Python library that accepts these can import a DataFusion DataFrame directly.
 
-.. note::
-    Invoking ``__arrow_c_stream__`` still triggers execution of the underlying
-    query, but batches are yielded incrementally rather than materialized all at
-    once in memory. Consumers can process the stream as it arrives, avoiding the
-    memory overhead of a full
-    :py:func:`datafusion.dataframe.DataFrame.collect`.
-
-For an example of this streamed execution and its memory safety, see the
-``test_arrow_c_stream_large_dataset`` unit test in
-:mod:`python.tests.test_io`.
+Invoking ``__arrow_c_stream__`` triggers execution of the underlying query, but
+batches are yielded incrementally rather than materialized all at once in memory.
+Consumers can process the stream as it arrives. The stream executes lazily,
+letting downstream readers pull batches on demand.
 
 
 .. ipython:: python
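
The difference between lazy, pull-based streaming and an eager ``collect`` can be modelled in plain Python with a generator. This is a conceptual sketch only: ``BatchSource``, ``stream`` and ``collect`` here are hypothetical stand-ins, not DataFusion APIs, and each "batch" is just a list of integers rather than an Arrow record batch. It shows that a lazy stream computes a batch only when the consumer asks for one, while an eager collect computes and holds every batch at once.

```python
from typing import Iterator, List

Batch = List[int]  # hypothetical stand-in for an Arrow record batch


class BatchSource:
    """Hypothetical model of a query plan that produces batches."""

    def __init__(self, num_batches: int, batch_size: int) -> None:
        self.num_batches = num_batches
        self.batch_size = batch_size
        self.produced = 0  # number of batches actually computed so far

    def stream(self) -> Iterator[Batch]:
        # Lazy: the generator body runs only as the consumer pulls batches,
        # so each batch is computed on demand.
        for i in range(self.num_batches):
            self.produced += 1
            start = i * self.batch_size
            yield list(range(start, start + self.batch_size))

    def collect(self) -> List[Batch]:
        # Eager: every batch is computed and kept in memory together.
        return list(self.stream())


lazy = BatchSource(num_batches=1000, batch_size=4)
batches = lazy.stream()
first = next(batches)             # pulls exactly one batch
print(first, lazy.produced)       # [0, 1, 2, 3] 1

eager = BatchSource(num_batches=1000, batch_size=4)
everything = eager.collect()
print(len(everything), eager.produced)  # 1000 1000
```

With a real DataFusion DataFrame, the consumer side would be an Arrow-aware reader that accepts objects exposing ``__arrow_c_stream__``; assuming a recent PyArrow, ``pyarrow.RecordBatchReader.from_stream(df)`` returns a reader that can be iterated batch by batch. Calling a materializing method such as ``read_all()`` on that reader would bring back the full in-memory cost.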
