Commit 90e0cd8

committed
Add documentation
1 parent 5ecb1ba commit 90e0cd8

2 files changed

+89
-4
lines changed

docs/source/user-guide/configuration.rst

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,8 @@
1515
.. specific language governing permissions and limitations
1616
.. under the License.
1717
18+
.. _configuration:
19+
1820
Configuration
1921
=============
2022

docs/source/user-guide/sql.rst

Lines changed: 87 additions & 4 deletions
@@ -23,17 +23,100 @@ DataFusion also offers a SQL API, read the full reference `here <https://arrow.a
2323
.. ipython:: python
2424
2525
import datafusion
26-
from datafusion import col
27-
import pyarrow
26+
from datafusion import DataFrame, SessionContext
2827
2928
# create a context
3029
ctx = datafusion.SessionContext()
3130
3231
# register a CSV
33-
ctx.register_csv('pokemon', 'pokemon.csv')
32+
ctx.register_csv("pokemon", "pokemon.csv")
3433
3534
# create a new statement via SQL
3635
df = ctx.sql('SELECT "Attack"+"Defense", "Attack"-"Defense" FROM pokemon')
3736
3837
# collect and convert to pandas DataFrame
39-
df.to_pandas()
38+
df.to_pandas()
39+
40+
Parameterized queries
41+
---------------------
42+
43+
In DataFusion-Python 51.0.0 we introduced the ability to pass parameters
44+
in a SQL query. These are similar in concept to
45+
`prepared statements <https://datafusion.apache.org/user-guide/sql/prepared_statements.html>`_,
46+
but allow passing named parameters into a SQL query. Consider this simple
47+
example.
48+
49+
.. ipython:: python
50+
51+
def show_attacks(ctx: SessionContext, threshold: int) -> None:
52+
    ctx.sql(
53+
        'SELECT "Name", "Attack" FROM pokemon WHERE "Attack" > $val', val=threshold
54+
    ).show(num=5)
55+
show_attacks(ctx, 75)
56+
57+
When passing parameters like the example above we convert the Python objects
58+
into their string representation. We also have special case handling
59+
for :py:class:`~datafusion.dataframe.DataFrame` objects, since they cannot simply
60+
be turned into string representations for a SQL query. In these cases we
61+
will register a temporary view in the :py:class:`~datafusion.context.SessionContext`
62+
using a generated table name.
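
As a rough analogy for the string conversion described above, Python's standard ``string.Template`` uses the same ``$name`` placeholder syntax and also calls ``str()`` on each value. The sketch below is an illustration only; it is not how DataFusion-Python implements the feature, and it does not cover the special handling of DataFrame objects.

```python
from string import Template

# Illustration only: string.Template happens to share the "$name" syntax.
# This is NOT the DataFusion-Python implementation.
query = Template('SELECT "Name", "Attack" FROM pokemon WHERE "Attack" > $val')

# substitute() converts each value with str(), mirroring the string
# conversion described above.
rendered = query.substitute(val=75)
print(rendered)
```

The rendered text is an ordinary SQL string with the literal ``75`` in place of ``$val``.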
63+
64+
To mark a parameter for string replacement, precede the
65+
variable name with a single ``$``. This works for all dialects in
66+
the SQL parser except ``hive`` and ``mysql``. Since these dialects do not
67+
support named placeholders, we are unable to do this type of replacement.
68+
We recommend either switching to another dialect or using Python
69+
f-string style replacement.
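
A minimal sketch of the f-string fallback mentioned above, for sessions using the ``hive`` or ``mysql`` dialect. The helper name ``build_attack_query`` is hypothetical and exists only for this illustration; the ``int()`` coercion is one assumed way to keep arbitrary text out of the query.

```python
def build_attack_query(threshold: int) -> str:
    """Hypothetical helper: embed an integer threshold with an f-string."""
    # int() raises on non-numeric text, so arbitrary SQL cannot reach the
    # query string through this parameter.
    return f'SELECT "Name", "Attack" FROM pokemon WHERE "Attack" > {int(threshold)}'


query = build_attack_query(75)
print(query)
```

The resulting string can then be passed to ``ctx.sql(query)`` in the same way as the first example in this section.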
70+
71+
.. warning::
72+
73+
To support DataFrame parameterized queries, your session must support
74+
registration of temporary views. The default
75+
:py:class:`~datafusion.catalog.CatalogProvider` and
76+
:py:class:`~datafusion.catalog.SchemaProvider` do have this capability.
77+
If you have implemented custom providers, it is important that temporary
78+
views do not persist across :py:class:`~datafusion.context.SessionContext`
79+
instances, or you may see unintended behavior.
80+
81+
The following example passes both a :py:class:`~datafusion.dataframe.DataFrame`
82+
object and a Python object to be used in parameterized replacement.
83+
84+
.. ipython:: python
85+
86+
def show_column(
87+
    ctx: SessionContext, column: str, df: DataFrame, threshold: int
88+
) -> None:
89+
    ctx.sql(
90+
        'SELECT "Name", $col FROM $df WHERE $col > $val',
91+
        col=column,
92+
        df=df,
93+
        val=threshold,
94+
    ).show(num=5)
95+
df = ctx.table("pokemon")
96+
show_column(ctx, '"Defense"', df, 75)
97+
98+
The approach implemented for conversion of variables into a SQL query
99+
relies on string conversion. This can lose precision,
100+
for example with floating point numbers. If you need to pass
101+
variables into a parameterized query and it is important to maintain the
102+
original value without conversion to a string, then you can use the
103+
optional parameter ``param_values`` to specify these. This parameter
104+
expects a dictionary mapping from the parameter name to a Python
105+
object. Those objects will be cast into a
106+
`PyArrow Scalar Value <https://arrow.apache.org/docs/python/generated/pyarrow.Scalar.html>`_.
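
One way a text round trip can change a value is sketched below using only the Python standard library. It assumes, purely for illustration, that whatever reads the SQL text interprets the literal as an exact decimal; whether that happens in a given DataFusion session depends on its configuration and is not shown here.

```python
from decimal import Decimal

value = 0.1

# The text that string substitution would place into the query.
as_text = str(value)

# The exact binary value stored in the Python float.
exact = Decimal(value)

# The value a decimal-typed reader would recover from the text alone.
from_text = Decimal(as_text)

print(as_text)
print(exact == from_text)
```

The two ``Decimal`` values differ, which is the kind of mismatch that passing the original object through ``param_values`` is meant to avoid.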
107+
108+
Using ``param_values`` relies on the SQL dialect you have configured
109+
for your session. This can be set using the :ref:`configuration options <configuration>`
110+
of your :py:class:`~datafusion.context.SessionContext`. Similar to how
111+
`prepared statements <https://datafusion.apache.org/user-guide/sql/prepared_statements.html>`_
112+
work, these parameters are limited to places where you would pass in a
113+
scalar value, such as a comparison.
114+
115+
.. ipython:: python
116+
117+
def param_attacks(ctx: SessionContext, threshold: int) -> None:
118+
    ctx.sql(
119+
        'SELECT "Name", "Attack" FROM pokemon WHERE "Attack" > $val',
120+
        param_values={"val": threshold},
121+
    ).show(num=5)
122+
param_attacks(ctx, 75)
