Docstring Best Practice
Dataprep uses a few Sphinx packages to speed up docstring writing, which brings in some additional best practices. They are all listed here; please give them a read.
-
Automatic parameter type inference.
Dataprep strongly enforces typing for all functions, classes and variables. When documenting function parameters, the NumPy convention says you should write the parameter type after a `:`. Here, we don't, as long as the type is annotated correctly in the function signature. Take `dataprep.eda.basic.plot` as an example:

    def plot(
        df: Union[pd.DataFrame, dd.DataFrame],
        x: Optional[str] = None,
        y: Optional[str] = None,
        *,
        bins: int = 10,
        ngroups: int = 10,
        largest: bool = True,
        nsubgroups: int = 5,
        bandwidth: float = 1.5,
        sample_size: int = 1000,
        value_range: Optional[Tuple[float, float]] = None,
        yscale: str = "linear",
        tile_size: Optional[float] = None,
    ) -> Figure:
        ...

Since we have the function signature typed, the following practices apply:
-
No Type for Function Parameters
In the docstring you don't need to write the type for a parameter:

    Parameters
    ----------
    df
        Dataframe from which plots are to be generated

We already have the type of `df` from the signature, and the documentation will still be generated with the correct type.
-
Give the Type for Default Values
Alternatively, you can still write the parameter type to override the auto-generated one. A very good use case would be default values:
    Parameters
    ----------
    x: Optional[str], default None
        A valid column name from the dataframe.

In the generated documentation, notice how the parameter type changes from bold to italic: this is the sign of an **overridden** parameter type.
-
No Returns Unless for Comments
We can also infer the function return type from the signature! This means there is no need for docstrings like this:

    Returns
    -------
    Figure
        An object of figure

unless you want to write some meaningful comments for the return type:

    Returns
    -------
    Figure
        A meaningful message!!!
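The three rules above can be sketched on a small, self-contained function. The function `summarize` and its parameters are hypothetical and not part of Dataprep; the sketch only shows a typed signature whose docstring omits types, overrides one type to state a default, and keeps a `Returns` section because it has something meaningful to say. The final lines use the standard `typing.get_type_hints` to show that the annotations are available at runtime, which is the kind of information a documentation tool can read from the signature.

```python
from typing import List, Optional, get_type_hints


def summarize(
    values: List[float],
    label: Optional[str] = None,
    *,
    bins: int = 10,
) -> dict:
    """
    Summarize a list of numbers.

    Parameters
    ----------
    values
        Numbers to summarize.
    label: Optional[str], default None
        A name attached to the summary.
    bins
        Number of histogram bins to record in the summary.

    Returns
    -------
    dict
        Keys ``label``, ``bins``, ``count`` and ``mean``.
        ``mean`` is None when ``values`` is empty.
    """
    count = len(values)
    mean = sum(values) / count if count else None
    return {"label": label, "bins": bins, "count": count, "mean": mean}


# Hypothetical check: the types live in the signature, not in the docstring.
hints = get_type_hints(summarize)
print(hints["bins"], hints["return"])
```

Only `label` repeats its type in the docstring, and only because the default value is worth stating there.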
-
Make class members private by a leading `_`. Remember that all members without a leading underscore will be shown in the documentation!
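A minimal sketch of the naming rule, assuming a hypothetical `Report` class that is not part of Dataprep. The helper `public_members` is also hypothetical: it only imitates the underscore rule by filtering names, and is not how the documentation build actually selects members.

```python
from typing import List


class Report:
    """Hold the rows of a hypothetical summary report."""

    def __init__(self, rows: List[str]) -> None:
        # Private attribute: the leading underscore keeps it out of the docs.
        self._rows = rows

    def show(self) -> str:
        """Render the report as text."""
        return self._format()

    def _format(self) -> str:
        # Private helper: an implementation detail, not documented publicly.
        return "\n".join(self._rows)


def public_members(cls: type) -> List[str]:
    """List the names defined on a class that lack a leading underscore."""
    return sorted(name for name in vars(cls) if not name.startswith("_"))


print(public_members(Report))
```

Here only `show` has no leading underscore, so it is the only member that would count as public under this rule.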
- Module Docstring: one short description of the main purpose of the file. E.g.,
"""Clean and validate a DataFrame column containing geographic coordinates."""
-
Function Docstring
a. Start with a high-level, one-sentence description of the function. E.g.,

    """
    Clean and standardize latitude and longitude coordinates.
b. Optionally, further relevant information can be given in paragraphs under the first sentence.
c. If there exists an associated User Guide, the last sentence before the parameter descriptions should reference it. E.g.,
    Read more in the :ref:`User Guide <clean_lat_long_user_guide>`.

    Parameters
    ----------
-
Parameter Descriptions
a. If a parameter defines a format, an example should be given. E.g.
    output_format
        The desired format of the coordinates.
            - 'dd': decimal degrees (51.4934, 0.0098)
            - 'ddh': decimal degrees with hemisphere ('51.4934° N, 0.0098° E')
            - 'dm': degrees minutes ('51° 29.604′ N, 0° 0.588′ E')
            - 'dms': degrees minutes seconds ('51° 29′ 36.24″ N, 0° 0′ 35.28″ E')

        (default: 'dd')
b. The default value should be specified after a blank line at the end of the parameter description. For example:

    report
        If True, output the summary report. Otherwise, no report is outputted.

        (default: True)
c. If a parameter has the exact same functionality as in other functions, the description should be the same. E.g., the `report` parameter above.
-
Examples: after defining the parameters, include a short example that demonstrates the function. E.g.
Examples
--------
Split a column containing latitude and longitude strings into separate
columns in decimal degrees format.
>>> df = pd.DataFrame({'coordinates': ['51° 29′ 36.24″ N, 0° 0′ 35.28″ E', '51.4934° N, 0.0098° E']})
>>> clean_lat_long(df, 'coordinates', split=True)
                        coordinates  latitude  longitude
0  51° 29′ 36.24″ N, 0° 0′ 35.28″ E   51.4934     0.0098
1             51.4934° N, 0.0098° E   51.4934     0.0098
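The module, function, parameter and example conventions above can be put together in one place. The sketch below uses a hypothetical function `format_lat_long` (not Dataprep's `clean_lat_long`) that needs only the standard library, and it runs the docstring's own example through the standard `doctest` module as one possible way to check that the example output stays accurate.

```python
"""Format a latitude and longitude pair as text."""

import doctest


def format_lat_long(lat: float, lon: float, output_format: str = "dd") -> str:
    """
    Format a latitude and longitude pair as a string.

    Parameters
    ----------
    lat
        Latitude in decimal degrees.
    lon
        Longitude in decimal degrees.
    output_format
        The desired format of the coordinates.
            - 'dd': decimal degrees ('51.4934, 0.0098')
            - 'ddh': decimal degrees with hemisphere ('51.4934° N, 0.0098° E')

        (default: 'dd')

    Examples
    --------
    Format a coordinate pair with hemisphere letters.

    >>> format_lat_long(51.4934, 0.0098, "ddh")
    '51.4934° N, 0.0098° E'
    """
    if output_format == "dd":
        return f"{lat}, {lon}"
    if output_format == "ddh":
        ns = "N" if lat >= 0 else "S"
        ew = "E" if lon >= 0 else "W"
        return f"{abs(lat)}° {ns}, {abs(lon)}° {ew}"
    raise ValueError(f"Unknown output_format: {output_format!r}")


# Parse the docstring into a doctest and run it; nothing is printed on success.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    format_lat_long.__doc__,
    {"format_lat_long": format_lat_long},
    "format_lat_long",
    None,
    0,
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```

`results` reports how many examples were attempted and how many failed, so a stale example shows up as a failure rather than as silently wrong documentation.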
Notes:
- Each statement should begin with a capital letter and end with a period.
- All internal functions should begin with an underscore so they do not appear in the documentation.
- Be declarative and concise.
To add a file to appear in the API reference section of the documentation, add it in alphabetical order here.
To create a reference between a User Guide and docstring, follow the instructions here.