From raw market data to ML-ready features in five lines of code.
Documentation: finasys Docs
finasys is a financial data processing toolkit for ML pipelines and AI agents that replaces manual data wrangling. It takes you from raw market data to production-ready features in a few lines of code, whether you're building trading models, running portfolio analysis, or powering financial AI agents.
finasys is Polars-first — every indicator and feature runs as a native Polars expression, making it 10-100x faster than pandas-based alternatives with zero C dependencies (no ta-lib build headaches). It supports 37+ international markets, crypto, forex, commodities, and macro indicators out of the box. Learn more via our official documentation or start contributing via this GitHub repo.
import finasys as fs
# Load stock data (auto-cached with DuckDB)
df = fs.load("AAPL", start="2024-01-01")
# Add technical indicators + returns in one call
df = fs.features.add_all(df)
# Generate an LLM-ready summary
print(fs.agents.summarize(df))
pip install finasys
Optional extras:
pip install finasys[langchain] # LangChain tool integration
pip install finasys[pandas] # Pandas interop
pip install finasys[all] # Everything
- Single fs.load() entry point for Yahoo Finance, CSV, and Parquet files
- Standardized OHLCV column names across all sources
- DuckDB-backed local caching (second call is instant)
- Multi-symbol fetching with automatic alignment
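The caching behaviour in the list above can be pictured with a small stand-in. This sketch is not how finasys is implemented: it swaps DuckDB for the standard library's sqlite3 so that it runs without extra dependencies, and `PriceCache`, the `fetch` callback, and the cache key format are hypothetical names chosen for illustration.

```python
import json
import sqlite3


class PriceCache:
    """Tiny read-through cache keyed by symbol and start date (illustrative only)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, payload TEXT)"
        )

    def load(self, symbol, start, fetch):
        key = f"{symbol}|{start}"
        row = self.conn.execute(
            "SELECT payload FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            # Cache hit: no call to the remote source.
            return json.loads(row[0])
        # Cache miss: fetch once, store the rows as JSON, then return them.
        rows = fetch(symbol, start)
        self.conn.execute(
            "INSERT INTO cache (key, payload) VALUES (?, ?)", (key, json.dumps(rows))
        )
        self.conn.commit()
        return rows
```

Passing a file path instead of `:memory:` would make the cache survive across sessions, which is the property the "second call is instant" bullet relies on.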
df = fs.load("AAPL", start="2024-01-01")
df = fs.load(["AAPL", "GOOGL", "MSFT"], start="2024-01-01")
df = fs.load("./data/prices.csv")
- 15+ technical indicators: RSI, MACD, Bollinger Bands, ATR, VWAP, OBV, Stochastic, ADX, CCI, Williams %R, MFI, ROC, Momentum
- Returns: simple, log, cumulative, drawdown
- Rolling statistics: mean, std, min, max, skew, z-score
- Lag features with built-in look-ahead bias protection
- Calendar features: day of week, month, quarter
- Cross-sectional: rank, percentile, z-score across symbols
All implemented in pure Polars expressions -- no ta-lib C dependency, 10-100x faster than pandas-ta.
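For readers who want to see what an indicator such as RSI computes, here is a minimal pure-Python sketch using Wilder's smoothing. It is an illustration only: finasys implements its indicators as Polars expressions, and whether it uses this exact smoothing variant is an assumption. The name `wilder_rsi` is hypothetical.

```python
def wilder_rsi(closes, period=14):
    """RSI with Wilder smoothing; output is aligned with `closes` (None until warmed up)."""
    out = [None] * len(closes)
    if period < 1 or len(closes) <= period:
        return out

    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0:
            return 100.0  # convention when there are no losses in the window
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    # Seed with simple averages over the first `period` price changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out[period] = to_rsi(avg_gain, avg_loss)

    # Wilder's recursive smoothing for every later bar.
    for i in range(period, len(changes)):
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        out[i + 1] = to_rsi(avg_gain, avg_loss)
    return out
```

Each value depends only on prices up to and including its own bar, so the output can be used as a feature without leaking future information.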
df = fs.features.rsi(df, period=14)
df = fs.features.macd(df)
df = fs.features.returns(df, periods=[1, 5, 21])
- Forward returns for regression targets
- Ternary classification labels (up/flat/down) with configurable thresholds
- Triple-barrier labeling (Lopez de Prado method) -- the gold standard for financial ML
- Volatility-adjusted labels that adapt to the current regime
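To make the triple-barrier rule concrete, the sketch below labels each bar by which barrier is touched first: the profit-take level (+1), the stop-loss level (-1), or the holding-period limit (0). It is a simplified pure-Python illustration on closing prices, not the finasys implementation. The function name is hypothetical, and details such as using highs and lows or labeling timeouts by the sign of the return are assumptions that real implementations may handle differently.

```python
def triple_barrier_sketch(closes, profit_take=0.02, stop_loss=0.02, max_holding=10):
    """Label each bar +1, -1, or 0; None where the outcome is not yet known."""
    n = len(closes)
    labels = []
    for i, entry in enumerate(closes):
        label = None
        last = min(i + max_holding, n - 1)
        for j in range(i + 1, last + 1):
            ret = closes[j] / entry - 1.0
            if ret >= profit_take:
                label = 1  # upper barrier touched first
                break
            if ret <= -stop_loss:
                label = -1  # lower barrier touched first
                break
        else:
            # No horizontal barrier was touched. Only label a timeout if the
            # full holding period was observed; otherwise leave it unknown.
            if i + max_holding <= n - 1:
                label = 0
        labels.append(label)
    return labels
```

Leaving the trailing bars as `None` rather than forcing a label keeps incomplete windows out of the training set.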
# Forward returns for regression
df = fs.features.forward_returns(df, periods=[1, 5])
# Classification labels
df = fs.features.classify_returns(df, period=5, thresholds=(-0.01, 0.01))
# Triple-barrier method
df = fs.features.triple_barrier_labels(df, profit_take=0.02, stop_loss=0.02, max_holding=10)
# Volatility-adjusted labels (adapts to regime)
df = fs.features.volatility_adjusted_labels(df, period=5, vol_multiplier=1.0)
- Rolling kurtosis, skewness, tail ratio -- capture fat-tail dynamics
- Rolling Jarque-Bera normality test
- Z-score of returns vs rolling distribution
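As a rough picture of what the rolling moment features measure, here is a pure-Python sketch of rolling skewness and excess kurtosis using population moments. The function name is hypothetical, and whether finasys uses population or bias-corrected sample estimators is not stated above, so treat the exact formula as an assumption.

```python
def rolling_skew_kurtosis(values, window=30):
    """Return (skewness, excess kurtosis) per bar, or None before the window fills."""
    out = [None] * len(values)
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        mean = sum(chunk) / window
        devs = [v - mean for v in chunk]
        m2 = sum(d ** 2 for d in devs) / window
        if m2 == 0:
            continue  # constant window: moments are undefined, keep None
        m3 = sum(d ** 3 for d in devs) / window
        m4 = sum(d ** 4 for d in devs) / window
        # Excess kurtosis subtracts 3 so that a normal distribution scores 0.
        out[end - 1] = (m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0)
    return out
```

Positive excess kurtosis in a window signals heavier tails than a normal distribution, which is the fat-tail behaviour these features are meant to capture.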
df = fs.features.rolling_kurtosis(df, window=30)
df = fs.features.rolling_skewness(df, window=30)
df = fs.features.tail_ratio(df, window=30)
df = fs.features.zscore_returns(df, window=30)
- Sharpe, Sortino, Calmar ratios
- Value at Risk (historical, parametric, Cornish-Fisher)
- Conditional VaR (Expected Shortfall)
- CAPM alpha/beta, information ratio
- Max drawdown duration tracking
- Dual mode: scalar for reporting, rolling columns for ML features
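The scalar versions of these metrics reduce to a few lines of arithmetic. The sketch below shows one common convention for an annualized Sharpe ratio and for historical VaR and CVaR. The function names, the 252-period annualization, the zero risk-free rate, and the simple order-statistic quantile are all assumptions made for illustration and may differ from what finasys computes.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=252, risk_free=0.0):
    """Mean excess return over sample standard deviation, scaled to a year."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def historical_var_cvar(returns, confidence=0.95):
    """Return (VaR, CVaR) as returns, so losses appear as negative numbers."""
    ordered = sorted(returns)
    # Number of observations in the worst (1 - confidence) tail, at least one.
    tail_size = max(1, int((1.0 - confidence) * len(ordered) + 1e-9))
    tail = ordered[:tail_size]
    var = tail[-1]  # the least severe return inside the tail
    cvar = sum(tail) / tail_size  # average return within the tail
    return var, cvar
```

CVaR is always at least as severe as VaR at the same confidence level, because it averages everything at or beyond the VaR cutoff.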
# Scalar metrics (whole-series)
sharpe = fs.stats.sharpe_ratio(df) # => 1.47
var = fs.stats.value_at_risk(df, confidence=0.95) # => -0.0216
cvar = fs.stats.cvar(df, confidence=0.95) # => -0.0285
# Rolling metrics (ML features)
df = fs.stats.sharpe_ratio(df, window=63) # adds sharpe_63
df = fs.stats.value_at_risk(df, window=63) # adds var_63
- One-call data quality assessment for financial time series
- Detects: missing dates, price outliers, suspected stock splits, zero-volume days
- Distribution analysis: skewness, kurtosis, Jarque-Bera normality test, tail ratio
- LLM-ready text summaries and JSON-serializable structured reports
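Two of the checks listed above are easy to picture in plain Python. The sketch below flags weekdays with no data and price drops that look like unadjusted stock splits. Both helpers are hypothetical illustrations: the weekday-only calendar, the candidate split ratios, and the tolerance are assumptions, not the rules finasys applies.

```python
from datetime import date, timedelta


def missing_weekdays(dates):
    """Weekdays between the first and last date that have no observation."""
    present = set(dates)
    day, end = min(present), max(present)
    missing = []
    while day <= end:
        if day.weekday() < 5 and day not in present:  # Monday=0 ... Friday=4
            missing.append(day)
        day += timedelta(days=1)
    return missing


def suspected_splits(closes, ratios=(2, 3, 4, 5, 10), tolerance=0.02):
    """Indices where the close falls to roughly 1/k of the previous close."""
    flagged = []
    for i in range(1, len(closes)):
        if closes[i] <= 0:
            continue  # cannot form a ratio; leave such rows to other checks
        factor = closes[i - 1] / closes[i]
        if any(abs(factor - k) <= tolerance * k for k in ratios):
            flagged.append(i)
    return flagged
```

A weekday-only calendar reports exchange holidays as missing, so a production check would normally consult the relevant exchange calendar instead.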
# Text summary (great for LLM system prompts)
print(fs.profiler.profile_summary(df))
# DATA PROFILE | 252 rows x 7 columns
# Quality issues: 9 missing dates; 11 price outliers
# Returns distribution: skew=0.501, kurtosis=3.647, non-normal (JB p=0.0000)
# Full structured report
report = fs.profiler.profile(df)
report.quality.missing_dates # ['2024-01-15', '2024-02-19', ...]
report.distribution.is_normal # False
report.to_dict() # JSON-serializable
- LLM-ready summaries of financial DataFrames
- Tool definitions in OpenAI function-calling format
- Context extraction for RAG-style usage
- Schema descriptions for system prompts
- LangChain integration (optional)
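For context on the second bullet, a tool definition in OpenAI function-calling format is a JSON-serializable dictionary with a name, a description, and a JSON Schema for the parameters. The sketch below builds one by hand. The tool name `get_price_summary`, its parameters, and the helper `price_summary_tool` are hypothetical, and the definitions finasys actually generates may look different.

```python
import json


def price_summary_tool(symbols):
    """Build one function-calling tool definition restricted to known symbols."""
    return {
        "type": "function",
        "function": {
            "name": "get_price_summary",  # hypothetical tool name
            "description": "Summarize recent price action for one symbol.",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "enum": list(symbols),
                        "description": "Ticker symbol to summarize.",
                    },
                    "start": {
                        "type": "string",
                        "description": "ISO start date, for example 2024-01-01.",
                    },
                },
                "required": ["symbol"],
            },
        },
    }


tool = price_summary_tool(["AAPL", "GOOGL"])
payload = json.dumps([tool])  # ready to pass as the `tools` list of a chat request
```

Restricting `symbol` with an `enum` keeps the model from requesting tickers that were never loaded.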
summary = fs.agents.summarize(df)
tools = fs.agents.tools(symbols=["AAPL", "GOOGL"])
from finasys.agents.langchain import get_tools
lc_tools = get_tools(symbols=["AAPL"])
Serializable, reproducible feature pipelines with 17 built-in step classes.
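One way to picture a serializable pipeline is a list of small parameterized steps that can be written to JSON and rebuilt from it. The sketch below is a generic illustration in plain Python, not the finasys design: `Lag`, `Scale`, `SketchPipeline`, and the registry are hypothetical names, and the JSON layout is an assumption.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Lag:
    periods: int = 1

    def apply(self, values):
        # Shift values forward in time so each row only sees past data.
        k = min(self.periods, len(values))
        return [None] * k + values[: len(values) - k]


@dataclass
class Scale:
    factor: float = 1.0

    def apply(self, values):
        return [None if v is None else v * self.factor for v in values]


# Maps the class name stored in JSON back to the class that rebuilds the step.
STEP_REGISTRY = {cls.__name__: cls for cls in (Lag, Scale)}


class SketchPipeline:
    def __init__(self, steps):
        self.steps = list(steps)

    def transform(self, values):
        for step in self.steps:
            values = step.apply(values)
        return values

    def to_json(self):
        return json.dumps(
            [{"step": type(s).__name__, "params": asdict(s)} for s in self.steps]
        )

    @classmethod
    def from_json(cls, text):
        return cls(
            STEP_REGISTRY[item["step"]](**item["params"]) for item in json.loads(text)
        )
```

Because only step names and parameters are stored, the saved file is small, diffable, and safe to keep under version control alongside the model code.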
pipeline = fs.FeatureSet([
    fs.features.RSI(period=14),
    fs.features.Returns(periods=[1, 5, 21]),
    fs.features.RollingStats(windows=[5, 21]),
    fs.features.RollingKurtosis(window=30),
    fs.features.ForwardReturns(periods=[1, 5]),
    fs.features.TripleBarrier(profit_take=0.02, stop_loss=0.02),
])
df = pipeline.transform(df)
pipeline.save("pipeline.json") # version control your feature engineering
| | finasys | pandas-ta | ta-lib |
|---|---|---|---|
| Engine | Polars (fast) | pandas (slow) | C library |
| Install | pip install finasys | pip install pandas-ta | Requires C build tools |
| ML Targets | Triple-barrier, vol-adjusted labels | None | None |
| Risk Metrics | Sharpe, VaR, CVaR, alpha/beta | None | None |
| Data Profiling | Financial-specific quality checks | None | None |
| AI Agent support | Built-in | None | None |
| Caching | DuckDB auto-cache | None | None |
| Look-ahead protection | Built-in | None | None |
Apache-2.0
