Quick Start Guide

Get up and running with PipeFrame in 5 minutes!

Your First Pipeline

from pipeframe import DataFrame, filter, select, arrange

# Create a DataFrame
df = DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 60000, 70000, 55000],
    'department': ['Sales', 'IT', 'Sales', 'IT']
})

# Build a pipeline
result = (df
    >> filter('age > 27')
    >> select('name', 'salary', 'department')
    >> arrange('-salary')
)

print(result)

Output:

      name  salary department
0  Charlie   70000      Sales
1      Bob   60000         IT
2    David   55000         IT
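
To check this output by hand, the same three steps can be traced in plain Python. The sketch below works on a list of dicts using only built-ins; it illustrates the logic of each step and is not how PipeFrame is implemented.

```python
# Plain-Python trace of the pipeline above (illustration only, no PipeFrame).
rows = [
    {'name': 'Alice', 'age': 25, 'salary': 50000, 'department': 'Sales'},
    {'name': 'Bob', 'age': 30, 'salary': 60000, 'department': 'IT'},
    {'name': 'Charlie', 'age': 35, 'salary': 70000, 'department': 'Sales'},
    {'name': 'David', 'age': 28, 'salary': 55000, 'department': 'IT'},
]

# filter('age > 27'): keep rows whose age exceeds 27
kept = [r for r in rows if r['age'] > 27]

# select('name', 'salary', 'department'): keep only these columns
columns = ('name', 'salary', 'department')
selected = [{c: r[c] for c in columns} for r in kept]

# arrange('-salary'): sort by salary, highest first
traced = sorted(selected, key=lambda r: r['salary'], reverse=True)

for r in traced:
    print(r['name'], r['salary'], r['department'])
```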

Core Concepts

1. The Pipe Operator (`>>`)

Read >> as "then" or "pipe to":

df >> filter('x > 5')  # Take df, THEN filter where x > 5
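
The `>>` syntax relies on Python's operator overloading. The sketch below shows one way such an operator can be built, using a hypothetical `Step` class and hypothetical `keep_if` and `sort_desc` helpers on plain lists. PipeFrame's actual internals are not described on this page and may differ.

```python
class Step:
    """Hypothetical wrapper that lets a function sit on the right of >>."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Python calls this for `data >> step` when type(data) does not
        # implement >> for a Step operand (a plain list does not).
        return self.func(data)


def keep_if(predicate):
    # Hypothetical verb: builds a Step that keeps matching items.
    return Step(lambda items: [x for x in items if predicate(x)])


def sort_desc():
    # Hypothetical verb: builds a Step that sorts from high to low.
    return Step(lambda items: sorted(items, reverse=True))


# Take the list, THEN keep values above 5, THEN sort descending.
piped = [3, 9, 6, 1, 8] >> keep_if(lambda x: x > 5) >> sort_desc()
print(piped)  # [9, 8, 6]
```

Because `>>` is left-associative, each step receives the result of everything to its left, which is what makes the "then" reading work.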

2. Import Everything

from pipeframe import *

This imports all data manipulation functions.
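
One thing to keep in mind: the first example imports a function named `filter` from pipeframe, and Python already has a built-in `filter`. A star import that brings in that name shadows the built-in inside the importing module. The sketch below simulates this with a stand-in function (it does not import PipeFrame) and shows that the original stays reachable through the standard `builtins` module.

```python
import builtins


def filter(expression):
    # Stand-in for a library function named `filter`; defining (or
    # star-importing) it shadows the built-in in this module.
    return f"filter step: {expression}"


shadowed = filter('age > 30')

# The built-in is still available under its qualified name.
evens = list(builtins.filter(lambda n: n % 2 == 0, [1, 2, 3, 4]))

print(shadowed)  # filter step: age > 30
print(evens)     # [2, 4]
```

If shadowing is a concern, the explicit import form used in the first example at least makes the imported names visible at the top of the file.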

3. String Expressions

Most operations use simple string expressions:

df >> filter('age > 30 & salary > 50000')
df >> define(bonus='salary * 0.1')
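
To build intuition for string expressions, the sketch below evaluates an expression once per row, with the row's column names acting as variables. It is a simplified model built on Python's `eval` and a hypothetical `eval_rows` helper. This page does not describe how PipeFrame parses expressions, so treat it as an analogy only.

```python
def eval_rows(rows, expression):
    """Hypothetical helper: evaluate `expression` once per row dict."""
    code = compile(expression, '<expression>', 'eval')
    # Empty __builtins__ keeps the expression limited to column names
    # (a convenience for this sketch, not a security boundary).
    return [eval(code, {'__builtins__': {}}, dict(row)) for row in rows]


rows = [
    {'name': 'Alice', 'salary': 50000},
    {'name': 'Bob', 'salary': 60000},
]

bonuses = eval_rows(rows, 'salary * 0.1')
flags = eval_rows(rows, 'salary > 55000')

print(bonuses)  # [5000.0, 6000.0]
print(flags)    # [False, True]
```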

Common Operations

Filtering Rows

# Single condition
df >> filter('age > 30')

# Multiple conditions
df >> filter('age > 30 & department == "Sales"')

# String operations
df >> filter('name.str.startswith("A")')
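
When trying a combined condition in plain Python before putting it in a string, remember that Python's `&` binds more tightly than comparison operators. Whether PipeFrame's expression parser follows the same rule is not stated on this page, so the sketch below only demonstrates the plain-Python behaviour. Parentheses around each comparison are unambiguous either way.

```python
age, salary = 35, 60000

# Without parentheses Python reads this as: age > (30 & salary) > 50000
unparenthesized = age > 30 & salary > 50000

# With parentheses each comparison is evaluated first, then combined.
parenthesized = (age > 30) & (salary > 50000)

print(unparenthesized)  # False
print(parenthesized)    # True
```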

Creating Columns

df >> define(
    bonus='salary * 0.1',
    total='salary + bonus',
    senior='age > 35'
)
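
Note that `total` refers to `bonus`, which is created in the same call, so the example suggests that later columns can build on earlier ones. In plain Python the equivalent logic for a single row looks like the sketch below, which applies the assignments in the order they are written (illustration only, not PipeFrame code).

```python
row = {'name': 'Charlie', 'age': 35, 'salary': 70000}

# Apply the three definitions in the order they were written.
row['bonus'] = row['salary'] * 0.1            # bonus='salary * 0.1'
row['total'] = row['salary'] + row['bonus']   # total='salary + bonus'
row['senior'] = row['age'] > 35               # senior='age > 35'

# Charlie is exactly 35, so the strict comparison gives senior = False.
print(row['bonus'], row['total'], row['senior'])
```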

Selecting Columns

# Select specific columns
df >> select('name', 'salary')

# Select columns by pattern
df >> select(starts_with('sal'))

Sorting

# Ascending
df >> arrange('age')

# Descending (use minus sign)
df >> arrange('-salary')

# Multiple columns
df >> arrange('department', '-salary')
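
A multi-column sort orders by the first key and breaks ties with the next. The plain-Python sketch below reproduces the ordering described by `arrange('department', '-salary')` on the sample data, using a tuple key and negating the numeric column to get descending order (illustration only).

```python
people = [
    ('Alice', 'Sales', 50000),
    ('Bob', 'IT', 60000),
    ('Charlie', 'Sales', 70000),
    ('David', 'IT', 55000),
]

# Department ascending, then salary descending within each department.
ordered = sorted(people, key=lambda p: (p[1], -p[2]))

for name, department, salary in ordered:
    print(department, name, salary)
```

Negating the key only works for numeric columns; for text columns a two-pass stable sort (secondary key first, primary key second) achieves the same effect.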

Grouping and Summarizing

result = (df
    >> group_by('department')
    >> summarize(
        avg_salary='mean(salary)',
        count='count()',
        total='sum(salary)'
    )
)
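
For reference, here is what that grouping computes on the four-row sample data, worked out in plain Python with the standard library. It is a sketch of the arithmetic only; the exact shape and column order of PipeFrame's result are not shown on this page.

```python
from collections import defaultdict

salaries = [('Sales', 50000), ('IT', 60000), ('Sales', 70000), ('IT', 55000)]

# group_by('department'): collect salaries per department
groups = defaultdict(list)
for department, salary in salaries:
    groups[department].append(salary)

# summarize(...): one set of aggregates per group
summary = {
    department: {
        'avg_salary': sum(values) / len(values),
        'count': len(values),
        'total': sum(values),
    }
    for department, values in groups.items()
}

print(summary['IT'])     # {'avg_salary': 57500.0, 'count': 2, 'total': 115000}
print(summary['Sales'])  # {'avg_salary': 60000.0, 'count': 2, 'total': 120000}
```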

Complete Example

Here's a realistic data analysis pipeline:

from pipeframe import *

# Sales data analysis
# (sales_data is an existing DataFrame with product, date, revenue, and cost columns)
sales_analysis = (sales_data
    # Data cleaning
    >> filter('revenue > 0 & date >= "2024-01-01"')
    >> define(
        quarter='pd.to_datetime(date).dt.quarter',
        profit='revenue - cost',
        margin='(profit / revenue) * 100'
    )
    
    # Grouping and aggregation
    >> group_by('product', 'quarter')
    >> summarize(
        total_revenue='sum(revenue)',
        total_profit='sum(profit)',
        avg_margin='mean(margin)',
        num_sales='count()'
    )
    
    # Final touches
    >> arrange('-total_revenue')
    >> select('product', 'quarter', 'total_revenue', 'total_profit')
)

print(sales_analysis)

Tips for Beginners

Start Simple: Begin with single operations, then chain them
Use peek(): Debug your pipeline with >> peek(n=3)
Read Aloud: Say "then" when you see >>
Test Expressions: Try expressions in Python first if unsure
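
The idea behind a peek-style step is a pass-through: it shows a few rows and hands the data on unchanged, so it can be dropped between any two steps without affecting the result. A minimal plain-Python analogue, using a hypothetical `tap` function on a list (this is not PipeFrame's `peek`):

```python
def tap(rows, n=3, label='peek'):
    """Hypothetical pass-through: print the first n rows, return all rows."""
    print(f"[{label}] {len(rows)} rows, first {n}: {rows[:n]}")
    return rows


ages = [25, 30, 35, 28]

# Inspect the intermediate result, then continue with the next step.
over_27 = tap([a for a in ages if a > 27], n=2, label='after filter')
oldest_first = sorted(over_27, reverse=True)

print(oldest_first)  # [35, 30, 28]
```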

Next Steps

Examples - See more real-world examples
API Reference - Learn all available functions
FAQ - Common questions answered

Happy piping! 🔄
