[PD1-597] Document pandas DataFrame integration and EAV functions #20
Merged
04ecd12
chore: document use with pandas DataFrame
joshuanapoli 300816c
chore: document the EAV functions
joshuanapoli 7f7c399
fix: print complete row
joshuanapoli 414c2f5
fix: align filter and column IDs
joshuanapoli b134adf
fix: clarify list length limit
joshuanapoli 5294192
fix: clarify description of row fields
joshuanapoli
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -33,7 +33,7 @@ import cvec | |
| from datetime import datetime | ||
| ``` | ||
|
|
||
| Construct the CVec client. The host, tenant, and api_key can be given through parameters to the constructor or from the environment variables CVEC_HOST, and CVEC_API_KEY: | ||
| Construct the CVec client. The host and api_key can be given through parameters to the constructor or from the environment variables CVEC_HOST and CVEC_API_KEY: | ||
|
|
||
| ``` | ||
| cvec = cvec.CVec() | ||
|
|
@@ -88,7 +88,7 @@ mygroup/myedge/compressor01/motor/power_kw | |
|
|
||
| ### Metric Data | ||
|
|
||
| The main content for a metric is a set of points where the metric value changed. These are returned as a Pandas Dataframe with columns for name, time, value_double, value_string. | ||
| The main content for a metric is a set of points where the metric value changed. These are returned with columns for name, time, value_double, value_string. | ||
|
|
||
| To get all of the value changes for all metrics at 10am on 2025-05-14, run: | ||
|
|
||
|
|
@@ -114,6 +114,17 @@ Example output: | |
|
|
||
| [46257 rows x 4 columns] | ||
| ``` | ||
| #### Pandas Data Frames | ||
|
|
||
| Use the `get_metric_arrow` function to efficiently load data into a pandas DataFrame like this: | ||
|
|
||
| ```python | ||
| import pandas as pd | ||
| import pyarrow as pa | ||
|
|
||
| reader = pa.ipc.open_file(cvec.get_metric_arrow(names=["tag1", "tag2"])) | ||
| df = reader.read_pandas() | ||
| ``` | ||
|
|
||
| ### Adding Metric Data | ||
|
|
||
|
|
@@ -198,9 +209,9 @@ The script automatically: | |
|
|
||
| The SDK provides an API client class named `CVec` with the following functions. | ||
|
|
||
| ## `__init__(?host, ?tenant, ?api_key, ?default_start_at, ?default_end_at)` | ||
| ## `__init__(?host, ?api_key, ?default_start_at, ?default_end_at)` | ||
|
|
||
| Setup the SDK with the given host and API Key. The host and API key are loaded from environment variables CVEC_HOST, CVEC_API_KEY, if they are not given as arguments to the constructor. The `default_start_at` and `default_end_at` can provide a default query time interval for API methods. | ||
| Setup the SDK with the given host and API Key. The host and API key are loaded from environment variables CVEC_HOST and CVEC_API_KEY if they are not given as arguments to the constructor. The tenant ID is automatically fetched from the host's `/config` endpoint. The `default_start_at` and `default_end_at` can provide a default query time interval for API methods. | ||
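The fallback from constructor arguments to environment variables described above can be sketched as follows. This is only an illustration of the documented behaviour, not the SDK's implementation: `ClientConfig` and `resolve_config` are hypothetical names, and only the variable names `CVEC_HOST` and `CVEC_API_KEY` come from the README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ClientConfig:
    """Hypothetical container for the resolved connection settings."""
    host: str
    api_key: str


def resolve_config(
    host: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> ClientConfig:
    """Prefer explicit arguments, then fall back to CVEC_HOST / CVEC_API_KEY."""
    env = os.environ if env is None else env
    host = host or env.get("CVEC_HOST")
    api_key = api_key or env.get("CVEC_API_KEY")
    if not host:
        raise ValueError("host not given and CVEC_HOST is not set")
    if not api_key:
        raise ValueError("api_key not given and CVEC_API_KEY is not set")
    return ClientConfig(host=host, api_key=api_key)
```

Passing `env` explicitly keeps the sketch easy to test without touching the real process environment.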
|
|
||
| ## `get_spans(name, ?start_at, ?end_at, ?limit)` | ||
|
|
||
|
|
@@ -261,3 +272,114 @@ Fetch actual data values from modeling metrics within a time range in Apache Arr | |
| - `end_at`: Optional end time for the query (uses class default if not specified) | ||
|
|
||
| Returns Arrow IPC format data that can be read using `pyarrow.ipc.open_file()`. | ||
|
|
||
| ## `get_eav_tables()` | ||
|
|
||
| Get all EAV (Entity-Attribute-Value) tables for the tenant. EAV tables store semi-structured data where each row represents an entity with flexible attributes. | ||
|
|
||
| Returns a list of `EAVTable` objects, each containing: | ||
| - `id`: The table's UUID | ||
| - `tenant_id`: The tenant ID | ||
| - `name`: Human-readable table name | ||
| - `created_at`: When the table was created | ||
| - `updated_at`: When the table was last updated | ||
|
||
|
|
||
| Example: | ||
| ```python | ||
| tables = cvec.get_eav_tables() | ||
| for table in tables: | ||
| print(f"{table.name} (id: {table.id})") | ||
| ``` | ||
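For readers new to the Entity-Attribute-Value layout, the sketch below shows what "pivoting" EAV data into rows means in general. It is a generic illustration with made-up data and a hypothetical `pivot_eav` helper; it does not describe how the CVec backend stores or pivots its tables.

```python
from collections import defaultdict


def pivot_eav(triples):
    """Turn (entity_id, attribute, value) triples into one dict per entity."""
    rows = defaultdict(dict)
    for entity_id, attribute, value in triples:
        rows[entity_id][attribute] = value
    # Each pivoted row carries its entity ID under "id" plus one key per attribute.
    return [{"id": entity_id, **attrs} for entity_id, attrs in rows.items()]


# Made-up example data: entities may have different sets of attributes.
triples = [
    ("row-1", "Weight", 150.0),
    ("row-1", "Status", "ACTIVE"),
    ("row-2", "Weight", 90.0),
]
pivoted = pivot_eav(triples)
```

Entities that lack an attribute simply have no key for it in the pivoted row, which is why the layout suits semi-structured data.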
|
|
||
| ## `get_eav_columns(table_id)` | ||
|
|
||
| Get all columns for a specific EAV table. | ||
|
|
||
| - `table_id`: The UUID of the EAV table | ||
|
|
||
| Returns a list of `EAVColumn` objects, each containing: | ||
| - `eav_table_id`: The parent table's UUID | ||
| - `eav_column_id`: The column's ID (used for queries) | ||
| - `name`: Human-readable column name | ||
| - `type`: Data type (`"number"`, `"string"`, or `"boolean"`) | ||
| - `created_at`: When the column was created | ||
|
|
||
| Example: | ||
| ```python | ||
| columns = cvec.get_eav_columns("00000000-0000-0000-0000-000000000000") | ||
| for column in columns: | ||
| print(f" {column.name} ({column.type}, id: {column.eav_column_id})") | ||
| ``` | ||
|
|
||
| ## `select_from_eav(table_name, ?column_names, ?filters)` | ||
|
|
||
| Query pivoted data from EAV tables using human-readable names. This is the recommended method for most use cases as it allows you to work with table and column names instead of UUIDs. | ||
|
|
||
| - `table_name`: Name of the EAV table to query | ||
| - `column_names`: Optional list of column names to include. If `None`, all columns are returned. | ||
| - `filters`: Optional list of `EAVFilter` objects to filter results | ||
|
|
||
| Each `EAVFilter` must use `column_name` and can specify: | ||
| - `column_name`: The column name to filter on (required) | ||
| - `numeric_min`: Minimum numeric value (inclusive) | ||
| - `numeric_max`: Maximum numeric value (exclusive) | ||
| - `string_value`: Exact string value to match | ||
| - `boolean_value`: Boolean value to match | ||
|
|
||
| Returns a list of dictionaries (maximum 1000 rows), each representing a row with an `id` field and fields for each requested column. | ||
|
|
||
| Example: | ||
| ```python | ||
| from cvec import CVec, EAVFilter | ||
|
|
||
| # Query with filters | ||
| filters = [ | ||
| EAVFilter(column_name="Weight", numeric_min=100, numeric_max=200), | ||
| EAVFilter(column_name="Status", string_value="ACTIVE"), | ||
| ] | ||
|
|
||
| rows = cvec.select_from_eav( | ||
| table_name="Production Data", | ||
| column_names=["Date", "Weight", "Status"], | ||
| filters=filters, | ||
| ) | ||
|
|
||
| for row in rows: | ||
| print(row) | ||
| ``` | ||
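The filter bounds documented above (inclusive `numeric_min`, exclusive `numeric_max`, exact string and boolean matches) can be illustrated locally. This sketch uses a hypothetical `LocalFilter` stand-in rather than the SDK's `EAVFilter`, and it assumes that multiple filters are combined with AND, which the README does not state.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class LocalFilter:
    """Hypothetical stand-in that mirrors the documented EAVFilter fields."""
    column_name: str
    numeric_min: Optional[float] = None
    numeric_max: Optional[float] = None
    string_value: Optional[str] = None
    boolean_value: Optional[bool] = None


def matches(row: Dict[str, Any], flt: LocalFilter) -> bool:
    value = row.get(flt.column_name)
    if flt.numeric_min is not None or flt.numeric_max is not None:
        # Numeric bounds only apply to real numbers (bool is excluded on purpose).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if flt.numeric_min is not None and value < flt.numeric_min:
            return False  # below the inclusive lower bound
        if flt.numeric_max is not None and value >= flt.numeric_max:
            return False  # at or above the exclusive upper bound
    if flt.string_value is not None and value != flt.string_value:
        return False
    if flt.boolean_value is not None and value is not flt.boolean_value:
        return False
    return True


def apply_filters(rows: List[Dict[str, Any]], filters: List[LocalFilter]):
    # Assumption: a row must satisfy every filter (logical AND).
    return [row for row in rows if all(matches(row, f) for f in filters)]
```

With `numeric_min=100, numeric_max=200`, a weight of exactly 100 is kept and a weight of exactly 200 is dropped.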
|
|
||
| ## `select_from_eav_id(table_id, ?column_ids, ?filters)` | ||
|
|
||
| Query pivoted data from EAV tables using table and column IDs directly. This is a lower-level method for cases where you already have the UUIDs and want to avoid name lookups. | ||
|
|
||
| - `table_id`: UUID of the EAV table to query | ||
| - `column_ids`: Optional list of column IDs to include. If `None`, all columns are returned. | ||
| - `filters`: Optional list of `EAVFilter` objects to filter results | ||
|
|
||
| Each `EAVFilter` must use `column_id` and can specify: | ||
| - `column_id`: The column ID to filter on (required) | ||
| - `numeric_min`: Minimum numeric value (inclusive) | ||
| - `numeric_max`: Maximum numeric value (exclusive) | ||
| - `string_value`: Exact string value to match | ||
| - `boolean_value`: Boolean value to match | ||
|
|
||
| Returns a list of dictionaries (up to 1000), each representing a row. Each row has a field for each column plus an `id` (the "row ID"). | ||
|
|
||
| Example: | ||
| ```python | ||
| from cvec import EAVFilter | ||
|
|
||
| filters = [ | ||
| EAVFilter(column_id="abcd", numeric_min=100, numeric_max=200), | ||
| EAVFilter(column_id="efgh", string_value="ACTIVE"), | ||
| ] | ||
|
|
||
| rows = cvec.select_from_eav_id( | ||
| table_id="00000000-0000-0000-0000-000000000000", | ||
| column_ids=["abcd", "efgh", "ijkl"], | ||
|
||
| filters=filters, | ||
| ) | ||
|
|
||
| for row in rows: | ||
| print(f"ID: {row['id']}, Values: {row}") | ||
| ``` | ||
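If you start from names but want to call the ID-based method, the IDs can be looked up from the results of `get_eav_tables()` and `get_eav_columns()`. The helpers below are hypothetical and run against stub objects that carry only the documented fields (`name`, `id`, `eav_column_id`); whether `select_from_eav` resolves names this way internally is an assumption.

```python
from types import SimpleNamespace


def resolve_table_id(tables, table_name):
    """Return the `id` of the first table whose `name` matches."""
    for table in tables:
        if table.name == table_name:
            return table.id
    raise KeyError(f"no EAV table named {table_name!r}")


def resolve_column_ids(columns, column_names):
    """Map column names to `eav_column_id` values, preserving the given order."""
    by_name = {column.name: column.eav_column_id for column in columns}
    missing = [name for name in column_names if name not in by_name]
    if missing:
        raise KeyError(f"unknown EAV columns: {missing}")
    return [by_name[name] for name in column_names]


# Stub data standing in for get_eav_tables() / get_eav_columns() results.
tables = [
    SimpleNamespace(id="00000000-0000-0000-0000-000000000000", name="Production Data"),
]
columns = [
    SimpleNamespace(name="Weight", eav_column_id="abcd"),
    SimpleNamespace(name="Status", eav_column_id="efgh"),
]

table_id = resolve_table_id(tables, "Production Data")
column_ids = resolve_column_ids(columns, ["Status", "Weight"])
```

The resolved `table_id` and `column_ids` could then be passed to `select_from_eav_id`.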
While this change to be more generic about the return type is a good step, there are still inconsistencies in the documentation for `get_metric_data`. The example output immediately following this line, and the function description under the `CVec Class` section (line 234), both strongly imply a Pandas DataFrame is returned. However, the function actually returns a `List[MetricDataPoint]`. To avoid confusion, I recommend updating the other parts of the documentation to be consistent with the function's actual return type.
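One way the documentation could bridge the two views is to show how a list of points maps onto the four documented columns. The sketch below is only a suggestion: `PointSketch` is a hypothetical stand-in for `MetricDataPoint`, and its attribute names are assumed from the column names in the README (name, time, value_double, value_string).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class PointSketch:
    """Hypothetical stand-in for MetricDataPoint; field names are assumed."""
    name: str
    time: datetime
    value_double: Optional[float] = None
    value_string: Optional[str] = None


def points_to_columns(points: List[PointSketch]) -> Dict[str, list]:
    """Convert a list of points into a dict of equal-length column lists."""
    columns: Dict[str, list] = {
        "name": [], "time": [], "value_double": [], "value_string": [],
    }
    for point in points:
        columns["name"].append(point.name)
        columns["time"].append(point.time)
        columns["value_double"].append(point.value_double)
        columns["value_string"].append(point.value_string)
    return columns
```

A dict of lists like this can be passed to `pandas.DataFrame(...)` by readers who want a DataFrame, while the function's documented return type stays a plain list.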