Add JSONPath parser support #44

Merged: ets merged 3 commits into ets:main from TyShkan:feature_extended_jsonpath on Oct 27, 2025

Contributor

@TyShkan TyShkan commented Mar 25, 2023

Closes #42

I struggled to run the unit tests because of dependency conflicts, so I ran this test config with Meltano instead:

    config:
      tables:
        - "name": "json_sample_multiple_records"
          "path": "file://../../data/test/"
          "pattern": "^sample\\.json$"
          "start_date": "2017-05-01T00:00:00Z"
          "key_properties": [ "id" ]
          "format": "json"
        - "name": "json_sample_one_record"
          "path": "file://../../data/test/"
          "pattern": "^one-row-sample\\.json$"
          "start_date": "2017-05-01T00:00:00Z"
          "key_properties": [ "id" ]
          "format": "json"
        - "name": "jsonl_sample_multiple_records"
          "path": "file://../../data/test/"
          "pattern": "sample-jsonl\\.json"
          "start_date": "2017-05-01T00:00:00Z"
          "key_properties": [ "id" ]
          "format": "jsonl"
          "universal_newlines": True
        - "name": "jsonl_sample_one_record"
          "path": "file://../../data/test/"
          "pattern": "one-row-sample-jsonl\\.json"
          "start_date": "2017-05-01T00:00:00Z"
          "key_properties": [ "id" ]
          "format": "jsonl"
        - "name": "jsonl_sample_multiple_records_detect"
          "path": "file://../../data/test/"
          "pattern": "^sample\\.jsonl$"
          "start_date": "2017-05-01T00:00:00Z"
          "key_properties": [ "id" ]
          "format": "detect"
        - "name": "jsonl_sample_one_record_detect"
          "path": "file://../../data/test/"
          "pattern": "^one-row-sample\\.jsonl$"
          "start_date": "2017-05-01T00:00:00Z"
          "key_properties": [ "id" ]
          "format": "detect"
        - "name": "json_sample_nested_path"
          "path": "file://../../data/test/"
          "pattern": "^sample_path\\.json$"
          "start_date": "2017-05-01T00:00:00Z"
          "key_properties": [ "id" ]
          "json_path": "data"
          "format": "json"
        - "name": "json_sample_deep_nested_path"
          "path": "file://../../data/test/"
          "pattern": "sample_deep_path\\.json"
          "start_date": "2017-05-01T00:00:00Z"
          "key_properties": [ "id" ]
          "json_path": "data.response[*]"
          "format": "json"

The test files I used are attached: test.zip
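To make the last table spec concrete, here is a minimal sketch of what a path such as `data.response[*]` selects. It is an illustration only: `select_rows` is a hypothetical helper written with the standard library, it handles just dotted keys and a `[*]` wildcard, and it is not how jsonpath-ng or this tap evaluates paths. The shape of the sample document is also an assumption, since the contents of the test files are not shown here.

```python
import json
import re

# Hypothetical helper for illustration; supports only "key" and "key[*]" segments.
_SEGMENT = re.compile(r"^(?P<key>[^.\[\]]+)(?P<wild>\[\*\])?$")


def select_rows(document, json_path):
    """Return the nodes matched by a simple dotted path such as 'data.response[*]'."""
    nodes = [document]
    for segment in json_path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported path segment: {segment!r}")
        key = match.group("key")
        # Descend into the named key of every dict selected so far.
        nodes = [node[key] for node in nodes if isinstance(node, dict) and key in node]
        if match.group("wild"):
            # "[*]" fans each selected list out into its individual elements.
            nodes = [item for node in nodes if isinstance(node, list) for item in node]
    return nodes


# Assumed document shape: records nested two levels deep.
sample = json.loads('{"data": {"response": [{"id": 1}, {"id": 2}]}}')
rows = select_rows(sample, "data.response[*]")
print(rows)
```

With the wildcard, each element of the nested list becomes its own row; without it, the path matches the list as a single node.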

@menzenski menzenski self-requested a review March 28, 2023 14:48
Collaborator

Can these test cases be added as net-new test cases, rather than updating existing tests?

Contributor Author

I decided to clean the test file up a bit to remove some confusion, and I also added one new test for the JSONPath case at the end of the file.

All of the tests call a single function, json_handler.get_row_iterator, which expects only one option (json_path) from the configuration. The table specs dict contained Excel config, which is confusing in a JSON handler test, so I updated it and regrouped the specs with their related tests to make them easier to follow. I still missed the "badnewlines" name of the first table spec, though (:

If you consider changing old tests bad practice, I can roll back my changes and add the new tests on top of the old ones.

Contributor Author

@TyShkan TyShkan Mar 30, 2023

I've managed to build an environment with working dependencies and run the tests. Some of the old tests fail, but those failures are related to Excel files and are not caused by the changes in this pull request.

Contributor Author

A clean test branch, converted to Poetry and with a GitHub workflow that runs the tests on different Python versions, is committed here: https://github.com/TyShkan/tap-spreadsheets-anywhere/commits/poetry

@TyShkan TyShkan requested a review from menzenski March 30, 2023 14:31

jcbmllgn commented May 1, 2023

Hello 👋 any update on when this might make it into main? This would be very useful for me!

@ets ets merged commit a4c8f9c into ets:main Oct 27, 2025
Comment thread on setup.py:

    'xlrd',
    'paramiko',
    'azure-storage-blob>=12.14.0',
    'jsonpath-ng>=1.5.3'

Contributor

This broke installation of the package.

#95 should fix it.

Development

Successfully merging this pull request may close these issues.

Extend "json_path" config option with JSONPath parser for deep nested data

5 participants