Skip to content

Add JSON output formatting#293

Open
NathanDuboisset wants to merge 4 commits intoseddonym:mainfrom
NathanDuboisset:feat/json-formatted-output
Open

Add JSON output formatting#293
NathanDuboisset wants to merge 4 commits intoseddonym:mainfrom
NathanDuboisset:feat/json-formatted-output

Conversation

@NathanDuboisset
Copy link

[ ] Polish a JSON output
[ ] Add tests for the change.
[ ] Add any appropriate documentation.
[ ] Run checks and CI

@NathanDuboisset
Copy link
Author

i wonder if a tag --json is better or a tag in the form --format json to open for other types of formatting later

@NathanDuboisset NathanDuboisset changed the title first command workflow Add JSON output formatting Oct 9, 2025
@seddonym
Copy link
Owner

seddonym commented Oct 9, 2025

Another approach to making this a CLI argument would be to provide a Python API for running Import Linter (something I've been meaning to do anyway) with a structured return value. Then a plugin could wrap that (without needing to parse JSON). Any thoughts on that approach?

@NathanDuboisset
Copy link
Author

That would be pretty nice, but it doesnt solve all the problems ; vs code extensions for example have to be done using JS and node JS ; i think it would be nice in a second time, i would be happy to help you do that

@seddonym
Copy link
Owner

seddonym commented Oct 9, 2025

But you're extension is going to have to run some Python somehow, right? So perhaps a good approach would be to for you to write the wrapper Python CLI in a different package, that would then call Import Linter's Python API and transform the data how you need it to be. That would give you more control over tailoring it specifically to the extension, while providing a more generally useful tool in Import Linter.

I've been meaning to add a Python API anyway so it might be less work for you, too!

@NathanDuboisset
Copy link
Author

Usually you run command lines and get the output i think, but yeah that could do it ; it would be especially helpful, since the extension might not want to rerun everything everytime ; have you ever worked with pydantic ? it could be really helpful : everything would work as normal but JSONification of objects(like report etc..) would be much simpler ; i think ill start with just the json output to familiarize with the codebase, i'll look into the API part after it

idea of what the json could look like ; using pydantic would be way better
@NathanDuboisset
Copy link
Author

i think this works, I'm having trouble with how to make the tests, i don't know what level of testing you want(it should be a formatting of the same results in normal mode)

@NathanDuboisset NathanDuboisset marked this pull request as ready for review October 14, 2025 13:13
Copy link
Owner

@seddonym seddonym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at this - I think this is a good start.

If we are to output it as json, we're setting the expectation that people will build tooling on top of it, so we need to be quite careful about the structure as changing it later could cause breaking changes. I think concentrating on that schema is the most important thing to figure out first.

Let's consider a sample output:

{
    "contracts": [
        {
            "kept": "KEPT",
            "name": "Blue",
            "warnings": " (1 warning)"
        },
        {
            "kept": "KEPT",
            "name": "Green",
            "warnings": ""
        }    ],
    "correct_run": true,
    "header": [
        {
            "dependency_count": 286,
            "file_count": 223
        }
    ]
}

For me we could improve upon the structure, e.g.

{
    "contracts": {
        "blue": { # Id for the contract.
            "kept": true,
            "name": "Blue",
            "warnings": [
                  {
                       "code": "IL010",
                       "message": "No matches for ignored import foo -> bar.",
                       "data": {
                         "expression": "foo -> bar"
                       }
                   }
             ]
        },
        "green": {
            "kept": true,
            "name": "Green",
            "warnings": []
        }
    },
    "completed": true,  # Whether or not the contract could run.
    "kept": true,  # Whether all contracts passed.
    "dependency_count": 286,
    "file_count": 223,
    "version": 1 # Version of schema in case tools need to support multiple versions.
}

There's a second pass. I don't think it's quite right yet, e.g. we should probably have richer information about errors and failures. Feel free to iterate.

I think we should concentrate on figuring on the schema first before doing more work on implementation.

warnings_text = _build_warning_text(warnings_count=len(contract_check.warnings))
final_dict: dict[str, Any] = {
"name": contract.name,
"kept": "KEPT" if contract_check.kept else "BROKEN",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to make this a bool?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree, way better for integration, i just did a skeleton and choose the same notations as you

final_dict: dict[str, Any] = {
"name": contract.name,
"kept": "KEPT" if contract_check.kept else "BROKEN",
"warnings": warnings_text,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this should be more structured, e.g. a set of warnings with structured data (along, perhaps, with some free text).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed too


def build_json_contract_result(
contract: Contract, contract_check: ContractCheck, duration: Optional[int]
) -> dict:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to define more precisely the result schema, maybe with a TypedDict?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that would be nice, i usually use pydantic for that but typedDict will do ;)

@NathanDuboisset
Copy link
Author

i think your iteration is quite right, no data is lost or formatted too much ; maybe add the time of the contract run ? i dont know exactly where to gather the version number

@seddonym
Copy link
Owner

maybe add the time of the contract run

I don't think this is necessary, unless you have a use case in mind?

i dont know exactly where to gather the version number

This can just be hardcoded as 1, maybe as a constant. If we change the schema we'll bump the version number.

@seddonym seddonym mentioned this pull request Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants