Improved listing of Directory #981

Draft
LanderOtto wants to merge 3 commits into master from fix/default-dir-listing
LanderOtto (Collaborator) commented Mar 4, 2026

This commit optimizes and fixes the handling of the listing field for CWL Directory objects:

  • Improved InitialWorkDir performance by avoiding an unnecessary full directory visit. The job parameter is no longer passed in the build_token method call, because filesystem operations are already handled by the _prepare_work_dir function.
  • The _get_listing method returned resolved paths when dirpath was a symbolic link. As a result, the CWLFileToken was created with the symbolic link path in its main path field, while the listed files carried the resolved paths. This mismatch was error-prone in the remap_token_value function. Now the _get_listing method returns the symbolic link paths.
  • The update_file_token function populated the listing field when no_listing was set and the listing field was not present. In this case, nothing needs to be done.
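
To illustrate the symbolic link case from the second bullet, here is a minimal, self-contained sketch. It does not use StreamFlow code: `list_resolved`, `list_unresolved`, and `demo` are hypothetical helpers that only mimic, on a local filesystem, a listing that follows the link versus one that keeps the path the caller passed in.

```python
import os
import tempfile
from pathlib import Path


def list_resolved(dirpath: str) -> list[str]:
    """Hypothetical 'old' behaviour: entries are reported under the resolved target."""
    real = Path(dirpath).resolve()
    return sorted(str(real / name) for name in os.listdir(real))


def list_unresolved(dirpath: str) -> list[str]:
    """Hypothetical 'new' behaviour: entries keep the prefix the caller passed in."""
    return sorted(os.path.join(dirpath, name) for name in os.listdir(dirpath))


def demo() -> tuple[list[str], list[str], str]:
    """Build target/a.txt plus a symlink 'link' -> 'target', then list via the link."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)  # avoid surprises if the temp dir is itself a link
        target = Path(root, "target")
        target.mkdir()
        (target / "a.txt").write_text("a")
        link = os.path.join(root, "link")
        os.symlink(target, link, target_is_directory=True)
        return list_resolved(link), list_unresolved(link), link


if __name__ == "__main__":
    resolved, unresolved, link = demo()
    print("directory path:", link)
    print("resolved listing:", resolved)      # paths under .../target
    print("unresolved listing:", unresolved)  # paths under .../link
```

With the resolved variant, the directory path and its children no longer share a common prefix, which is the kind of inconsistency that a prefix-based remapping step can trip over.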

LanderOtto changed the title from "Fix default loadListing value" to "Avoid loadListing when not necessary" Mar 4, 2026
codecov bot commented Mar 4, 2026

❌ 4 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 1870 | 4 | 1866 | 8 |
View the top 3 failed test(s) by shortest run time
tests/test_remotepath.py::test_download[kubernetes]
Stack Traces | 0.582s run time
context = <streamflow.core.context.StreamFlowContext object at 0x7f5f76588550>
connector = <streamflow.deployment.connector.kubernetes.KubernetesConnector object at 0x7f5f7533fa90>
location = <streamflow.core.deployment.ExecutionLocation object at 0x7f5f74db9d90>

    @pytest.mark.asyncio
    async def test_download(
        context: StreamFlowContext, connector: Connector, location: ExecutionLocation
    ) -> None:
        """Test remote file download."""
        urls = [
            "https://raw.githubusercontent..../streamflow/master/LICENSE",
            "https://github..../refs/tags/0.1.6.zip",
        ]
        parent_dir = StreamFlowPath(
            tempfile.gettempdir() if location.local else "/tmp",
            utils.random_name(),
            context=context,
            location=location,
        )
        paths = [
            parent_dir / "LICENSE",
            parent_dir / "streamflow-0.1.6.zip",
        ]
        for i, url in enumerate(urls):
            path = None
            try:
>               path = await remotepath.download(context, location, url, str(parent_dir))

tests/test_remotepath.py:90: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
streamflow/data/remotepath.py:908: in download
    async with session.head(url, allow_redirects=True) as response:
.tox/py3.10-unit/lib/python3.10........./site-packages/aiohttp/client.py:1510: in __aenter__
    self._resp: _RetType = await self._coro
.tox/py3.10-unit/lib/python3.10........./site-packages/aiohttp/client.py:779: in _request
    resp = await handler(req)
.tox/py3.10-unit/lib/python3.10........./site-packages/aiohttp/client.py:757: in _connect_and_send_request
    await resp.start(conn)
.tox/py3.10-unit/lib/python3.10.../site-packages/aiohttp/client_reqrep.py:539: in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
.tox/py3.10-unit/lib/python3.10.../site-packages/aiohttp/streams.py:703: in read
    await self._waiter
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = None
future = <Future finished exception=ServerDisconnectedError('Server disconnected')>

    def __wakeup(self, future):
        try:
>           future.result()
E           aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected

.../hostedtoolcache/Python/3.10.19.../x64/lib/python3.10/asyncio/tasks.py:304: ServerDisconnectedError
tests/test_remotepath.py::test_download[ssh]
Stack Traces | 0.596s run time
context = <streamflow.core.context.StreamFlowContext object at 0x7f5f76588550>
connector = <streamflow.deployment.connector.ssh.SSHConnector object at 0x7f5f7533ceb0>
location = <streamflow.core.deployment.ExecutionLocation object at 0x7f5f703aad50>

    @pytest.mark.asyncio
    async def test_download(
        context: StreamFlowContext, connector: Connector, location: ExecutionLocation
    ) -> None:
        """Test remote file download."""
        urls = [
            "https://raw.githubusercontent..../streamflow/master/LICENSE",
            "https://github..../refs/tags/0.1.6.zip",
        ]
        parent_dir = StreamFlowPath(
            tempfile.gettempdir() if location.local else "/tmp",
            utils.random_name(),
            context=context,
            location=location,
        )
        paths = [
            parent_dir / "LICENSE",
            parent_dir / "streamflow-0.1.6.zip",
        ]
        for i, url in enumerate(urls):
            path = None
            try:
                path = await remotepath.download(context, location, url, str(parent_dir))
                assert path == paths[i]
>               assert await path.exists()
E               assert False

tests/test_remotepath.py:92: AssertionError
tests/test_remotepath.py::test_download[local]
Stack Traces | 1.05s run time
context = <streamflow.core.context.StreamFlowContext object at 0x7f5f76588550>
connector = <streamflow.deployment.connector.local.LocalConnector object at 0x7f5f74802170>
location = <streamflow.core.deployment.ExecutionLocation object at 0x7f5f74917990>

    @pytest.mark.asyncio
    async def test_download(
        context: StreamFlowContext, connector: Connector, location: ExecutionLocation
    ) -> None:
        """Test remote file download."""
        urls = [
            "https://raw.githubusercontent..../streamflow/master/LICENSE",
            "https://github..../refs/tags/0.1.6.zip",
        ]
        parent_dir = StreamFlowPath(
            tempfile.gettempdir() if location.local else "/tmp",
            utils.random_name(),
            context=context,
            location=location,
        )
        paths = [
            parent_dir / "LICENSE",
            parent_dir / "streamflow-0.1.6.zip",
        ]
        for i, url in enumerate(urls):
            path = None
            try:
>               path = await remotepath.download(context, location, url, str(parent_dir))

tests/test_remotepath.py:90: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
streamflow/data/remotepath.py:895: in download
    async with session.get(url) as response:
.tox/py3.10-unit/lib/python3.10........./site-packages/aiohttp/client.py:1510: in __aenter__
    self._resp: _RetType = await self._coro
.tox/py3.10-unit/lib/python3.10........./site-packages/aiohttp/client.py:779: in _request
    resp = await handler(req)
.tox/py3.10-unit/lib/python3.10........./site-packages/aiohttp/client.py:757: in _connect_and_send_request
    await resp.start(conn)
.tox/py3.10-unit/lib/python3.10.../site-packages/aiohttp/client_reqrep.py:539: in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
.tox/py3.10-unit/lib/python3.10.../site-packages/aiohttp/streams.py:703: in read
    await self._waiter
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = None
future = <Future finished exception=ServerDisconnectedError('Server disconnected')>

    def __wakeup(self, future):
        try:
>           future.result()
E           aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected

.../hostedtoolcache/Python/3.10.19.../x64/lib/python3.10/asyncio/tasks.py:304: ServerDisconnectedError
tests/test_remotepath.py::test_download[docker]
Stack Traces | 1.5s run time
context = <streamflow.core.context.StreamFlowContext object at 0x7f5f76588550>
connector = <streamflow.deployment.connector.container.DockerConnector object at 0x7f5f74801450>
location = <streamflow.core.deployment.ExecutionLocation object at 0x7f5f74a47a70>

    @pytest.mark.asyncio
    async def test_download(
        context: StreamFlowContext, connector: Connector, location: ExecutionLocation
    ) -> None:
        """Test remote file download."""
        urls = [
            "https://raw.githubusercontent..../streamflow/master/LICENSE",
            "https://github..../refs/tags/0.1.6.zip",
        ]
        parent_dir = StreamFlowPath(
            tempfile.gettempdir() if location.local else "/tmp",
            utils.random_name(),
            context=context,
            location=location,
        )
        paths = [
            parent_dir / "LICENSE",
            parent_dir / "streamflow-0.1.6.zip",
        ]
        for i, url in enumerate(urls):
            path = None
            try:
                path = await remotepath.download(context, location, url, str(parent_dir))
                assert path == paths[i]
>               assert await path.exists()
E               assert False

tests/test_remotepath.py:92: AssertionError


LanderOtto force-pushed the fix/default-dir-listing branch from cf4c84b to 1babc73 March 4, 2026 13:24
…ated according to the CWL standard. Specifically, when `loadListing` is missing in the `WorkflowInputParameter`, the default is `no_listing`.
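
As a rough illustration of the default described in this commit message, the sketch below resolves the effective `loadListing` value for an input parameter given as a plain dictionary. The helper names `effective_load_listing` and `needs_listing` are hypothetical and are not StreamFlow functions; the rule that an existing `listing` is left untouched is likewise an assumption made for this example.

```python
from typing import Any

# Values of the CWL LoadListingEnum.
LOAD_LISTING_VALUES = ("no_listing", "shallow_listing", "deep_listing")


def effective_load_listing(param: dict[str, Any]) -> str:
    """Return the parameter's loadListing, falling back to no_listing when missing."""
    value = param.get("loadListing", "no_listing")
    if value not in LOAD_LISTING_VALUES:
        raise ValueError(f"Invalid loadListing value: {value!r}")
    return value


def needs_listing(directory: dict[str, Any], load_listing: str) -> bool:
    """Hypothetical check: compute a listing only when one is requested and absent."""
    if load_listing == "no_listing":
        # Nothing to do: the listing field must not be populated.
        return False
    return "listing" not in directory


if __name__ == "__main__":
    param = {"id": "indir", "type": "Directory"}  # no loadListing declared
    directory = {"class": "Directory", "path": "/data/in"}
    mode = effective_load_listing(param)
    print(mode, needs_listing(directory, mode))
```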
LanderOtto force-pushed the fix/default-dir-listing branch from 1babc73 to 3bb0a9b March 4, 2026 13:52
LanderOtto force-pushed the fix/default-dir-listing branch from a06a96c to e36a003 March 4, 2026 16:58
LanderOtto changed the title from "Avoid loadListing when not necessary" to "Improved listing of Directory" Mar 5, 2026