Skip to content

Fix/off by one in search chains#140

Merged
alumi merged 2 commits intomasterfrom
fix/off-by-one-in-search-chains
Mar 12, 2026
Merged

Fix/off by one in search chains#140
alumi merged 2 commits intomasterfrom
fix/off-by-one-in-search-chains

Conversation

@niyarin
Copy link
Copy Markdown
Contributor

@niyarin niyarin commented Mar 4, 2026

This PR fixes an issue where the search range of in-block? was off by one. Similar to search-overlap-block, it needs to search one position further back.

@niyarin niyarin requested a review from alumi March 5, 2026 04:10
Copy link
Copy Markdown
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

I looked into this, and it seems the root cause is that we’re mixing half-open and closed intervals.
The intervals/find-overlap-intervals API expects 1-based closed intervals, but the current implementation of cumsum-chain does not fully convert the chain file format (0-based half-open) into 1-based closed intervals.
Strangely, changing (when-let [m (first (intervals/find-overlap-intervals data nil (inc pos) (+ pos 2)))] to (when-let [m (first (intervals/find-overlap-intervals data nil (inc pos) (inc pos)))] still passes the tests.

Since most part of cljam uses closed intervals, I think it's more straightforward to update cumsum-chain so that it produces closed intervals.
Does this approach sound reasonable to you?

diff --git src/varity/chain.clj src/varity/chain.clj
index d46d26e..ce563c3 100644
--- src/varity/chain.clj
+++ src/varity/chain.clj
@@ -80,10 +80,10 @@
       (persistent! results)
       (recur (->> (assoc d :t-start curr-t-start
                          :q-start curr-q-start
-                         :t-end (+ curr-t-start (:size d))
-                         :q-end (+ curr-q-start (:size d))
+                         :t-end (dec (+ curr-t-start (:size d)))
+                         :q-end (dec (+ curr-q-start (:size d)))
                          :start curr-t-start
-                         :end (+ curr-t-start (:size d)))
+                         :end (dec (+ curr-t-start (:size d))))
                   (conj! results))
              (first r)
              (next r)
@@ -114,9 +114,8 @@
 (defn- in-block?
   [pos {:keys [data header] :as chain}]
   (when (<= (inc (:t-start header)) pos (:t-end header))
-    (when-let [m (first (intervals/find-overlap-intervals data nil (inc pos) (+ pos 2)))]
-      (when (<= (:t-start m) pos (dec (+ (:t-start m) (:size m))))
-        (assoc chain :in-block m)))))
+    (when-let [m (first (intervals/find-overlap-intervals data nil pos pos))]
+      (assoc chain :in-block m))))

 (def ^:private normalize-chr (memoize normalize-chromosome-key))

@@ -138,4 +137,4 @@
 (defn search-overlap-blocks
   "Calculates a list of blocks that overlap the given interval."
   [start end blocks-idx]
-  (intervals/find-overlap-intervals blocks-idx nil (inc start) end))
+  (intervals/find-overlap-intervals blocks-idx nil start end))
diff --git test/varity/chain_test.clj test/varity/chain_test.clj
index cb08756..5e577be 100644
--- test/varity/chain_test.clj
+++ test/varity/chain_test.clj
@@ -68,7 +68,7 @@
     15 17 [{:t-start 14, :t-end 16}]
     6 7 [{:t-start 6, :t-end 10}]
     4 4 [{:t-start 2, :t-end 5}]
-    5 5 []
+    5 5 [{:t-start 2, :t-end 5}]
     6 6 [{:t-start 6, :t-end 10}]
     1 1 []
     30 31 []))

@niyarin
Copy link
Copy Markdown
Contributor Author

niyarin commented Mar 5, 2026

Thank you. Your point seems to be correct.
The test you proposed also appears to be correct; this is consistent with the behavior before introducing nc-list.
I will change the way interval endpoints are handled to closed intervals.

@alumi alumi self-requested a review March 10, 2026 03:06
Copy link
Copy Markdown
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niyarin Thank you for updating the PR! Looks good👍
Would you squash the last commit into the second one?

@niyarin niyarin force-pushed the fix/off-by-one-in-search-chains branch from fe6f7f2 to d1d32e1 Compare March 12, 2026 05:34
@alumi alumi added the bug label Mar 12, 2026
@alumi alumi merged commit 14ffa0a into master Mar 12, 2026
63 of 92 checks passed
@alumi alumi deleted the fix/off-by-one-in-search-chains branch March 12, 2026 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants