Conversation
alumi
left a comment
There was a problem hiding this comment.
Thank you for the PR!
I looked into this, and it seems the root cause is that we’re mixing half-open and closed intervals.
The intervals/find-overlap-intervals API expects 1-based closed intervals, but the current implementation of cumsum-chain does not fully convert the chain file format (0-based half-open) into 1-based closed intervals.
Strangely, changing (when-let [m (first (intervals/find-overlap-intervals data nil (inc pos) (+ pos 2)))] to (when-let [m (first (intervals/find-overlap-intervals data nil (inc pos) (inc pos)))] still passes the tests.
Since most part of cljam uses closed intervals, I think it's more straightforward to update cumsum-chain so that it produces closed intervals.
Does this approach sound reasonable to you?
diff --git src/varity/chain.clj src/varity/chain.clj
index d46d26e..ce563c3 100644
--- src/varity/chain.clj
+++ src/varity/chain.clj
@@ -80,10 +80,10 @@
(persistent! results)
(recur (->> (assoc d :t-start curr-t-start
:q-start curr-q-start
- :t-end (+ curr-t-start (:size d))
- :q-end (+ curr-q-start (:size d))
+ :t-end (dec (+ curr-t-start (:size d)))
+ :q-end (dec (+ curr-q-start (:size d)))
:start curr-t-start
- :end (+ curr-t-start (:size d)))
+ :end (dec (+ curr-t-start (:size d))))
(conj! results))
(first r)
(next r)
@@ -114,9 +114,8 @@
(defn- in-block?
[pos {:keys [data header] :as chain}]
(when (<= (inc (:t-start header)) pos (:t-end header))
- (when-let [m (first (intervals/find-overlap-intervals data nil (inc pos) (+ pos 2)))]
- (when (<= (:t-start m) pos (dec (+ (:t-start m) (:size m))))
- (assoc chain :in-block m)))))
+ (when-let [m (first (intervals/find-overlap-intervals data nil pos pos))]
+ (assoc chain :in-block m))))
(def ^:private normalize-chr (memoize normalize-chromosome-key))
@@ -138,4 +137,4 @@
(defn search-overlap-blocks
"Calculates a list of blocks that overlap the given interval."
[start end blocks-idx]
- (intervals/find-overlap-intervals blocks-idx nil (inc start) end))
+ (intervals/find-overlap-intervals blocks-idx nil start end))
diff --git test/varity/chain_test.clj test/varity/chain_test.clj
index cb08756..5e577be 100644
--- test/varity/chain_test.clj
+++ test/varity/chain_test.clj
@@ -68,7 +68,7 @@
15 17 [{:t-start 14, :t-end 16}]
6 7 [{:t-start 6, :t-end 10}]
4 4 [{:t-start 2, :t-end 5}]
- 5 5 []
+ 5 5 [{:t-start 2, :t-end 5}]
6 6 [{:t-start 6, :t-end 10}]
1 1 []
30 31 []))|
Thank you. Your point seems to be correct. |
fe6f7f2 to
d1d32e1
Compare
This PR fixes an issue where the search range of
in-block?was off by one. Similar to search-overlap-block, it needs to search one position further back.