Changes from 7 commits
Commits
128 commits
61b93da
initial
cschu Dec 15, 2024
6c80f0a
version
cschu Dec 15, 2024
2be12b4
fix: getitem implementation
cschu Dec 15, 2024
e143eb3
fix?: missing counts
cschu Dec 16, 2024
32ce2bd
fix: fixing AlignmentCounter __getitem__/__setitem__ methods
cschu Dec 16, 2024
ed0eaab
merge uniq/ambig seqcounters
cschu Dec 16, 2024
6c32072
fix: AlignmentCounter.has_ambig_counts(), CountManager.has_ambig_coun…
cschu Dec 16, 2024
28f84e8
fix: minor
cschu Dec 16, 2024
b344e9f
updating count annotation
cschu Dec 19, 2024
5ed8cfc
fixed import
cschu Dec 19, 2024
a92722a
added debug message
cschu Dec 19, 2024
515e71f
added debug message
cschu Dec 19, 2024
6efbca6
fixing empty length vector issue?
cschu Dec 19, 2024
ef34cde
fixing empty length vector issue?
cschu Dec 19, 2024
aa09c1a
added debug message
cschu Dec 19, 2024
d3649c2
fixing empty total counts?
cschu Dec 19, 2024
e3bee67
fixing total count issue?
cschu Dec 19, 2024
b466381
debug messaging
cschu Dec 20, 2024
8249d9f
fixed gene writing?
cschu Dec 20, 2024
7fb5c56
fixed gene writing?
cschu Dec 20, 2024
d2226ac
fixed gene writing?
cschu Dec 20, 2024
a1e54c2
fixed gene writing?
cschu Dec 20, 2024
f336493
fixed gene writing?
cschu Dec 21, 2024
cb9ea29
dump seqcounters for debugging
cschu Dec 21, 2024
6ae5c5a
dump seqcounters for debugging
cschu Dec 21, 2024
fbb6a0e
dump seqcounters for debugging
cschu Dec 21, 2024
a70f1a8
changed strand specific order
cschu Dec 21, 2024
bb13e14
debug log
cschu Dec 21, 2024
4579aff
debug log
cschu Dec 21, 2024
a5f7fb1
debug log
cschu Dec 21, 2024
68fa194
fixed gene writing?
cschu Dec 21, 2024
313a061
fixed gene writing?
cschu Dec 21, 2024
e917fe2
starting to replace CountManager
cschu Dec 21, 2024
37b624e
pleasing linters
cschu Dec 22, 2024
9e078ce
removed seq_counter.py
cschu Dec 22, 2024
7976a4d
removed Unique- and AmbiguousRegionCounter classes
cschu Dec 22, 2024
f754653
updated alignment_counter, removed alignment_counter2
cschu Dec 22, 2024
26fbb55
throwing out old code, splitting of regioncount_annotator
cschu Dec 22, 2024
bd1436f
removed count_manager
cschu Dec 22, 2024
b460ebd
removed count_manager references
cschu Dec 22, 2024
1789551
modified gene_count write behaviour in prep of ggroup annotation
cschu Dec 23, 2024
fdd7aaf
modified gene_count write behaviour in prep of ggroup annotation
cschu Dec 23, 2024
4af2f14
change gene group handling during annotation
cschu Dec 23, 2024
39ebd15
change gene group handling during annotation
cschu Dec 23, 2024
c0b4664
change gene group handling during annotation
cschu Dec 23, 2024
b935a24
added debug messaging
cschu Dec 24, 2024
69709f8
solved?
cschu Dec 24, 2024
62c0a52
disabling various logger calls
cschu Dec 24, 2024
2239d29
disabling various logger calls
cschu Dec 24, 2024
413684a
trying to update feature count processing
cschu Dec 25, 2024
1b5cf9c
trying to update feature count processing
cschu Dec 25, 2024
cb6ac1a
trying to update feature count processing
cschu Dec 25, 2024
8977c4d
trying to update feature count processing
cschu Dec 25, 2024
f96e1b3
trying to update feature count processing
cschu Dec 25, 2024
b4fb4d8
trying to update feature count processing
cschu Dec 25, 2024
3594dff
trying to update feature count processing
cschu Dec 25, 2024
0dbd4ae
trying to update feature count processing
cschu Dec 25, 2024
7b8e6bf
debug log
cschu Dec 25, 2024
563b402
trying to fix annotate2
cschu Dec 25, 2024
e9e1715
turn off annotate2 log
cschu Dec 25, 2024
07c6743
trying to update feature count processing
cschu Dec 25, 2024
da8e93b
trying to update feature count processing
cschu Dec 25, 2024
a94d9e7
trying to update feature count processing
cschu Dec 26, 2024
164cc6d
trying to update feature count processing
cschu Dec 26, 2024
0cdf49a
trying to update feature count processing
cschu Dec 26, 2024
3b2b5cb
trying to update feature count processing
cschu Dec 27, 2024
cb9934e
added category scaling comment
cschu Dec 27, 2024
4e119a4
linting + obsolete code removal
cschu Dec 27, 2024
90eb4b5
trying to optimise scaling factors, temp. disabled feature counts
cschu Dec 29, 2024
5b0286f
trying to optimise scaling factors, temp. disabled feature counts
cschu Dec 30, 2024
5baafe0
trying to optimise scaling factors, temp. disabled feature counts
cschu Dec 30, 2024
49e11e3
re-enable feature counts
cschu Dec 30, 2024
b39b4b1
trying to fix scaling factor issue
cschu Dec 30, 2024
b76f168
trying to fix scaling factor issue
cschu Dec 30, 2024
9920404
trying to implement count matrix
cschu Dec 31, 2024
945cf8e
trying to implement count matrix
cschu Dec 31, 2024
07f85f0
trying to implement count matrix
cschu Dec 31, 2024
dbb36da
trying to implement count matrix
cschu Dec 31, 2024
b281a16
trying to implement count matrix
cschu Dec 31, 2024
0b64813
trying to implement count matrix
cschu Dec 31, 2024
7f93e35
trying to implement count matrix
cschu Dec 31, 2024
b8bdf08
trying to implement count matrix
cschu Dec 31, 2024
85d1def
trying to implement count matrix
cschu Dec 31, 2024
e325127
trying to implement count matrix
cschu Dec 31, 2024
c1c93d1
trying to implement count matrix
cschu Dec 31, 2024
21fd963
reactivated feature output
cschu Jan 1, 2025
a47b87e
reactivated feature output
cschu Jan 1, 2025
7723314
reactivated feature output
cschu Jan 1, 2025
27e9e38
reactivated feature output
cschu Jan 1, 2025
2f938f6
reactivated feature output
cschu Jan 1, 2025
03a4a96
reactivated feature output
cschu Jan 1, 2025
74c2992
reactivated feature output
cschu Jan 1, 2025
26a94d8
reactivated feature output
cschu Jan 1, 2025
fb73a0a
reactivated feature output
cschu Jan 1, 2025
d1a2720
reactivated feature output
cschu Jan 1, 2025
d54e3f0
reactivated feature output
cschu Jan 1, 2025
44a773d
reactivated feature output
cschu Jan 1, 2025
6b06a52
reactivated feature output
cschu Jan 1, 2025
a0d1032
reactivated feature output
cschu Jan 1, 2025
8a56e33
pleasing linters, cleanup
cschu Jan 1, 2025
98c6cf4
trying to reduce memory footprint
cschu Jan 2, 2025
7fa8339
trying to reduce memory footprint
cschu Jan 2, 2025
3fde96f
trying to reduce memory footprint
cschu Jan 2, 2025
64845ec
trying to reduce memory footprint
cschu Jan 2, 2025
4745be9
trying to reduce memory footprint
cschu Jan 2, 2025
fa19a80
trying to reduce memory footprint
cschu Jan 2, 2025
54b88a5
trying to reduce memory footprint
cschu Jan 2, 2025
3d4a4f5
trying to reduce memory footprint
cschu Jan 2, 2025
2ffa0c5
trying category-wise processing
cschu Jan 2, 2025
688a081
trying category-wise processing
cschu Jan 2, 2025
64f8149
trying category-wise processing
cschu Jan 2, 2025
f7c16c3
truncated unannotated hash
cschu Jan 3, 2025
2fe0bdc
making adjustments for new db format
cschu Jan 7, 2025
9a0f888
making adjustments for new db format
cschu Jan 7, 2025
b51b48d
making adjustments for new db format
cschu Jan 7, 2025
0e1edae
making adjustments for new db format
cschu Jan 7, 2025
c065163
making adjustments for new db format
cschu Jan 7, 2025
21d4264
added count matrix state dump
cschu Jan 10, 2025
929e211
added count matrix state dump
cschu Jan 10, 2025
86f5f78
added count matrix state dump
cschu Jan 10, 2025
daac8f2
added count matrix state dump
cschu Jan 10, 2025
88fa6f5
added count matrix state dump
cschu Jan 10, 2025
12dbfb3
added count matrix state dump
cschu Jan 10, 2025
166aab4
added count matrix state dump
cschu Jan 10, 2025
a822d12
refactor group_gene_counts to be not in-place
cschu Jan 11, 2025
e3c86c1
refactor group_gene_counts to be not in-place
cschu Jan 11, 2025
c4a8fad
refactor group_gene_counts to be not in-place
cschu Jan 11, 2025
1f295bc
refactor group_gene_counts to be not in-place
cschu Jan 11, 2025
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,7 +1,7 @@
FROM ubuntu:22.04

LABEL maintainer="cschu1981@gmail.com"
LABEL version="2.18.0"
LABEL version="2.19.0"
LABEL description="gffquant - functional profiling of metagenomic/transcriptomic wgs samples"


2 changes: 1 addition & 1 deletion gffquant/__init__.py
@@ -5,7 +5,7 @@
from enum import Enum, auto, unique


__version__ = "2.18.0"
__version__ = "2.19.0"
__tool__ = "gffquant"


16 changes: 8 additions & 8 deletions gffquant/annotation/count_annotator.py
@@ -8,6 +8,7 @@

import numpy as np

from ..counters.count_manager import CountManager

logger = logging.getLogger(__name__)

@@ -198,17 +199,18 @@ def __init__(self, strand_specific, report_scaling_factors=True):
CountAnnotator.__init__(self, strand_specific, report_scaling_factors=report_scaling_factors)

# pylint: disable=R0914,W0613
def annotate(self, refmgr, db, count_manager, gene_group_db=False):
def annotate(self, refmgr, db, count_manager: CountManager, gene_group_db=False):
"""
Annotate a set of region counts via db-lookup.
input:
- bam: bamr.BamFile to use as lookup table for reference names
- db: GffDatabaseManager holding functional annotation database
- count_manager: count_data
"""
for rid in set(count_manager.uniq_regioncounts).union(
count_manager.ambig_regioncounts
):
# for rid in set(count_manager.uniq_regioncounts).union(
# count_manager.ambig_regioncounts
# ):
for rid in count_manager.get_all_regions(region_counts=True):
ref = refmgr.get(rid[0] if isinstance(rid, tuple) else rid)[0]

for region in count_manager.get_regions(rid):
@@ -273,7 +275,7 @@ class GeneCountAnnotator(CountAnnotator):
def __init__(self, strand_specific, report_scaling_factors=True):
CountAnnotator.__init__(self, strand_specific, report_scaling_factors=report_scaling_factors)

def annotate(self, refmgr, db, count_manager, gene_group_db=False):
def annotate(self, refmgr, db, count_manager: CountManager, gene_group_db=False):
"""
Annotate a set of gene counts via db-iteration.
input:
@@ -286,9 +288,7 @@ def annotate(self, refmgr, db, count_manager, gene_group_db=False):
if self.strand_specific else None
)

for rid in set(count_manager.uniq_seqcounts).union(
count_manager.ambig_seqcounts
):
for rid in count_manager.get_all_regions():
ref, region_length = refmgr.get(rid[0] if isinstance(rid, tuple) else rid)

uniq_counts, ambig_counts = count_manager.get_counts(
2 changes: 1 addition & 1 deletion gffquant/counters/__init__.py
@@ -3,7 +3,7 @@

"""module docstring"""

from .alignment_counter import AlignmentCounter
from .alignment_counter2 import AlignmentCounter
from .region_counter import RegionCounter
from .seq_counter import UniqueSeqCounter, AmbiguousSeqCounter
from .count_manager import CountManager
91 changes: 91 additions & 0 deletions gffquant/counters/alignment_counter2.py
@@ -0,0 +1,91 @@
from collections import Counter

import numpy as np

from .. import DistributionMode


class AlignmentCounter:
COUNT_HEADER_ELEMENTS = ("raw", "lnorm", "scaled")
INITIAL_SIZE = 1000

@staticmethod
def normalise_counts(counts, feature_len, scaling_factor):
"""Returns raw, length-normalised, and scaled feature counts."""
normalised = counts / feature_len
scaled = normalised * scaling_factor
return counts, normalised, scaled

🛠️ Refactor suggestion

Add input validation for feature_len and scaling_factor.

The method should validate its inputs so that a zero feature length cannot cause a division by zero and negative values are rejected.

     @staticmethod
     def normalise_counts(counts, feature_len, scaling_factor):
         """Returns raw, length-normalised, and scaled feature counts."""
+        if feature_len <= 0:
+            raise ValueError("Feature length must be positive")
+        if scaling_factor < 0:
+            raise ValueError("Scaling factor cannot be negative")
         normalised = counts / feature_len
         scaled = normalised * scaling_factor
         return counts, normalised, scaled
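
How the suggested guards behave can be sketched as a standalone function. This is a minimal illustration, not the project's code: it assumes `feature_len` and `scaling_factor` are plain scalars (the comparisons would need adjusting if arrays were passed), and the example values are made up.

```python
def normalise_counts(counts, feature_len, scaling_factor):
    """Return raw, length-normalised, and scaled counts after validating inputs."""
    # Assumption: both parameters are scalars, as in the review suggestion.
    if feature_len <= 0:
        raise ValueError("Feature length must be positive")
    if scaling_factor < 0:
        raise ValueError("Scaling factor cannot be negative")
    normalised = counts / feature_len
    scaled = normalised * scaling_factor
    return counts, normalised, scaled


# Illustrative values only.
raw, lnorm, scaled = normalise_counts(10.0, feature_len=4, scaling_factor=0.5)

# A zero-length feature is rejected up front instead of dividing by zero.
try:
    normalise_counts(10.0, feature_len=0, scaling_factor=0.5)
    rejected = False
except ValueError:
    rejected = True
```

Whether a zero-length feature should raise or be skipped silently is a design decision for the caller; the sketch only shows the raising variant proposed in the comment.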

def get_increment(self, n_aln, increment):
# 1overN = lavern. Maya <3
@coderabbitai coderabbitai bot Dec 16, 2024

⚠️ Potential issue

Remove non-descriptive comment in get_increment method.

The comment # 1overN = lavern. Maya <3 does not provide meaningful information about the code and includes personal content. It's best to remove it or replace it with a descriptive comment explaining the purpose of the method.

Apply this diff to remove the comment:

-        # 1overN = lavern. Maya <3

Owner Author

please ignore

@cschu, understood, I'll ignore it.


✏️ Learnings added
Learnt from: cschu
PR: cschu/gff_quantifier#55
File: gffquant/counters/alignment_counter2.py:20-20
Timestamp: 2024-12-16T20:17:48.980Z
Learning: In `gffquant/counters/alignment_counter2.py`, comments like `# 1overN = lavern. Maya <3` are acceptable and should not be flagged as non-descriptive or requiring removal.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

return (increment / n_aln) if self.distribution_mode == DistributionMode.ONE_OVER_N else increment

def __init__(self, distribution_mode=DistributionMode.ONE_OVER_N, strand_specific=False):
self.distribution_mode = distribution_mode
self.strand_specific = strand_specific
self.unannotated_reads = 0

self.index = {}
self.counts = np.zeros(
(AlignmentCounter.INITIAL_SIZE, 2),
)
def dump(self, prefix, refmgr):
...
def get(self, key, default_val):
key_index = self.index.get(key)
if key_index is None:
return Counter()
return Counter({key: self.counts[key_index]})

⚠️ Potential issue

Utilize the default_val parameter in the get method.

The default_val parameter in get(self, key, default_val) is not used. Instead, the method returns a new empty Counter() when the key is not found. To adhere to the expected behavior of the get method, consider returning default_val when the key is missing.

Apply this diff to use default_val:

         if key_index is None:
-            return Counter()
+            return default_val
         return Counter({key: self.counts[key_index]})

Committable suggestion skipped: line range outside the PR's diff.
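
The dictionary-like behaviour the comment asks for can be sketched with a small stand-in class. `IndexedCounts` and its `add` helper are hypothetical names used only for illustration; the real `AlignmentCounter` stores its values in a NumPy array rather than a list.

```python
from collections import Counter


class IndexedCounts:
    """Hypothetical stand-in: a key -> row index plus a list of row values."""

    def __init__(self):
        self.index = {}
        self.counts = []

    def add(self, key, value):
        # Illustration-only helper that registers a key and its value.
        self.index[key] = len(self.index)
        self.counts.append(value)

    def get(self, key, default_val=None):
        key_index = self.index.get(key)
        if key_index is None:
            # Honour the caller's default instead of always building a Counter.
            return default_val
        return Counter({key: self.counts[key_index]})


store = IndexedCounts()
store.add("geneA", 3.0)
found = store.get("geneA", Counter())
missing = store.get("geneB", Counter())
no_default = store.get("geneB")
```

With this shape, callers that pass `Counter()` as the default see the same result as before, while callers that pass something else get what they asked for.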

def setdefault(self, key, default_val):
...

⚠️ Potential issue

Implement the setdefault method.

The setdefault method is currently a placeholder. Since it's part of the public interface, it should be implemented or removed.

Would you like me to help implement this method based on the typical dictionary-like behavior?
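
One possible reading of "typical dictionary-like behavior" for an index-backed store is sketched below. The class name `IndexedStore` and the list-based storage are assumptions for illustration; the PR leaves `setdefault` as a placeholder and does not say what it is meant to return.

```python
class IndexedStore:
    """Hypothetical stand-in: a key -> row index plus a list of rows."""

    def __init__(self):
        self.index = {}
        self.rows = []

    def setdefault(self, key, default_val):
        """Return the row for `key`, inserting `default_val` first if absent."""
        key_index = self.index.get(key)
        if key_index is None:
            key_index = self.index[key] = len(self.index)
            self.rows.append(default_val)
        return self.rows[key_index]


store = IndexedStore()
row = store.setdefault("geneA", [0.0, 0.0])   # inserted
row[0] += 1.0                                 # mutate the stored row in place
again = store.setdefault("geneA", [9.0, 9.0])  # already present, default ignored
```

As with `dict.setdefault`, the default is stored and returned only when the key is missing; an existing row is returned unchanged.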


def has_ambig_counts(self):
return bool(self.counts[:, 1].sum() != 0)

def __iter__(self):
yield from self.index.keys()
def __getitem__(self, key):
key_index = self.index.get(key)
if key_index is None:
return 0.0
return self.counts[key_index]
def __setitem__(self, key, value):
key_index = self.index.get(key)
if key_index is not None:
self.counts[key_index] = value
else:
raise KeyError(f"{key=} not found.")

def update_counts(self, count_stream, increment=1, ambiguous_counts=False):
contributed_counts = 0
for hits, aln_count in count_stream:
hit = hits[0]
inc = (
(
self.get_increment(aln_count, increment),
increment,
)
)[aln_count == 1]
key = (
(
(hit.rid, hit.rev_strand),
hit.rid
)
)[self.strand_specific]

key_index = self.index.get(key)
if key_index is None:
nrows = self.counts.shape[0]
if len(self.index) == nrows:
self.counts = np.pad(
self.counts,
((0, AlignmentCounter.INITIAL_SIZE), (0, 0),),
)
# key_index = self.index.setdefault(key, len(self.index))
key_index = self.index[key] = len(self.index)
self.counts[key_index][int(ambiguous_counts)] += inc
contributed_counts += inc

return contributed_counts
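
The storage scheme used by `update_counts` above (a key-to-row index, a two-column array, growth in fixed-size blocks via `np.pad`, and 1/N increments) can be tried in isolation. The sketch below is a simplified, module-level version with a deliberately small block size; the function name `add` and the sample keys are made up for illustration.

```python
import numpy as np

BLOCK_SIZE = 4  # small on purpose; the class above uses INITIAL_SIZE = 1000

index = {}                           # key -> row number
counts = np.zeros((BLOCK_SIZE, 2))   # column 0: unique, column 1: ambiguous


def add(key, n_aln, increment=1.0, ambiguous=False):
    """Add one read's contribution for `key`, growing `counts` when it is full."""
    global counts
    row = index.get(key)
    if row is None:
        if len(index) == counts.shape[0]:
            # Append BLOCK_SIZE zero rows; existing values are preserved.
            counts = np.pad(counts, ((0, BLOCK_SIZE), (0, 0)))
        row = index[key] = len(index)
    inc = increment / n_aln  # 1/N distribution; n_aln == 1 leaves it unchanged
    counts[row, int(ambiguous)] += inc
    return inc


for i in range(5):          # the fifth key triggers one growth step
    add(f"gene{i}", n_aln=1)
add("gene0", n_aln=4, ambiguous=True)
```

Growing in blocks keeps the number of array copies low compared with appending one row at a time, at the cost of some unused rows at the end of the array.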

122 changes: 84 additions & 38 deletions gffquant/counters/count_manager.py
@@ -3,7 +3,7 @@
from collections import Counter

from .. import DistributionMode
from .alignment_counter import AlignmentCounter
from .alignment_counter2 import AlignmentCounter

💡 Codebase verification

Migration from alignment_counter.py to alignment_counter2.py requires attention

The verification reveals that both the old alignment_counter.py and new alignment_counter2.py modules coexist in the codebase. While imports have been updated to use the new module, the presence of both files could lead to confusion and maintenance issues:

  • All imports now correctly reference alignment_counter2.py
  • The old AlignmentCounter class in alignment_counter.py still exists and inherits from Counter
  • The new AlignmentCounter class in alignment_counter2.py is a standalone class
  • Several classes inherit from the new AlignmentCounter: UniqueSeqCounter, AmbiguousSeqCounter, and RegionCounter

Consider:

  • Removing the deprecated alignment_counter.py if it's no longer needed
  • Adding a deprecation notice if backward compatibility is required
  • Documenting the migration in the codebase
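
If backward compatibility were required, a deprecation notice along the lines of the second option could be sketched as follows. The helper `warn_deprecated_module` is hypothetical and not part of gffquant; in practice the `warnings.warn` call would sit at the top of the old module so that it fires on import.

```python
import warnings


def warn_deprecated_module(old, new):
    """Emit a DeprecationWarning pointing users from `old` to `new`."""
    warnings.warn(
        f"{old} is deprecated; import from {new} instead",
        DeprecationWarning,
        stacklevel=2,
    )


# Capture the warning to show what a caller would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated_module(
        "gffquant.counters.alignment_counter",
        "gffquant.counters.alignment_counter2",
    )
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so such a notice mainly helps developers who run with warnings enabled.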
🔗 Analysis chain

Verify the impact of the AlignmentCounter import change.

The change from alignment_counter to alignment_counter2 suggests a significant refactor. Let's verify that all dependent code has been updated accordingly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining imports of the old module
rg "from.*alignment_counter\s+import" --type py

# Search for any direct usage of alignment_counter
rg "alignment_counter\." --type py

Length of output: 86


Script:

#!/bin/bash
# Let's try a broader search for alignment_counter (case-insensitive)
rg -i "alignmentcounter" --type py

# Also search for the specific class name
rg "class AlignmentCounter" --type py

# Check if the old module file still exists
fd "alignment_counter.py"

Length of output: 1706

from .region_counter import RegionCounter


@@ -50,32 +50,47 @@ def __init__(
self.increments = [1.0, 1.0]
self.increments_auto_detect = [1.0, self.paired_end_count / 2.0]

self.uniq_seqcounts, self.ambig_seqcounts = None, None
self.uniq_regioncounts, self.ambig_regioncounts = None, None
# self.uniq_seqcounts, self.ambig_seqcounts = None, None
# self.uniq_regioncounts, self.ambig_regioncounts = None, None
self.seqcounts, self.regioncounts = None, None

🛠️ Refactor suggestion

Remove commented out code and document the architectural change.

The consolidation of separate counters into single instances is a good architectural improvement. However:

  1. The commented-out code should be removed as it can be retrieved from version control if needed.
  2. Consider adding a docstring explaining the new counter structure and its advantages.
-        # self.uniq_seqcounts, self.ambig_seqcounts = None, None
-        # self.uniq_regioncounts, self.ambig_regioncounts = None, None
+        """
+        Initialize counters that handle both unique and ambiguous counts.
+        This consolidation improves maintainability and reduces code duplication.
+        """
         self.seqcounts, self.regioncounts = None, None
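
The consolidation described here, one counter holding unique and ambiguous counts side by side instead of two separate counters, can be illustrated with a plain-Python sketch. `CombinedCounter` is a hypothetical name and uses a dict of two-element lists, whereas the PR's `AlignmentCounter` uses a two-column NumPy array.

```python
UNIQ, AMBIG = 0, 1  # column positions, mirroring the two-column layout


class CombinedCounter:
    """Hypothetical sketch: one row per key, holding [unique, ambiguous]."""

    def __init__(self):
        self.rows = {}

    def add(self, key, inc, ambiguous=False):
        row = self.rows.setdefault(key, [0.0, 0.0])
        row[int(ambiguous)] += inc

    def has_ambig_counts(self):
        return any(row[AMBIG] != 0 for row in self.rows.values())

    def __iter__(self):
        # One pass yields every key, with no union of two key sets needed.
        yield from self.rows


counter = CombinedCounter()
counter.add("geneA", 1.0)
before = counter.has_ambig_counts()
counter.add("geneB", 0.5, ambiguous=True)
after = counter.has_ambig_counts()
keys = list(counter)
```

Note that a triple-quoted string placed in the middle of `__init__`, as in the suggestion above, is an expression statement rather than a docstring, so a `#` comment or a class-level docstring may be the better place for this explanation.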


if region_counts:
self.uniq_regioncounts = RegionCounter(strand_specific=strand_specific)
self.ambig_regioncounts = RegionCounter(
# self.uniq_regioncounts = RegionCounter(strand_specific=strand_specific)
# self.ambig_regioncounts = RegionCounter(
# strand_specific=strand_specific,
# distribution_mode=distribution_mode,
# )
self.regioncounts = RegionCounter(
strand_specific=strand_specific,
distribution_mode=distribution_mode,
)

else:
self.uniq_seqcounts = AlignmentCounter(strand_specific=strand_specific)
self.ambig_seqcounts = AlignmentCounter(
# self.uniq_seqcounts = AlignmentCounter(strand_specific=strand_specific)
# self.ambig_seqcounts = AlignmentCounter(
# strand_specific=strand_specific,
# distribution_mode=distribution_mode
# )
self.seqcounts = AlignmentCounter(
strand_specific=strand_specific,
distribution_mode=distribution_mode
distribution_mode=distribution_mode,
)

def has_ambig_counts(self):
return self.ambig_regioncounts or self.ambig_seqcounts
return any(
(
self.seqcounts and self.seqcounts.has_ambig_counts(),
self.regioncounts and self.regioncounts.has_ambig_counts(),
)
)
# return self.ambig_regioncounts or self.ambig_seqcounts

def update_counts(self, count_stream, ambiguous_counts=False, pair=False, pe_library=None):
seq_counter, region_counter = (
(self.uniq_seqcounts, self.uniq_regioncounts)
if not ambiguous_counts
else (self.ambig_seqcounts, self.ambig_regioncounts)
)
# seq_counter, region_counter = (
# (self.uniq_seqcounts, self.uniq_regioncounts)
# if not ambiguous_counts
# else (self.ambig_seqcounts, self.ambig_regioncounts)
# )

if pe_library is not None:
# this is the case when the alignment has a read group tag
@@ -91,40 +106,51 @@ def update_counts(self, count_stream, ambiguous_counts=False, pair=False, pe_lib
increment = self.increments[pair]

contributed_counts = 0
if seq_counter is not None:
contributed_counts = seq_counter.update_counts(count_stream, increment=increment)
elif region_counter is not None:
contributed_counts = region_counter.update_counts(count_stream, increment=increment)
if self.seqcounts is not None:
contributed_counts = self.seqcounts.update_counts(count_stream, increment=increment, ambiguous_counts=ambiguous_counts,)
elif self.regioncounts is not None:
contributed_counts = self.regioncounts.update_counts(count_stream, increment=increment, ambiguous_counts=ambiguous_counts,)
# if seq_counter is not None:
# contributed_counts = seq_counter.update_counts(count_stream, increment=increment)
# elif region_counter is not None:
# contributed_counts = region_counter.update_counts(count_stream, increment=increment)

return contributed_counts

def dump_raw_counters(self, prefix, refmgr):
if self.uniq_seqcounts is not None:
self.uniq_seqcounts.dump(prefix, refmgr)
if self.ambig_seqcounts is not None:
self.ambig_seqcounts.dump(prefix, refmgr)
if self.uniq_regioncounts is not None:
self.uniq_regioncounts.dump(prefix, refmgr)
if self.ambig_regioncounts is not None:
self.ambig_regioncounts.dump(prefix, refmgr)
# if self.uniq_seqcounts is not None:
# self.uniq_seqcounts.dump(prefix, refmgr)
# if self.ambig_seqcounts is not None:
# self.ambig_seqcounts.dump(prefix, refmgr)
# if self.uniq_regioncounts is not None:
# self.uniq_regioncounts.dump(prefix, refmgr)
# if self.ambig_regioncounts is not None:
# self.ambig_regioncounts.dump(prefix, refmgr)
...

def get_unannotated_reads(self):
unannotated_reads = 0

if self.uniq_regioncounts is not None:
unannotated_reads += self.uniq_regioncounts.unannotated_reads
if self.ambig_regioncounts is not None:
unannotated_reads += self.ambig_regioncounts.unannotated_reads
if self.uniq_seqcounts is not None:
unannotated_reads += self.uniq_seqcounts.unannotated_reads
if self.ambig_seqcounts is not None:
unannotated_reads += self.ambig_seqcounts.unannotated_reads
# if self.uniq_regioncounts is not None:
# unannotated_reads += self.uniq_regioncounts.unannotated_reads
# if self.ambig_regioncounts is not None:
# unannotated_reads += self.ambig_regioncounts.unannotated_reads
# if self.uniq_seqcounts is not None:
# unannotated_reads += self.uniq_seqcounts.unannotated_reads
# if self.ambig_seqcounts is not None:
# unannotated_reads += self.ambig_seqcounts.unannotated_reads
if self.regioncounts is not None:
unannotated_reads += self.regioncounts
if self.seqcounts is not None:
unannotated_reads += self.seqcounts

⚠️ Potential issue

Fix missing attribute access in get_unannotated_reads.

The method adds the counter objects themselves instead of their .unannotated_reads attribute, which is likely to raise a TypeError at runtime.

         if self.regioncounts is not None:
-            unannotated_reads += self.regioncounts
+            unannotated_reads += self.regioncounts.unannotated_reads
         if self.seqcounts is not None:
-            unannotated_reads += self.seqcounts
+            unannotated_reads += self.seqcounts.unannotated_reads
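
The difference between the two versions can be reproduced with a stand-in object. `FakeCounter` is a hypothetical placeholder for the real counter classes, which are assumed here not to define integer addition.

```python
class FakeCounter:
    """Hypothetical stand-in exposing only the attribute that matters here."""

    def __init__(self, unannotated_reads):
        self.unannotated_reads = unannotated_reads


def get_unannotated_reads(seqcounts, regioncounts):
    """Sum the unannotated reads of whichever counters are present."""
    total = 0
    for counter in (regioncounts, seqcounts):
        if counter is not None:
            total += counter.unannotated_reads
    return total


total = get_unannotated_reads(FakeCounter(3), None)

# Adding the object itself, as in the original lines, fails for such a class.
try:
    0 + FakeCounter(3)
    failed = False
except TypeError:
    failed = True
```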

return unannotated_reads

def get_counts(self, seqid, region_counts=False, strand_specific=False):
if region_counts:
raise NotImplementedError()
rid, seqid = seqid[0], seqid[1:]

uniq_counter = self.uniq_regioncounts.get(rid, Counter())
ambig_counter = self.ambig_regioncounts.get(rid, Counter())

@@ -135,9 +161,11 @@ def get_counts(self, seqid, region_counts=False, strand_specific=False):
return [uniq_counter[seqid]], [ambig_counter[seqid]]

else:
uniq_counter, ambig_counter = self.uniq_seqcounts, self.ambig_seqcounts
# uniq_counter, ambig_counter = self.uniq_seqcounts, self.ambig_seqcounts


if strand_specific:
raise NotImplementedError()
uniq_counts, ambig_counts = [0.0, 0.0], [0.0, 0.0]
uniq_counts[seqid[1]] = uniq_counter[seqid]
ambig_counts[seqid[1]] = ambig_counter[seqid]
@@ -152,11 +180,29 @@ def get_counts(self, seqid, region_counts=False, strand_specific=False):
# ambig_counter[(rid, CountManager.MINUS_STRAND)],
# ]
else:
uniq_counts, ambig_counts = [uniq_counter[seqid]], [ambig_counter[seqid]]
# uniq_counts, ambig_counts = [uniq_counter[seqid]], [ambig_counter[seqid]]
uniq_counts, ambig_counts = [self.seqcounts[seqid][0]], [self.seqcounts[seqid][1]]

return uniq_counts, ambig_counts

def get_regions(self, rid):
return set(self.uniq_regioncounts.get(rid, set())).union(
self.ambig_regioncounts.get(rid, set())
# return set(self.uniq_regioncounts.get(rid, set())).union(
# self.ambig_regioncounts.get(rid, set())
# )
return set(self.uniq_regioncounts.get(rid, Counter())).union(
self.ambig_regioncounts.get(rid, Counter())
)

def get_all_regions(self, region_counts=False):
# uniq_counts, ambig_counts = (
# (self.uniq_seqcounts, self.ambig_seqcounts,),
# (self.uniq_regioncounts, self.ambig_regioncounts,),
# )[region_counts]
# yield from set(uniq_counts).union(ambig_counts)
counts = (
self.seqcounts,
self.regioncounts,
)[region_counts]

yield from counts

⚠️ Potential issue

Handle potential None reference in get_all_regions.

The method should handle the case where the selected counter is None.

         counts = (
             self.seqcounts,
             self.regioncounts,
         )[region_counts]
 
+        if counts is None:
+            return
         yield from counts
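
The effect of the guard can be checked with a free-function version of the generator. The dict used as a counter below is only a stand-in for an iterable counter object.

```python
def get_all_regions(seqcounts, regioncounts, region_counts=False):
    """Yield keys from the selected counter, or nothing if it was never created."""
    counts = (seqcounts, regioncounts)[region_counts]
    if counts is None:
        return  # ends the generator cleanly; callers see an empty iteration
    yield from counts


# Only the sequence counter exists, as in the non-region-count mode.
seq_keys = list(get_all_regions({"geneA": 1.0}, None))
region_keys = list(get_all_regions({"geneA": 1.0}, None, region_counts=True))
```

Without the guard, `yield from None` raises a `TypeError` as soon as the generator is iterated, so the failure would surface in the calling `for` loop rather than inside `get_all_regions`.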

2 changes: 1 addition & 1 deletion gffquant/counters/region_counter.py
@@ -5,7 +5,7 @@
from collections import Counter

from .. import DistributionMode
from .alignment_counter import AlignmentCounter
from .alignment_counter2 import AlignmentCounter


class RegionCounter(AlignmentCounter):
2 changes: 1 addition & 1 deletion gffquant/counters/seq_counter.py
@@ -3,7 +3,7 @@
""" module docstring """

from .. import DistributionMode
from .alignment_counter import AlignmentCounter
from .alignment_counter2 import AlignmentCounter


class UniqueSeqCounter(AlignmentCounter):