Hello, I am running into the error below when trying to run the TPM example provided in the docstrings:
ValueError: The output generated by `func` have different column names than the ones provided by `get_feature_names_out`.
Got output with columns names: ['x0', 'x1', 'x2', 'x3', 'x4'] and
`get_feature_names_out` returned: ['Gene_1', 'Gene_2', 'Gene_3', 'Gene_4', 'Gene_5'].
The column names can be overridden by setting `set_output(transform='pandas')` or
`set_output(transform='polars')` such that the column names are set to the names provided by `get_feature_names_out`.
This is the code I am running, taken from the example:
from rnanorm.datasets import load_toy_data
from rnanorm import TPM
dataset = load_toy_data()
dataset.exp
# Gene_1 Gene_2 Gene_3 Gene_4 Gene_5
#Sample_1 200 300 500 2000 7000
#Sample_2 400 600 1000 4000 14000
#Sample_3 200 300 500 2000 17000
#Sample_4 200 300 500 2000 2000
tpm = TPM(gtf=dataset.gtf_path).set_output(transform="pandas")
tpm.fit_transform(dataset.exp)
I also tried running the example code from issue #20, which produces the same error message:
from rnanorm import TPM
import pandas as pd
df = pd.DataFrame([[200, 400, 400], [300, 300, 800]], index=["Sample1", "Sample2"], columns=["Gene1", "Gene2", "Gene3"])
gene_lengths = pd.Series([100, 100, 200], index=["Gene1", "Gene2", "Gene3"])
df
# Gene1 Gene2 Gene3
# Sample1 200 400 400
# Sample2 300 300 800
# In [6]: gene_lengths
# Gene1 100
# Gene2 100
# Gene3 200
# dtype: int64
TPM(gene_lengths=gene_lengths).set_output(transform="pandas").fit_transform(df)
# Out[7]:
# Gene1 Gene2 Gene3
# Sample1 250000.0 500000.0 250000.0
# Sample2 300000.0 300000.0 400000.0
The error is raised when running tpm.fit_transform(dataset.exp). This is in a new conda environment with Python 3.13, pandas 2.3.2, rnanorm 2.2.0, and sklearn 1.7.2. As you can see, the error occurs even with set_output(transform="pandas") set.
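In case it helps with reproducing this, below is a small sketch for reporting the installed versions from inside the environment. The helper name installed_versions is my own and is not part of rnanorm, and I am assuming the distribution names are rnanorm, scikit-learn, and pandas as published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a {distribution name: version string} mapping.

    Hypothetical helper for this report, not part of rnanorm or scikit-learn.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Report missing packages instead of failing.
            found[name] = "not installed"
    return found


# Assumed PyPI distribution names for the packages involved in the error.
versions = installed_versions(["rnanorm", "scikit-learn", "pandas"])
for name, ver in versions.items():
    print(f"{name}=={ver}")
```

Running this in the same conda environment as the failing example should confirm which scikit-learn release is actually being imported.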
Thank you