Unethical and unfair comparison with fastreeR #3

gkanogiannis · 2025-05-21T11:00:23Z

I am the author of fastreeR, one of the tools you compared against in your recent GigaScience publication on your software VCF2DIS.

Your paper states that all tools were tested on a machine with 512 GB of RAM. However, in the benchmarking code you provided (see this link and this link), you explicitly limit the Java Virtual Machine used by fastreeR to 60 GB of memory via:

options(java.parameters="-Xmx60G")

This artificial memory constraint significantly impacts fastreeR’s performance, especially for large datasets where memory usage is a critical factor. By doing this, your benchmarking setup deprives fastreeR of the full resources available and creates an unfair and misleading performance comparison.
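For anyone who wants to check which heap ceiling a JVM actually received, the following is a minimal sketch. It is illustrative only and is not part of fastreeR or of the benchmarking scripts; the class name `HeapCheck` and the helper `maxHeapGiB` are hypothetical. It relies only on the standard `Runtime.getRuntime().maxMemory()` call, which reports the maximum amount of memory the JVM will attempt to use.

```java
public class HeapCheck {
    // Hypothetical helper: converts the JVM's reported heap ceiling to GiB.
    static double maxHeapGiB() {
        long maxBytes = Runtime.getRuntime().maxMemory();
        return maxBytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // With a flag such as -Xmx60G, the reported value should be close to 60.
        System.out.printf("Max heap available to this JVM: %.1f GiB%n", maxHeapGiB());
    }
}
```

Running it once with and once without an explicit `-Xmx` flag shows whether a limit is in effect on a given machine.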

You then conclude that VCF2DIS is superior — without disclosing that fastreeR was intentionally memory-throttled, despite the system’s capacity. This is a serious breach of fair benchmarking practices and undermines the scientific validity of your results.

I have reported this concern to GigaScience and I am expecting their action.

I ask that you:

Publicly acknowledge this discrepancy.

Retract or revise the performance claims involving fastreeR.

Ensure future comparisons are conducted transparently and fairly.

Scientific comparisons must be honest, especially when they influence software adoption in the community.

— Anestis Gkanogiannis
Developer of fastreeR


hewm2008 · 2025-05-22T01:50:45Z

Your request is totally unreasonable.

response_for_FastTreeR.docx

Please conduct a thorough test. All our operations are transparent and fair, and so is the code. The attached document gives the detailed process of our inspection. A 64GB configuration triggers errors, and even without errors the performance would not be fast.

hewm2008 · 2025-05-22T02:03:36Z

Your requirements are completely unreasonable. All our tests are open, fair, and carried out with integrity. However, you haven’t carefully tested the data and just made baseless remarks upon seeing the mention of 60GB. In fact, your program simply won’t run with a 64GB configuration. (Please refer to the screenshots in the above document for details.)

My suggestion is that you run a comprehensive comparison with your program on the tests with 2504 samples and 88M loci, and then provide the results.

gkanogiannis · 2025-05-22T06:21:07Z

Thank you for your response. However, I must clarify that the error you reported stems from your use of a very outdated version of fastreeR and/or Bioconductor.

The issue you encountered, related to memory usage and 64-bit integer limits in the JVM, is a well-known problem that was identified and resolved in April 2024 in the Java backend (BioInfoJava-Utils). This fix was included in the fastreeR version released as part of Bioconductor 3.19 on 1 May 2024, well before your benchmarking experiments in October 2024. For the Bioconductor release schedule, see here.

See the relevant GitHub commit here, gkanogiannis/BioInfoJava-Utils@25b60e0, and the screenshot below, showing when the fix was pushed (25 April 2024).

It is the responsibility of anyone conducting software comparisons to ensure they are testing the most up-to-date and stable version of the tools involved at the time of their tests. Using an outdated version and then capping memory to 60 GB introduces artificial bottlenecks that did not exist in the release available at the time of your experiments.

Furthermore, I would have gladly provided support or clarification had you contacted me before publishing these results. This would have avoided incorrect conclusions about fastreeR’s capabilities and ensured a fair and reproducible comparison.

To reiterate: fastreeR and its backend BioInfoJava-Utils are fully capable of running with arbitrarily large JVM heap memory, and perform robustly on large-scale datasets when configured correctly.

hewm2008 · 2025-05-22T07:16:40Z

First of all, I’d like to clarify that when we conducted the tests, we used the latest version available at that time, which was released in 2024, and this issue already existed back then.

Moreover, even with the latest version you’ve released (2025-04-15 22:24:00 UTC; biocbuild), this problem still persists. As shown in the following screenshots, we used the latest version and also met the Java-related settings you required (openjdk version "11.0.20.1"). (See the Word document for details, also below.)

However, regardless, our program has extremely low memory requirements, consuming less than 1GB, and it also runs faster than yours (threads = 1). (See the Word document for details.)

Additionally, our data and code are publicly available. If you have any doubts, I suggest you conduct your own tests first. Therefore, my conclusion is that our operations are open, fair, and just. You can also have third-party personnel conduct an evaluation.

So, I have to decline your requests.

latest version.pdf

gkanogiannis · 2025-05-22T07:42:45Z

grep Packaged ~/01.Soft/R-4.3.3/library/fastreeR/DESCRIPTION

This only shows the version of the package you have downloaded in that specific folder.
It does not show the actual version that is loaded by R.

Please run the following code and show me the result:

unloadNamespace("fastreeR")
library("fastreeR")
packageVersion("fastreeR")

hewm2008 · 2025-05-22T07:49:10Z

The same error occurs, so it’s not a version problem. You claim that higher memory usage leads to faster speed, but our program still runs faster than yours even with lower memory usage. Refer to the content in the Word document. There were no errors on the M1 data.

latest versionB.pdf

gkanogiannis · 2025-05-22T08:08:26Z

There is a serious problem with the Java JRE/JDK you are using.

As seen in my previous comment, fastreeR prints a value for CHUNK_SIZE, and it is capped at Integer.MAX_VALUE. This value in Java is 2,147,483,647 (ref), but for some reason the Java you are using overflows this value to 2,147,483,648.

Please upgrade your Java version and run again.

hewm2008 · 2025-05-22T08:40:03Z

A: Currently, you need to modify the program to make it compatible with more machines and models. We are using the Java that comes pre-installed with the system, with the version “openjdk version ‘11.0.20.1’ 2023-08-24 LTS”, which is already a fairly new version. Previously, you mentioned that the program requires Java 1.8 or higher, and our version fully meets this requirement. From the users’ perspective, the program should be compatible with the system-provided Java rather than asking users to upgrade it.

B: Even though your program can run with 64GB of memory, our M1 loci of 91 individuals can handle up to 256GB of data, and the test speed is still very fast (see the Word document). So, I don’t want to keep arguing about this issue. Based on the above, we can conclude that we haven’t artificially restricted the memory, and the whole process is open, fair, and just.

C: In fact, you can conduct some tests first. Let’s have a private conversation. It’s not very reasonable for you to directly ask Giga to retract the manuscript. We’ve also responded to Giga’s editor-in-chief in a friendly manner. Our article doesn’t overly criticize your software; it just describes our test results truthfully. This might lead Giga to misunderstand us. I believe they haven’t replied to you yet, perhaps because they’re inviting a third party for evaluation. You can also conduct your own evaluation, and our evaluations are all open and fair.

D: I’ve also noticed that you’ve been continuously updating and maintaining the software in the community. We can see that you’ve made great contributions to the industry, and we also recognize that your software is excellent.

E: Finally, I have a suggestion. In fact, the lower the memory usage of a program, the better it is. Currently, just the data of M5 loci for 91 individuals already requires 64GB of memory. When dealing with the 88M loci data of 2050 individuals, even if the program doesn’t report an error, it will be very difficult to run.

gkanogiannis · 2025-05-22T08:44:55Z

The JRE is broken on your system; the value 2147483648 cannot possibly be printed there!

You keep repeating the memory requirements of your software, but this is not the issue.

I do not care about your software, but you compared it with mine, so you needed to do the comparison correctly, and when you faced issues you should have notified me.

Please update your JRE to another version, and let me know about the results of fastreeR.

hewm2008 · 2025-05-22T08:47:39Z

Now it’s necessary for you to test our data on your own. Our data is publicly available, and so is the testing code. As I mentioned earlier, the tests can pass on the M1. Your program isn’t faster than ours either.

gkanogiannis · 2025-05-22T08:54:43Z

You are not listening to what I am saying.
You keep repeating yourself about your software.

There is a serious issue with the Java on your system that made fastreeR break.
The value 2147483648 can never be printed in the screenshot you showed.

Please use another JRE.

hewm2008 · 2025-05-22T08:58:43Z

I’m not familiar with Java here, but I’m using the Java that comes with the system. As mentioned above, it’s the following version:
openjdk version "11.0.20.1" 2023-08-24 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.20.1.1-2) (build 11.0.20.1+1-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.20.1.1-2) (build 11.0.20.1+1-LTS, mixed mode, sharing)

The Java version in my system is already new enough. I dare not update it for fear of affecting other things.

Now it’s necessary for you to test our data on your own. Our data is publicly available, and so is the testing code. As I mentioned earlier, the tests can pass on the M1. Your program isn’t faster than ours either.

gkanogiannis · 2025-05-22T09:06:22Z

Thanks for confirming that you're using a Red Hat JRE.

Red Hat's OpenJDK implementations (especially those from java-11-openjdk and java-17-openjdk) are known to round memory values differently and may exhibit small differences in how Runtime.getRuntime().maxMemory() is interpreted.

In standard environments (e.g., Oracle JDK, Eclipse Temurin), the CHUNK_SIZE logic in my backend should always result in a value ≤ Integer.MAX_VALUE, because this line enforces it:

if (CHUNK_SIZE > Integer.MAX_VALUE) CHUNK_SIZE = Integer.MAX_VALUE;

However, in your case, you're seeing a printed value of 2147483648, which is one byte larger than Integer.MAX_VALUE (2,147,483,647) — a value that cannot be represented by an int and would fail if passed to ByteBuffer.allocate() or new byte[], for example.

This suggests that your JVM may be padding CHUNK_SIZE or doing internal alignment, causing it to exceed the integer limit even after the safeguard.
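To make the arithmetic concrete, here is a minimal sketch. It is not the actual BioInfoJava-Utils code: the class `ChunkSizeClamp` and the method `clampChunkSize` are hypothetical names, and only the comparison against `Integer.MAX_VALUE` mirrors the line quoted above. It shows that a `long` clamped this way cannot exceed 2,147,483,647, and that 2,147,483,648 wraps to a negative number when narrowed to an `int`.

```java
public class ChunkSizeClamp {
    // Hypothetical stand-in for the safeguard quoted above.
    static long clampChunkSize(long chunkSize) {
        if (chunkSize > Integer.MAX_VALUE) {
            chunkSize = Integer.MAX_VALUE;
        }
        return chunkSize;
    }

    public static void main(String[] args) {
        long requested = 2_147_483_648L; // one more than Integer.MAX_VALUE
        long clamped = clampChunkSize(requested);
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("clamped           = " + clamped);
        // Narrowing the unclamped value to int wraps around to Integer.MIN_VALUE.
        System.out.println("(int) requested   = " + (int) requested);
    }
}
```

Under these assumptions, any value printed after the clamp should be at most 2,147,483,647, which is why a printed 2147483648 points to something happening outside the safeguard.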

Nevertheless, since you seem unfamiliar with Java, I think it is better to wait for the independent evaluator from Giga, and I hope they will be using a non-Red Hat JRE.

hewm2008 · 2025-05-22T09:15:54Z

B: Even though your program can run with 64GB of memory, our M1 loci of 91 individuals can handle up to 256GB of data, and the test speed is still very fast (see the Word document). So, I don’t want to keep arguing about this issue. Based on the above, we can conclude that we haven’t artificially restricted the memory, and the whole process is open, fair, and just.

Our M1 data can pass the test, and the result records are as follows. You can also quickly obtain these values through self-testing instead of just waiting for their response. You can also conduct tests using M5 data. In fact, the speed is linearly correlated with the number of loci. Since our program shows high speed for M1 data, it indicates that it will also be fast for other loci.

gkanogiannis · 2025-05-22T09:18:05Z

I think I am talking to a wall...

You are testing fastreeR under a broken JRE. Replace it with a non Red Hat (Oracle for example) and try fastreeR again.

hewm2008 · 2025-05-22T09:30:23Z

I’m also trying to make it clear that there’s no problem with our test data.
We can conclude that we haven’t artificially restricted the memory, and the whole process is open, fair, and just.

At least in terms of time and memory usage (when thread == 1), it performs well.
You can conduct your own evaluation. I’m about to get off work right away. I’ll check tomorrow when I’m free to see if I need to reply to you.

gkanogiannis · 2025-05-22T09:32:19Z

There is no problem with your test data nor VCF2Dis.

The problem is with the Java you are using to test fastreeR.

Please change the JRE of your test system and try again.

hewm2008 · 2025-05-23T05:29:10Z

Dear author of FastTreeR,

A. Here, I’d like to present to you once again the information we sent back to the journal for your reference (please refer to the above/below Word document). I sincerely hope this content will be helpful to you.
response_for_FastTreeR.docx

B. After the above discussion, I personally believe that our evaluation was not carried out under unfair conditions. Generally speaking, in the field of programming, the less memory a program occupies, the better. I truly hope that our exchange can assist you in further optimizing your software.

C. If you have further questions about the above content, please feel free to leave a message. If not, I will close this discussion in one day. However, please rest assured that if you have any new thoughts or questions during this period, you can contact me at any time. I’ll be more than willing to continue the conversation.

gkanogiannis · 2025-05-23T05:34:38Z

Thank you for your message.

As I have already communicated to the editorial team at GigaScience, the issue we are discussing is currently under independent review by a member of their editorial board. I trust that the outcome of that process will provide a more objective assessment of how fastreeR was configured and tested in your publication.

Given that this discussion is directly tied to claims made in your peer-reviewed article, I respectfully suggest that it remain open until the editorial review is complete.
Closing it prematurely could give the impression that you are attempting to suppress an open and transparent technical discussion.

To reiterate: my concern is not about the value or efficiency of your tool, but rather about the way my software was configured and evaluated, and the absence of communication with me prior to publication.

I will wait for the outcome of the journal’s review and remain available to continue this discussion in good faith.

gkanogiannis mentioned this issue May 22, 2025

Not the correct place for this comment gkanogiannis/fastreeR#4

Open
