
functions to control nonce space and timeouts for all chip topologies #420

Open
adammwest wants to merge 23 commits into bitaxeorg:master from adammwest:fullscan_fix


Conversation

@adammwest
Contributor

@adammwest adammwest commented Oct 20, 2024

What
Relevant issues/PRs

Goals

  • Set the nonce space to 100% for configurations that allow it
  • Support extended jobs for SV2 (Stratum V2)

Search Space
The search space consists of:
  • the nonce space: 32 bits
  • the version space: 16 bits (BIP320)
  • the ntime space: ~12 bits (above MTP, the median time past, and at most current time + 7200 s)
  • the extranonce2 space: ~64+ bits
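A rough tally of the per-job header search space, as a sketch. The ntime width is approximated as floor(log2(7200)), i.e. only the 2-hour upper bound is counted and the MTP lower bound is ignored; `floor_log2_u64` and `header_space_bits` are hypothetical helpers, not ESP-Miner functions.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(range)): whole bits needed to index `range` distinct values. */
unsigned floor_log2_u64(uint64_t range)
{
    assert(range > 0);
    unsigned bits = 0;
    while (range >>= 1) {
        bits++;
    }
    return bits;
}

/* Approximate bits of header space available before extranonce2 must change. */
unsigned header_space_bits(void)
{
    const unsigned nonce_bits = 32;                   /* 32-bit nonce field */
    const unsigned version_bits = 16;                 /* BIP320 general-purpose version bits */
    const unsigned ntime_bits = floor_log2_u64(7200); /* ~2 hours of ntime, in seconds */
    return nonce_bits + version_bits + ntime_bits;
}
```

Under these assumptions the total is 32 + 16 + 12 = 60 bits, so each extranonce2 value gives roughly 2^60 header candidates.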

General mining info for ASICs
ASICs typically iterate nonces first, but they are now so fast that they have to roll other fields as well. In terms of cost, ntime is cheap to roll, and versions are cheap too thanks to ASICBoost.

The general hierarchy is
hash boards -> chips -> cores

Older chips such as the BM1397 (Bitaxe Max) are supplied with midstates.
The 80-byte Bitcoin header is too big to fit in one SHA-256 compression (64-byte block), so it is split into two blocks:

block0 block1
[midstate0][0,1,2,3,....]
[midstate1][0,1,2,3,....]
[midstate2][0,1,2,3,....]
[midstate3][0,1,2,3,....]
The purpose is to share the message schedule of the second block (block1) of the first SHA-256 pass across the 4 midstates, but this burdens the controller, which has to send work to the chips very quickly.
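To make the split concrete, the sketch below uses the standard 80-byte header layout (version, previous hash, merkle root, ntime, nbits, nonce) and SHA-256's 64-byte blocks. It shows that the version, previous hash and the first 28 bytes of the merkle root land in block0 (hence the midstate), while the last 4 merkle bytes, ntime, nbits and nonce land in block1. The helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SHA256_BLOCK_BYTES = 64,
    HEADER_BYTES = 80,
    /* Byte offsets of the fields in the 80-byte Bitcoin block header. */
    OFF_VERSION = 0,
    OFF_PREV_HASH = 4,
    OFF_MERKLE_ROOT = 36,
    OFF_NTIME = 68,
    OFF_NBITS = 72,
    OFF_NONCE = 76
};

/* SHA-256 appends a 0x80 byte and a 64-bit length, then rounds up to whole blocks. */
size_t sha256_block_count(size_t message_bytes)
{
    size_t padded = message_bytes + 1 + 8;
    return (padded + SHA256_BLOCK_BYTES - 1) / SHA256_BLOCK_BYTES;
}

/* Which compression block (0 or 1) holds a given header byte. */
size_t header_block_of(size_t offset)
{
    assert(offset < HEADER_BYTES);
    return offset / SHA256_BLOCK_BYTES;
}
```

Since only block0 changes when the version rolls, each version yields a new midstate while block1 (apart from the nonce) stays the same, which is what lets several midstates share block1's message schedule.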

The BM1397 has 4 midstates and 672 small cores, so there are 672 / 4 = 168 cores on the chip, and each core does an independent search of the nonce range:
core0 [0,1,2,3,....]
core1 [0,1,2,3,....] * offset1
...
core167 [0,1,2,3,....] * offset167

But 168 cores cannot evenly divide 2^32, so there is a hole in the search: only 168/256 ≈ 65.6% of the nonce space is covered by the hardware cores.
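The coverage figure can be reproduced with a small model, under the assumption (not confirmed by the PR) that the core index is taken from the top bits of the nonce, so the space is split into the next power of two of slots and the unused slots form the hole. `next_pow2_u32` may be roughly what the `_up` suffix in names like `big_cores_up` refers to, but that is a guess.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= n, for 1 <= n <= 2^31. */
uint32_t next_pow2_u32(uint32_t n)
{
    assert(n >= 1 && n <= (UINT32_C(1) << 31));
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Fraction of the nonce space covered when `cores` of the power-of-two slots are populated. */
double nonce_coverage(uint32_t cores)
{
    return (double)cores / (double)next_pow2_u32(cores);
}
```

With this model, 168 cores give 0.65625 coverage, while a power-of-two core count such as 128 covers the whole space.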

More recent Bitmain chips, such as the BM1370, support version rolling.

With version rolling we manipulate the nVersion field, which gives us more space to search: 2^16 available values.
The BM1370 has 128 cores and 2040 small cores. Since 128 * 16 = 2048, 8 small cores are missing: 4 on core 15 and 4 on core 127.
These are the version generators.

Now the pipeline is more advanced: we generate 16 versions and supply them to each core.
core 0 [0,1,2,3,....]
...
core 127 [0,1,2,3,....] * offset127
Then we repeat until the 2^16 versions run out, i.e. 2^16 / 16 = 4096 iterations.
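The version arithmetic can be checked the same way. The sketch assumes the common BIP320 mask 0x1FFFE000 (16 contiguous bits, 13 through 28) and the 16-versions-per-batch figure above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BIP320_VERSION_MASK UINT32_C(0x1FFFE000) /* version bits 13..28 */

/* Count set bits (Kernighan's method). */
unsigned popcount_u32(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Number of distinct version values a mask allows. */
uint32_t version_count(uint32_t mask)
{
    unsigned bits = popcount_u32(mask);
    assert(bits < 32);
    return UINT32_C(1) << bits;
}

/* Serial batches needed when `per_batch` versions are generated at a time. */
uint32_t version_iterations(uint32_t mask, uint32_t per_batch)
{
    assert(per_batch > 0);
    uint32_t n = version_count(mask);
    return (n + per_batch - 1) / per_batch;
}
```

For this mask, version_count returns 65536 and version_iterations(mask, 16) returns 4096, matching the 2^16 / 16 figure.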

Finally, these ideas extend easily to multiple chips: with multiple chips we must also divide the nonce/version space per chip, and then the chips can work in parallel.

CHIP 0
core n CHIP0 offset + [0,1,2,3,....] * n

CHIP 1
core n CHIP1 offset + [0,1,2,3,....] * n
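One way to picture the per-chip split is a contiguous partition of the 32-bit nonce space: chip c gets slice c, and core k of that chip gets sub-slice k. This is a swapped-in simplification assuming power-of-two chip and core counts; it does not claim to match how the CNO register or the on-chip core offsets actually lay out the space, and `start_nonce` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical contiguous partition: first nonce scanned by `core` on `chip`. */
uint32_t start_nonce(uint32_t chip, uint32_t chip_count, uint32_t core, uint32_t core_count)
{
    assert(chip < chip_count && core < core_count);
    /* Power-of-two counts keep every slice the same size, with no hole. */
    assert((chip_count & (chip_count - 1)) == 0);
    assert((core_count & (core_count - 1)) == 0);
    const uint64_t space = UINT64_C(1) << 32;
    const uint64_t chip_slice = space / chip_count;
    const uint64_t core_slice = chip_slice / core_count;
    return (uint32_t)(chip * chip_slice + core * core_slice);
}
```

For example, with 4 chips of 128 cores each, chip 1 starts at 0x40000000 and every core covers a 0x00800000-wide slice.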

In a simple system we have the following equation, where size is the serial (non-parallel) search space and freq is the clock frequency:
time = size / freq
This PR works out that timeout equation for different configurations.

Finally, the nonce and version space are configurable as well.
Currently in ESP-Miner the nonce range is roughly 1/64 of the full nonce space.

To control the nonce range we need the hash counting number (HCN) register.
The HCN is a nonce limiter: after it completes and restarts, a new version is generated.

Finding the maximum HCN means solving an equation that depends on the configuration. This makes it possible to cover the whole nonce space for different frequencies, chain lengths and core counts.

// use double throughout instead of mixing float, double and 0.5f
double hcn_space = (double)NONCE_SPACE / big_cores_up / asic_count_up;
double hcn_max = hcn_space * FREQ_MULT / frequency * 0.5;

Note: the size depends on the frequency, which is odd.

After calculating the maximum HCN, which is equivalent to 100% nonce space, the timeout is easy: you only need the non-parallel (serial) space of the chip and its frequency.

double fullspace_timeout_s = serial_versions * serial_nonces / ((double)frequency_mhz * 1000.0 * 1000.0);
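A worked instance of the timeout formula, wrapped in a function. The inputs in the example comment are illustrative assumptions, not values taken from the PR: 4096 serial version batches (2^16 / 16), 2^32 / 128 serial nonces per core, and 525 MHz.

```c
#include <assert.h>

/* Same formula as above: serial work divided by the clock rate in Hz. */
double fullspace_timeout_s(double serial_versions, double serial_nonces, double frequency_mhz)
{
    assert(frequency_mhz > 0.0);
    return serial_versions * serial_nonces / (frequency_mhz * 1000.0 * 1000.0);
}

/* Example: fullspace_timeout_s(4096.0, 4294967296.0 / 128.0, 525.0) is about 261.8 s. */
```

That result lands close to the 261213 ms ASIC job interval logged later in this thread, though the exact inputs behind that log are not stated.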

Additional Note

  • In the BM1370, the CNO register allows 16-bit division of the search space across a chain of chips

This should work for any number of versions, HCN value, frequency and chip count.

Testing TODO
hex
gammaturbo
gamma
supra
ultra
max
naja

@adammwest adammwest changed the title WIP: Fullscans WIP: Calulating fullscan_ms and space computed by the chip Oct 20, 2024
@adammwest adammwest changed the title WIP: Calulating fullscan_ms and space computed by the chip WIP: Calculating fullscan_ms and space computed by the chip Oct 20, 2024
@adammwest adammwest changed the title WIP: Calculating fullscan_ms and space computed by the chip Calculating fullscan_ms and space computed by the chip Oct 25, 2024
@adammwest adammwest changed the title Calculating fullscan_ms and space computed by the chip Calculating scan time and space computed by the chip Oct 25, 2024
@adammwest adammwest changed the title Calculating scan time and space computed by the chip Calculating scan time and space computed by BM chips Oct 25, 2024
@adammwest adammwest changed the title Calculating scan time and space computed by BM chips fix: Calculating scan time and space computed by BM chips Nov 7, 2024
@mutatrum
Collaborator

mutatrum commented Nov 21, 2024

There is an off-by-one error somewhere:

I (17578) ASIC_task: ASIC Job Interval: 1812.53 ms
[...]
I (18228) create_jobs_task: Set chip version rolls 65535
I (18228) stratum_task: rx: {"params":[10000],"id":null,"method":"mining.set_difficulty"}
I (18238) create_jobs_task: Set chip fullscan 1812.498908

This comes from the following code in create_jobs_task.c:

            // calculate update to fullscan_ms as version rolling changes
            double new_version_percent = (double)version_rolls / (double)65536.0;
            double prcnt_change = new_version_percent/GLOBAL_STATE->version_space_percent;
            GLOBAL_STATE->asic_job_frequency_ms *= prcnt_change;
            GLOBAL_STATE->version_space_percent = new_version_percent;
            ESP_LOGI(TAG, "Set chip fullscan %f", GLOBAL_STATE->asic_job_frequency_ms);

I set BM1368_FULLSCAN_PERCENT to 1.0. The initial ASIC job interval then gets shortened by 1/65536. I don't think that's supposed to happen?

Should it be:

            double new_version_percent = (double)(version_rolls + 1) / (double)65536.0;

Not sure where the 65536.0 is coming from? Is that the maximum it can be?

Other than that, it seems to be hashing fine with these settings on my Supra.
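One possible reading of the off-by-one, assuming version_rolls holds the highest roll index rather than a count (for example if it is derived as version_mask >> 13, like the commented-out call quoted in the review further down): for the 16-bit mask 0x1FFFE000 that gives 65535, while 2^16 = 65536 is the number of distinct version values. The sketch compares both divisions; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Current formula: treats version_rolls as a count of versions. */
double version_fraction_current(uint32_t version_rolls)
{
    return version_rolls / 65536.0;
}

/* Proposed formula: treats version_rolls as the highest index, so count = rolls + 1. */
double version_fraction_plus_one(uint32_t version_rolls)
{
    assert(version_rolls < UINT32_MAX);
    return (version_rolls + 1) / 65536.0;
}
```

Scaling the logged interval by the current fraction shortens it by about 0.03 ms (1812.53 * 65535/65536 ≈ 1812.50), consistent with the 1812.498908 in the log, while the +1 form gives a factor of exactly 1.0.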

@WantClue WantClue requested a review from Georges760 December 2, 2024 22:46
@WantClue WantClue added the help wanted Extra attention is needed label Dec 2, 2024
@Georges760 Georges760 self-assigned this Dec 2, 2024
@adammwest adammwest changed the title fix: Calculating scan time and space computed by BM chips Knowlage of registers (hcn 10 and cno C0) Feb 9, 2025
@adammwest adammwest changed the title Knowlage of registers (hcn 10 and cno C0) functions to control nonce space timeouts using HCN, CNO (hcn 10 and cno C0) for all chips Feb 9, 2025
@adammwest adammwest changed the title functions to control nonce space timeouts using HCN, CNO (hcn 10 and cno C0) for all chips functions to control nonce space timeouts using registers HCN, CNO for all chips Feb 9, 2025
@adammwest adammwest changed the title functions to control nonce space timeouts using registers HCN, CNO for all chips functions to control nonce space and timeouts using registers HCN, CNO for all chips Feb 9, 2025
@adammwest adammwest changed the title functions to control nonce space and timeouts using registers HCN, CNO for all chips functions to control nonce space and timeouts for all chip topologies Feb 9, 2025
@mutatrum
Collaborator

Is the BM1397 still unknown, as it doesn't set register 0x10 at all?

@mapio
Contributor

mapio commented Mar 12, 2025

double new_version_percent = (double)version_rolls / (double)65536.0;

By the way, the literal 65536.0 is already a double, as specified in Section 6.4.4.2 ("Floating constants") of the C standard (ISO/IEC 9899:2018); more precisely, 6.4.4.2p4 states: "A floating constant has type double unless explicitly specified by a suffix.".

This implies (see Section 6.3.1.8, "Usual arithmetic conversions") that the other operand of the division is automatically converted to double.

So the idiomatic way of writing such an expression is double new_version_percent = version_rolls / 65536.0;.

Useless casts can confuse expert C programmers, because they seem to imply that the standard can't be relied upon (for some unspecified reason).
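The point can be checked mechanically with C11 `_Generic`, which selects a result based on the type of its controlling expression. IS_DOUBLE is a hypothetical helper macro for this check.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if the expression has type double, 0 otherwise (C11 _Generic). */
#define IS_DOUBLE(x) _Generic((x), double: 1, default: 0)

/* Idiomatic form: 65536.0 is already a double, so version_rolls is converted implicitly. */
double new_version_percent(uint32_t version_rolls)
{
    return version_rolls / 65536.0;
}
```

The unsuffixed literal and the mixed uint32_t / double division both have type double, while a literal with an f suffix does not, so the casts in the original line add nothing.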

@mutatrum
Collaborator

mutatrum commented Jun 2, 2025

Ok, story time. I've ported this to dev-latest, and it's running on my Gamma with default frequency/voltage. I took out all the percent configurations, so it scans the whole nonce range over the full version_mask, with the maximum ASIC job interval. I got this:

I (14210) ASIC_task: ASIC Job Interval: 261213.00 ms

It then starts hashing, first job:

I (17370) bm1370Module: Job ID: 18, Core: 110/4, Ver: 004A8000

Until it reaches the end:

I (273530) bm1370Module: Job ID: 18, Core: 42/7, Ver: 1FA8E000

Indeed, 261 seconds later it was done. It starts with ver 0x00000000 and ends at 0x1FFFFFFF, which is the version_mask for this pool (ckpool). So, as long as the pool doesn't flush the jobs, it only switches to a new job once every 4 minutes 21 seconds. This means it ignores almost every mining.notify, and on average only switches ~4 jobs per block. No duplicate shares reported.

Interesting thing is, the pool doesn't like this. After 100 seconds of working on the same job, you get Invalid JobID errors on submits, and after enough of those, you get a client.reconnect. I reduced the ASIC job interval to 60 seconds so as not to piss off the pool, and the hashrate is normal, with no reconnects, after 9 hours:
[two screenshots]

Another observation: the pool wanted to switch from the starting difficulty of 10000 to a lower difficulty after a few minutes, but when running with the 261 seconds ASIC job interval it took 27 minutes before that came into effect, after a mining.notify with a job flush. So a really long job interval points to something that's not correct yet.

If this can be used on all chips, it would be possible to eliminate the jobs queue completely. Just let it run, and on each notify, flush or not, create one new job and send it to the ASIC, and it just works, with much simpler code. If you get a new mining.notify every 5 seconds, you would need 50 TH/s on a single device to even start rolling extranonce_2 for a new job.

Related issues are #824 and #939.
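The ~50 TH/s estimate follows from the size of the nonce x version space, assuming a full 32-bit nonce and a 16-bit version mask and ignoring ntime rolling and any nonce-space holes: 2^48 hashes in 5 seconds. The same function applied to the 261 s interval logged above gives the hashrate implied by that full scan. `hashrate_to_exhaust` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hashrate (H/s) needed to cover the whole nonce x version space within interval_s seconds. */
double hashrate_to_exhaust(unsigned nonce_bits, unsigned version_bits, double interval_s)
{
    assert(nonce_bits + version_bits < 64);
    assert(interval_s > 0.0);
    uint64_t hashes = UINT64_C(1) << (nonce_bits + version_bits);
    return (double)hashes / interval_s;
}
```

hashrate_to_exhaust(32, 16, 5.0) is about 5.6e13 H/s (~56 TH/s), in line with the 50 TH/s figure, and hashrate_to_exhaust(32, 16, 261.213) is about 1.08e12 H/s, on the order of 1 TH/s.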

@skot
Collaborator

skot commented Jun 2, 2025

That's pretty exciting! I assume this won't work on the BM1397 (bitaxeMax) and prolly not on the upcoming BZM2 as they don't roll version.

Nonetheless I think this is worth implementing.

mutatrum added a commit to mutatrum/ESP-Miner that referenced this pull request Jun 2, 2025
Removed nonce_percentage and timeout_percentage to simplify the code
@mutatrum
Collaborator

mutatrum commented Jun 2, 2025

Code to test: https://github.com/mutatrum/ESP-Miner/tree/fullscan-fix-revisited

@mutatrum
Collaborator

mutatrum commented Jun 2, 2025

Quoting skot: "That's pretty exciting! I assume this won't work on the BM1397 (bitaxeMax) and prolly not on the upcoming BZM2 as they don't roll version.

Nonetheless I think this is worth implementing."

It's super interesting. There is code for the BM1397, but I'm not sure how it will handle it, as I only have a Supra and a Gamma.

And how does it need to be controlled? Having the full scan range and the maximum ASIC job timeout clearly causes issues, both with the pool as well as with ESP-Miner itself, as jobs are active way too long.

@mutatrum
Collaborator

It seems like we could switch to new work ASAP no matter how clean_jobs is set?

yes we can. i recommend to merge this!^^

Only if we have rolling nonce for BM1397.

@adammwest adammwest changed the base branch from v2.8.x to master February 21, 2026 15:56
@github-actions

github-actions bot commented Feb 21, 2026

Test Results

47 tests (+5): 47 passed ✅ (+5), 0 skipped 💤 (±0), 0 failed ❌ (±0), duration 0s ⏱️ (±0s)
1 suite (±0), 1 file (±0)

Results for commit f7af8ca. ± Comparison against base commit 5156662.

♻️ This comment has been updated with latest results.

// no version-rolling so same Nonce Space is split between Big Cores
return calculate_bm_timeout_ms(freq, asic_count, small_cores, cores, 4.0, ASIC_SET_TIMEOUT_PERCENT, 20);
case BM1366:
// ASIC_calculate_bm_timeout_ms(GLOBAL_STATE, GLOBAL_STATE->version_mask >> 13, 1.0);
Contributor Author


Change the comment ASIC_calculate_bm_timeout_ms to calculate_bm_timeout_ms, and consider adding a new default config default_timeout to device_config.h.

Maybe move // ASIC_calculate_bm_timeout_ms(GLOBAL_STATE, GLOBAL_STATE->version_mask >> 13, 1.0);
to the PR description rather than keeping it as a comment for future use.


Labels

help wanted Extra attention is needed


8 participants