Estimating Chain Work and Current Hashrate from the lowest hash #84
Description
I want to find the relation between chain work (W) and the lowest hash ever seen (L). There are two different ways to run simulations to check an equation, and they give wildly different results. (2^256 in the following is technically 2^256 - 1.) The first method averages the lowest hashes across trials before dividing:

AverageW = 2^256 / mean(lowest_hash)

The second averages the per-trial work estimates:

AverageW = mean(2^256 / lowest_hash)
The 1st equation gives the correct amount of work and the 2nd equation is usually 4x to 14x too high in the average work compared to the actual work. I've seen up to 600x too high in simulations. The reason is that there's a division by the lowest hash, and some hashes are really small, which has an outsized effect on the mean. Mathematically, mean(1/lowest_hash) > 1/mean(lowest_hash). This is always true when taking the mean of positive values. To make the AverageW of the 2nd eq come out correct, a correction factor of about 6 in each trial is implied:

AverageW ≈ mean(2^256 / lowest_hash) / 6
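A quick simulation (my sketch, not the original code) shows the gap between the two averaging orders. Units are normalized so that the true work and 2^256 are both 1, making the lowest hash exponentially distributed with mean 1:

```python
# Demonstrate why eq. 2 overshoots: for positive random variables,
# mean(1/L) > 1/mean(L), and for the exponentially distributed
# lowest hash the overshoot is large.
import random

random.seed(1)
TRIALS = 100_000

# Normalized lowest hash: exponential with mean 1 (true W = 1)
L = [random.expovariate(1.0) for _ in range(TRIALS)]

eq1 = 1.0 / (sum(L) / TRIALS)              # 2^256 / mean(L)
eq2 = sum(1.0 / x for x in L) / TRIALS     # mean(2^256 / L)

print(f"eq 1: {eq1:.3f}")   # close to the true W = 1
print(f"eq 2: {eq2:.3f}")   # several-fold too high
```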
So why talk about the 2nd equation if it's unstable and the 1st one's right? If you have the lowest hash ever seen for a blockchain, you'll estimate its work with W = 2^256 / L. But if you do that estimate for many blockchains that you know have the same amount of work and average the estimates, i.e. use the 2nd equation, your average answer will be way too high. If you instead average the lowest hash across all those blockchains and apply equation 1, you get the correct average work for all of them, even though the 2nd equation seems like the conceptually correct approach.
Furthermore, as shown in my prior article, simulations show that if you have the N+1 lowest hashes, there's a form of the 2nd equation that's perfectly accurate:

W = N * 2^256 / L_(N+1)

where L_(N+1) is the (N+1)th-lowest hash ever seen. Notice that when N = 1, it's the 2nd eq, except we use the 2nd lowest hash. The way to interpret this is that the 2nd lowest hash is acting like a difficulty (!!) for the lowest hash, so it functions just like the usual chain work metric. Isn't that cool? The 2nd lowest hash is the "effective difficulty" that total chain work has solved. Just use it instead of the lowest hash.
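A sketch of this estimator (my code, assuming the form W = N * 2^256 / (N+1)th-lowest hash). Normalized so 2^256 = 1 and true W = 1, the sorted lowest hashes are the arrival points of a unit-rate Poisson process, i.e. cumulative sums of exponentials:

```python
import random

random.seed(2)

def estimate(N, trials=20_000):
    """Average of W_est = N * 2^256 / L_(N+1) over simulated chains."""
    total = 0.0
    for _ in range(trials):
        t = 0.0
        for _ in range(N + 1):       # arrival points = sorted lowest hashes
            t += random.expovariate(1.0)
        total += N / t               # N * (2^256 = 1) / (N+1)th-lowest hash
    return total / trials

results = {N: estimate(N) for N in (2, 4, 8)}
for N, w in results.items():
    print(N, round(w, 3))            # each close to the true W = 1
```

Unlike eq 2, this stays stable no matter how many trials are averaged, and it tightens as N grows.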
This method suggests an idea for N = 1: we know the mean and median winning hash for a given target is 1/2 the target, so why not pretend the lowest hash solved a difficulty that was 2x its value:

W = 2^256 / (2 * lowest_hash)
As before, this modification of the 2nd equation gives wildly varying results no matter how many runs you average over, and very roughly speaking, it's about 3x too high (as the 1/6 factor above implies).
Here's code to check equations 1, 2, and 3.
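As a minimal stand-in for that check (my sketch, not the original script; normalized so 2^256 = 1 and true W = 1, with equation 3 at N = 1 using the 2nd lowest hash):

```python
import random

random.seed(3)
TRIALS = 50_000

sum_L = 0.0
eq2_sum = 0.0
eq3_sum = 0.0
for _ in range(TRIALS):
    L1 = random.expovariate(1.0)        # lowest hash of one chain
    L2 = L1 + random.expovariate(1.0)   # 2nd lowest hash
    sum_L += L1
    eq2_sum += 1.0 / L1                 # eq 2: W = 2^256 / lowest_hash
    eq3_sum += 1.0 / L2                 # eq 3, N = 1: W = 2^256 / 2nd lowest

eq1 = 1.0 / (sum_L / TRIALS)            # eq 1: 2^256 / mean(lowest_hash)
eq2 = eq2_sum / TRIALS
eq3 = eq3_sum / TRIALS
print(f"eq 1: {eq1:.3f}")   # ~1, stable
print(f"eq 2: {eq2:.3f}")   # several-fold too high, unstable
print(f"eq 3: {eq3:.3f}")   # ~1
```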
Block solvetimes and the lowest hash seen both follow the exponential distribution:

CDF(t) = 1 - e^(-t * H / (D * 2^32))

And:

CDF(L) = 1 - e^(-L * W / 2^256)
Each solvetime for a mean amount of work is the distance between tick marks on a timeline. Each lowest hash seen for an actual amount of work is the distance between tick marks on a hash-value line. Time can go from 0 to infinity but hash values are limited to 0 to 2^256, so the timeline and the hash-value line should be seen as the running sums of prior solvetimes and of prior lowest hashes seen.
This is confirmed by experiment. The histogram of L (normalized by its mean) looks like the exponential PDF, the median is ln(2), and the mean and StdDev are 1, all as expected. Grok 4 confirms. Here's my code for the testing, to confirm it's exponential.
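A stand-in for that test (my sketch, not the original code): each trial draws many uniform "hashes", keeps the lowest, and normalizes it by the hash count so the true mean is ~1:

```python
import random

random.seed(4)
TRIALS, HASHES = 5_000, 400   # HASHES per simulated chain = the work done

mins = []
for _ in range(TRIALS):
    lowest = min(random.random() for _ in range(HASHES))
    mins.append(lowest * HASHES)   # normalize: hash range -> 1, W -> HASHES

mins.sort()
mean = sum(mins) / TRIALS
std = (sum((x - mean) ** 2 for x in mins) / TRIALS) ** 0.5
median = mins[TRIALS // 2]
print(round(mean, 2), round(std, 2), round(median, 2))
# mean ~ 1, StdDev ~ 1, median ~ ln(2) = 0.693, as expected
```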
Grok insists W = 2^256 / L is the "unbiased estimator". It says I should not use many trials in my experiments to get an average in the 2nd equation because we're sampling a chain just once. I pointed out that equation 3 (it agreed that it is correct) gives a work calculation for N=2 that is always smaller, because the 2nd lowest hash is always > the lowest hash. It didn't care and said access to N=2 in the calculation made it "more robust". To support its point, it provided the following equation (a "finite sample bias correction") that proved more accurate in my experiments than my 1/6 factor. Notice it depends on the number of trial runs:
W = 2^256 / lowest_hash / (ln(trials) + 0.577)

(0.577 is approximately the Euler-Mascheroni constant.)
But for 1 trial it's almost 2x higher than W = 2^256 / lowest_hash. Apparently it only applies for at least 2 trials.
In trying to get more comfortable with Grok's conclusion, I wanted to look at the effect of generating fake lowest_hash values from chain work W, in the same way we generate fake solvetimes from the expected block time. To review that "derivation":
When values generated by the exponential distribution are plugged into the CDF, CDF(L) is a uniform random variable from 0 to 1. This allows L values to be simulated by assigning CDF(L) = x = rand(0,1) and solving the CDF for L:

L = -(2^256 / W) * ln(1 - x)
This is also how solvetimes "t" for blocks are simulated for a given hashrate H and difficulty D:

t = -(D * 2^32 / H) * ln(1 - rand(0,1))
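Both sampling formulas can be sketched together (my code; W, H, and D are hypothetical values chosen only for illustration):

```python
import math
import random

random.seed(5)
TWO256 = 2.0 ** 256

W = 1e30                        # hypothetical total chain work
def fake_lowest_hash():
    # L = -(2^256 / W) * ln(1 - rand(0,1))
    return -(TWO256 / W) * math.log(1.0 - random.random())

H, D = 1e20, 1e13               # hypothetical hashrate and difficulty
def fake_solvetime():
    # t = -(D * 2^32 / H) * ln(1 - rand(0,1))
    return -(D * 2.0 ** 32 / H) * math.log(1.0 - random.random())

n = 50_000
avg_L = sum(fake_lowest_hash() for _ in range(n)) / n
avg_t = sum(fake_solvetime() for _ in range(n)) / n
print(avg_L / (TWO256 / W))          # ~1: mean lowest hash is 2^256 / W
print(avg_t / (D * 2.0 ** 32 / H))   # ~1: mean solvetime is D * 2^32 / H
```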
I want to use this equation to calculate the expected work, which is what equation 2 is attempting. The expected-value integral isn't closed (there's no mathematical answer), which is why eq 2 by experiment varies a lot no matter how many trials I run. An alternate way to get the expected value is to use x = rand(0,1) in my equation above, solve for W, and integrate x from 0 to 1. Like the expected-value integral, this covers all possible L values to get the expected W. But I can't start at x = 0 because it gives a divide by zero, which is why it's not closed form and overestimates work by experiment. I can, however, integrate over the rest of the possibilities to get an estimate without that extreme case. Substituting L = -(2^256/W) * ln(1-x) gives 2^256/L = W / (-ln(1-x)), and numerically:

Integral from 0.21 to 0.79 of dx / (-ln(1-x)) ≈ 1

so I can see W = 2^256 / lowest_hash on average for the middle 58% of trials. Here 1-x = 1-rand(0,1) represents L going from the least to the highest L value in the 0-to-1 span, and the integral is answering "what is the expected W = 2^256/L for the middle 58% of the trial runs?" The range from 0.79 to 1 doesn't have a large effect, so let's exclude only the lowest-valued hashes: the integral from 0.23 to 1 also gives 1, like the middle 58%. That is, W = 2^256 / lowest_hash on average for the upper 77% of the lowest_hashes in trials.
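These truncated integrals can be checked numerically (my sketch; the midpoint rule conveniently never evaluates the divergent endpoint at x = 0):

```python
import math

def integral(a, b, steps=300_000):
    """Midpoint rule for the integral of dx / (-ln(1 - x)) from a to b."""
    h = (b - a) / steps
    return sum(h / -math.log(1.0 - (a + (i + 0.5) * h)) for i in range(steps))

mid58 = integral(0.21, 0.79)   # middle 58% of trials
top77 = integral(0.23, 1.0)    # all but the 23% lowest-valued hashes
print(round(mid58, 3), round(top77, 3))   # both ~1, i.e. W = 2^256 / L on average
```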
It's important to note that when (at the moment) you see the lowest hash published, that doesn't mean you've seen
BTW, the actual chain work as of 7/12/2025 is 6.48E28. The lowest hash came in 2022 (block 756951) with value 5.33E47. To get current chain work, get the most recent block hash, use getblock on it, and convert the hex chainwork field to decimal. The 12 lowest hashes can be found via a dune.com query:
```sql
SELECT hash, height, time
FROM bitcoin.blocks
ORDER BY hash ASC
LIMIT 12
```
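For the hex-to-decimal step, a sketch (the 64-hex-digit chainwork string below is made up for illustration, not a real tip value; the eq.-2 style estimate from the lowest-hash value 5.33E47 above is also computed):

```python
def chainwork_to_decimal(chainwork_hex: str) -> float:
    """Convert getblock's hex 'chainwork' field to a decimal float."""
    return float(int(chainwork_hex, 16))

# made-up chainwork string, for illustration only
example = "0000000000000000000000000000000000000000000000000000000100010001"
print(chainwork_to_decimal(example))     # 4295032833.0

# W = 2^256 / lowest_hash using the 5.33E47 lowest hash
print(f"{2.0 ** 256 / 5.33e47:.2e}")     # ~2.2e29
```

Note that 2^256 / 5.33E47 is roughly 3x the actual 6.48E28 chain work, consistent with the overestimate discussed above.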
Estimating current hashrate
I believe this is the best-possible estimate of current hashrate for an arbitrarily-small fixed time period t when given only the lowest hash seen in t.
The error signal is the probability that the prior estimate of W was wrong. 0.5 is the median observation if the prior estimate was correct, so there would be no error. Use the error signal in the EMA equation.
N = "mean lifetime" of the EMA estimate in units of t. You choose N to get your desired stability/slowness of the estimate. Divide the work estimate by t to get hashrate:

hashrate = sum(work) / (time duration)

with sum(work) coming from the EMA's per-period work estimates.
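My speculative reading of this estimator as code (the multiplicative update rule, the starting estimate, and all constants are my assumptions, not a stated method): each period t we observe only the lowest hash, score it by its CDF under the current estimate, and nudge the estimate so that 0.5 becomes the median score.

```python
import math
import random

random.seed(6)
TWO256 = 2.0 ** 256

H_true = 1e20        # true hashrate (hypothetical)
t = 600.0            # observation period, seconds
N = 50               # EMA "mean lifetime" in units of t

H_est = 0.5e20       # deliberately wrong starting estimate
history = []
for _ in range(20_000):
    # lowest hash seen in t: exponential with mean 2^256 / (H_true * t)
    L = random.expovariate(1.0) * TWO256 / (H_true * t)
    e = 1.0 - math.exp(-L * H_est * t / TWO256)  # CDF under the estimate
    H_est *= 1.0 + (0.5 - e) / N                 # nudge toward median 0.5
    history.append(H_est)

avg = sum(history[5_000:]) / len(history[5_000:])
print(f"estimate/truth = {avg / H_true:.3f}")    # settles near 1
```

Larger N gives a smoother but slower-reacting estimate, matching the stability/slowness trade-off described above.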