26 commits
2ff968d
Create yacoin.h
Thirtybird Jun 5, 2013
38bb288
Create yacoin.c
Thirtybird Jun 5, 2013
5fe4ec1
Create scrypt-jane.c
Thirtybird Jun 5, 2013
3532ca9
Create scrypt-jane.h
Thirtybird Jun 5, 2013
81eb170
Merged ali1234 repository with floodberry's scrypt-jane repository
Jun 5, 2013
333fddf
Update yacoin.h
Thirtybird Jun 5, 2013
de6b6b7
Update yacoin.h
Thirtybird Jun 5, 2013
1afb046
Update yacoin.c
Thirtybird Jun 5, 2013
e0f3548
Update Makefile.am
Thirtybird Jun 5, 2013
654bcac
Update cpu-miner.c
Thirtybird Jun 5, 2013
eef004c
Update Makefile.am
Thirtybird Jun 5, 2013
5728a0e
Update yacoin.h
Thirtybird Jun 5, 2013
5467b55
Update yacoin.h
Thirtybird Jun 5, 2013
baf01f1
Update Makefile.am
Thirtybird Jun 5, 2013
aa6abd7
Update yacoin.c
Thirtybird Jun 5, 2013
46eaeae
Update cpu-miner.c
Thirtybird Jun 5, 2013
bee5058
Update Makefile.am
Thirtybird Jun 5, 2013
84c4ce8
Update cpu-miner.c
Thirtybird Jun 5, 2013
a05d1ba
cleaning up unneeded files
Jun 5, 2013
eb5ecae
Merge branch 'master' of http://github.com/Thirtybird/cpuminer
Jun 5, 2013
6c27877
moved YACoin routines into yacoin.c
Jun 5, 2013
84dd6ce
Updated to use YACoin routines in yacoin.c
Jun 5, 2013
25632c9
fixed missing include in yacoin.c and updated scanhash functionname i…
Jun 5, 2013
9ad3ea9
Updated README with detailed build instructions for MinGW
Thirtybird Jun 7, 2013
fd65a31
updated latest scrypt-jane from floodyberry
Thirtybird Jun 11, 2013
d8d4456
Updated README with detailed build instructions for MinGW 64-bit
Thirtybird Jun 11, 2013
3 changes: 1 addition & 2 deletions Makefile.am
@@ -1,4 +1,3 @@

if WANT_JANSSON
JANSSON_INCLUDES= -I$(top_srcdir)/compat/jansson
else
@@ -16,7 +15,7 @@ bin_PROGRAMS = minerd
minerd_SOURCES = elist.h miner.h compat.h \
cpu-miner.c util.c \
sha2.c sha2-arm.S sha2-x86.S sha2-x64.S \
-scrypt.c scrypt-arm.S scrypt-x86.S scrypt-x64.S scrypt-jane.c
+scrypt.c scrypt-arm.S scrypt-x86.S scrypt-x64.S yacoin.c scrypt-jane/scrypt-jane.c
minerd_LDFLAGS = $(PTHREAD_FLAGS)
minerd_LDADD = @LIBCURL@ @JANSSON_LIBS@ @PTHREAD_LIBS@ @WS2_LIBS@
minerd_CPPFLAGS = @LIBCURL_CPPFLAGS@ -DSCRYPT_KECCAK512 -DSCRYPT_CHACHA -DSCRYPT_CHOOSE_COMPILETIME
79 changes: 70 additions & 9 deletions README
@@ -22,17 +22,78 @@ Notes for AIX users:
* GNU-style long options are not supported, but are accessible
via configuration file

Basic Windows build instructions, using MinGW:
Detailed Windows build instructions, using MinGW (32-bit):
Install MinGW and the MSYS Developer Tool Kit (http://www.mingw.org/)
* Make sure you have mstcpip.h in MinGW\include
If using MinGW-w64, install pthreads-w64
* Choose C, C++ and MSys on install and select to have it update its libraries
* Install into C:\MinGW
Include mstcpip.h from WINE in your MinGW library
* http://source.winehq.org/source/include/mstcpip.h
* select version 1.3.34
* copy this code into C:\MinGW\Include\mstcpip.h (strip out the line numbers!)
Install libcurl devel (http://curl.haxx.se/download.html)
* Make sure you have libcurl.m4 in MinGW\share\aclocal
* Make sure you have curl-config in MinGW\bin
In the MSYS shell, run:
./autogen.sh # only needed if building from git repo
LIBCURL="-lcurldll" ./configure CFLAGS="-O3"
make
* download curl-7.30.0.tar.gz from http://curl.haxx.se/download.html and put it in C:\deps\
* launch an MSYS shell and enter the following commands (the configure step will take a long time!)
cd /c/deps
tar -xvzf curl-7.30.0.tar.gz
cd curl-7.30.0
./configure --prefix=/c/mingw
make
make install
* copy c:\deps\curl-7.30.0\docs\libcurl\libcurl.m4 c:\mingw\share\aclocal
* copy c:\deps\curl-7.30.0\curl-config c:\mingw\bin
In the MSYS shell, navigate to the CPUminer source code directory
* You will likely get higher hashrates by forcing the compiler to build the executable for your
specific CPU architecture. This is done by adding "-march=<value>" into the CFLAGS. Those
values can be found at http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html
common choices for Intel are: core2, corei7, corei7-avx
common choices for AMD are: athlon-fx
* Execute the following (replacing the value of -march with the value for your CPU type)
./autogen.sh
./configure CFLAGS="-march=core2 -O3"
make
strip minerd.exe
Combine the executables with the dependencies
* copy minerd.exe, C:\MinGW\bin\libcurl-4.dll, and C:\MinGW\bin\pthreadGC2.dll to the same directory

Detailed Windows build instructions, using MinGW (64-bit):
Install MinGW and the MSYS Developer Tool Kit (http://www.mingw.org/)
* Choose C, C++ and MSys on install and select to have it update its libraries
* Install into C:\MinGW
* Add C:\MinGW\bin and c:\MinGW\msys\1.0 to your path
Download MinGW64 from http://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win64/Automated%20Builds/
* Choose mingw-w64-bin_i686-mingw_20111220.zip
* Extract ZIP to C:\MinGW64
* Add C:\MinGW64\bin to your path before C:\MinGW\bin
Install libcurl devel (http://curl.haxx.se/download.html)
* download curl-7.30.0.tar.gz from http://curl.haxx.se/download.html and put it in C:\deps\
* launch an MSYS shell and enter the following commands (the configure step will take a long time!)
cd /c/deps
tar -xvzf curl-7.30.0.tar.gz
cd curl-7.30.0
./configure --host=x86_64-w64-mingw32 --prefix=/c/mingw64
make
make install
cp /c/deps/curl-7.30.0/docs/libcurl/libcurl.m4 /c/mingw/share/aclocal/libcurl.m4
Install pthreads
* download pthreads-20100604.zip from http://sourceforge.net/projects/mingw-w64/files/External%20binary%20packages%20%28Win64%20hosted%29/pthreads/ and put it in C:\deps\
* unzip the file to c:\deps\
* In the mingw64 subdirectory is pthreads-w64.zip - extract the contents to C:\MinGW64
In the MSYS shell, navigate to the CPUminer source code directory
* You will likely get higher hashrates by forcing the compiler to build the executable for your
specific CPU architecture. This is done by adding "-march=<value>" into the CFLAGS. Those
values can be found at http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html
common choices for Intel are: core2, corei7, corei7-avx
common choices for AMD are: athlon-fx
* Execute the following (replacing the value of -march with the value for your CPU type)
./autogen.sh
./configure --host=x86_64-w64-mingw32 CFLAGS="-O3 -march=core2 -DCPU_X86_FORCE_INTRINSICS"
make
Strip minerd.exe
* In a command prompt, in the compilation directory, execute the following
x86_64-w64-mingw32-strip minerd.exe
Combine the executables with the dependencies
* copy minerd.exe, C:\MinGW64\bin\libcurl-4.dll, and C:\MinGW64\bin\pthreadGC2-w64.dll to the same directory


Architecture-specific notes:
ARM: No runtime CPU detection. The miner can take advantage
5 changes: 4 additions & 1 deletion cpu-miner.c
@@ -36,6 +36,7 @@
#include <curl/curl.h>
#include "compat.h"
#include "miner.h"
#include "yacoin.h"

#define PROGRAM_NAME "minerd"
#define DEF_RPC_URL "http://127.0.0.1:9332/"
@@ -133,6 +134,7 @@ static unsigned long accepted_count = 0L;
static unsigned long rejected_count = 0L;
double *thr_hashrates;


#ifdef HAVE_GETOPT_LONG
#include <getopt.h>
#else
@@ -633,7 +635,7 @@ static void *miner_thread(void *userdata)
break;

case ALGO_SCRYPT_JANE:
-rc = scanhash_scrypt_jane(thr_id, work.data, work.target,
+rc = scanhash_yacoin(thr_id, work.data, work.target,
max_nonce, &hashes_done);
break;

@@ -1151,3 +1153,4 @@ int main(int argc, char *argv[])

return 0;
}

2 changes: 1 addition & 1 deletion miner.h
@@ -139,7 +139,7 @@ extern int scanhash_scrypt(int thr_id, uint32_t *pdata,
unsigned char *scratchbuf, const uint32_t *ptarget,
uint32_t max_nonce, unsigned long *hashes_done);

-extern int scanhash_scrypt_jane(int thr_id, uint32_t *pdata,
+extern int scanhash_yacoin(int thr_id, uint32_t *pdata,
const uint32_t *ptarget,
uint32_t max_nonce, unsigned long *hashes_done);

161 changes: 161 additions & 0 deletions scrypt-jane/README.md
@@ -0,0 +1,161 @@
This project provides performant, flexible implementations of Colin Percival's [scrypt](http://www.tarsnap.com/scrypt.html).

# Features

## Modular Design

The code uses a modular (compile, not runtime) layout to allow new mixing & hash functions to be added easily. The base components (HMAC, PBKDF2, and scrypt) are static and will immediately work with any conforming mix or hash function.

## Supported Mix Functions

* [Salsa20/8](http://cr.yp.to/salsa20.html)
* [ChaCha20/8](http://cr.yp.to/chacha.html)
* [Salsa6420/8]()

I am not actually aware of any other candidates for a decent mix function. Salsa20/8 was nearly perfect, but its successor, ChaCha20/8, has better diffusion and is thus stronger, is potentially faster given advanced SIMD support (byte level shuffles, or a 32bit rotate), and is slightly cleaner to implement given that it requires no pre/post processing of data for SIMD implementations.

64-byte blocks are no longer assumed! Salsa6420/8 is a 'proof of concept' 64-bit version of Salsa20/8 with a 128-byte block. Its rotation constants were chosen so that two of the rotations can be done as 32-bit word shuffles instead, which puts it on par with ChaCha in terms of SSE implementation shortcuts.

## Supported Hash Functions

* SHA256/512
* [BLAKE256/512](https://www.131002.net/blake/)
* [Skein512](http://www.skein-hash.info/)
* [Keccak256/512](http://keccak.noekeon.org/) (SHA-3)

Hash function implementations, unlike mix functions, are not optimized. The PBKDF2 computations are relatively minor in the scrypt algorithm, so including CPU specific versions, or vastly unrolling loops, would serve little purpose while bloating the code, both source and binary, and making it more confusing to implement correctly.

Most (now only two!) of the SHA-3 candidates fall into the "annoying to read/implement" category and have not been included yet. This will of course be moot once ~~BLAKE is chosen as SHA-3~~ Keccak is chosen as SHA-3. Well shit.

## CPU Adaptation

The mixing function specialization is selected at runtime based on what the CPU supports (well, x86/x86-64 for now, but theoretically any). On platforms where this is not needed, e.g. where packages are usually compiled from source, it can also select the most suitable implementation at compile time, cutting down on binary size.

For those who are familiar with the scrypt spec, the code specializes at the ROMix level, allowing all copy, and xor calls to be inlined efficiently. ***Update***: This is actually not as important as I switched from specializing at the mix() level and letting the compiler somewhat inefficiently inline block_copy and block_xor to specializing at ChunkMix(), where they can be inlined properly. I thought about specializing at ROMix(), but it would increase the complexity per mix function even more and would not present many more opportunities than what is generated by the compiler presently.

MSVC uses SSE intrinsics as opposed to inline assembly for the mix functions to allow the compiler to fully inline properly. Also, Visual Studio is not smart enough to allow inline assembly in 64-bit code.
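
As a rough illustration of the runtime selection described in this section, the sketch below picks the most capable mix implementation from a bitmask of detected CPU features. The `FLAG_*` bits, the `mix_*` stubs, `select_mix_impl`, and `usable_implementations` are hypothetical names for this example only; scrypt-jane's own flags (`cpu_sse2`, `cpu_ssse3`, `cpu_avx`) and dispatch code live in its sources and may be organized differently.

```c
#include <stddef.h>

/* Hypothetical feature bits; stand-ins for the library's own cpu_* flags. */
enum { FLAG_SSE2 = 1 << 0, FLAG_SSSE3 = 1 << 1, FLAG_AVX = 1 << 2 };

/* Each stub stands in for one specialized mix routine. */
typedef const char *(*mix_impl_fn)(void);

static const char *mix_basic(void) { return "basic"; }
static const char *mix_sse2(void)  { return "sse2"; }
static const char *mix_ssse3(void) { return "ssse3"; }
static const char *mix_avx(void)   { return "avx"; }

/* Pick the most capable implementation that the CPU reports support for,
   falling back to the portable version when no flag is set. */
static mix_impl_fn select_mix_impl(size_t cpuflags) {
    if (cpuflags & FLAG_AVX)   return mix_avx;
    if (cpuflags & FLAG_SSSE3) return mix_ssse3;
    if (cpuflags & FLAG_SSE2)  return mix_sse2;
    return mix_basic;
}

/* Only implementations that were both compiled in and detected at runtime
   are safe to run, e.g. when benchmarking every available variant. */
static size_t usable_implementations(size_t compiled, size_t detected) {
    return compiled & detected;
}
```

Calling `select_mix_impl(FLAG_SSE2 | FLAG_SSSE3)` returns the SSSE3 stub, since the checks run from most to least capable.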

## Self Testing

On first use, scrypt() runs a small series of tests to make sure the hash function, mix functions, and scrypt() itself, are generating correct results. It will exit() (or call a user defined fatal error function) should any of these tests fail.

Test vectors for individual mix and hash functions are generated from reference implementations. The only "official" test vectors for the full scrypt() are for SHA256 + Salsa20/8 of course; other combinations are generated from this code (once it works with all reference test vectors) and subject to change if any implementation errors are discovered.
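
The "test once on first use, then fail hard" behaviour can be sketched roughly as follows. This is a minimal, single-threaded illustration under assumed names (`ensure_self_tested`, `set_fatal_error_handler`, `sample_passing_tests`); it is not the library's actual self-test code or API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback type for a user-supplied fatal error handler. */
typedef void (*fatal_error_fn)(const char *msg);

/* Default behaviour: report the failure and terminate the process. */
static void default_fatal(const char *msg) {
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

static fatal_error_fn fatal_handler = default_fatal;
static int self_test_done = 0;

/* Install a custom handler; passing NULL restores the default. */
static void set_fatal_error_handler(fatal_error_fn fn) {
    fatal_handler = fn ? fn : default_fatal;
}

/* Run the test suite the first time only; later calls return immediately.
   Not thread-safe as written. */
static void ensure_self_tested(int (*run_tests)(void)) {
    if (self_test_done)
        return;
    if (!run_tests()) {
        fatal_handler("self-test failed");
        return;
    }
    self_test_done = 1;
}

/* Stand-in for the real test vectors: counts invocations and passes. */
static int sample_tests_run = 0;
static int sample_passing_tests(void) {
    sample_tests_run++;
    return 1;
}
```

Calling `ensure_self_tested(sample_passing_tests)` repeatedly runs the stand-in tests only once.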

# Performance (on an E5200 2.5GHz)

Benchmarks are run _without_ allocating memory, i.e. allocating enough memory before the trials are run. Different allocators can have different costs and non-deterministic effects, which is not the point of comparing implementations. The only hash function compared will be SHA-256 to be comparable to Colin's reference implementation, and the hash function will generally be a fraction of a percent of noise in the overall result.

Three different scrypt settings are tested (the last two are from the scrypt paper):

* 'High Volume': N=4096, r=8, p=1, 4 MB memory
* 'Interactive': N=16384, r=8, p=1, 16 MB memory
* 'Non-Interactive': N=1048576, r=8, p=1, 1 GB memory

Cycle counts are in millions of cycles. All versions compiled with gcc 4.6.3, -O3. Sorted from fastest to slowest.

Scaling refers to how much more expensive 'Non-Interactive' is to compute than 'High Volume', normalized to "ideal" scaling (256x difficulty). Under 100% means it becomes easier to process as N grows, over 100% means it becomes more difficult to process as N grows.
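
The scaling figure can be recomputed from the cycle counts with a small helper. This is a sketch of the arithmetic as described above; the function name `scrypt_scaling_percent` is made up for this example, and because the published cycle counts are rounded, recomputed values may differ from the table in the last digit.

```c
/* Scaling = (Non-Interactive cost / High Volume cost) / ideal ratio, in percent.
   The ideal ratio is the growth in N between the two settings:
   1048576 / 4096 = 256. */
static double scrypt_scaling_percent(double high_volume_cycles,
                                     double non_interactive_cycles) {
    const double ideal_ratio = 1048576.0 / 4096.0;
    return (non_interactive_cycles / high_volume_cycles) / ideal_ratio * 100.0;
}
```

For example, 19.6m and 5296.7m cycles give a cost ratio of about 270.2, which is roughly 105.6% of the ideal 256x.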


<table>
<thead><tr><th>Implementation</th><th>Algo</th><th>High Volume</th><th>Interactive</th><th>Non-Interactive</th><th>Scaling</th></tr></thead>
<tbody>

<tr><td>scrypt-jane SSSE3 64bit</td><td>Salsa6420/8 </td><td>18.2m</td><td> 75.6m</td><td>5120.0m</td><td>110.0%</td></tr>
<tr><td>scrypt-jane SSSE3 64bit</td><td>ChaCha20/8 </td><td>19.6m</td><td> 79.6m</td><td>5296.7m</td><td>105.6%</td></tr>
<tr><td>scrypt-jane SSSE3 32bit</td><td>ChaCha20/8 </td><td>19.8m</td><td> 80.3m</td><td>5346.1m</td><td>105.5%</td></tr>
<tr><td>scrypt-jane SSE2 64bit </td><td>Salsa6420/8 </td><td>19.8m</td><td> 82.1m</td><td>5529.2m</td><td>109.1%</td></tr>
<tr><td>scrypt-jane SSE2 64bit </td><td>Salsa20/8 </td><td>22.1m</td><td> 89.7m</td><td>5938.8m</td><td>105.0%</td></tr>
<tr><td>scrypt-jane SSE2 32bit </td><td>Salsa20/8 </td><td>22.3m</td><td> 90.6m</td><td>6011.0m</td><td>105.3%</td></tr>
<tr><td>scrypt-jane SSE2 64bit </td><td>ChaCha20/8 </td><td>23.9m</td><td> 96.8m</td><td>6399.7m</td><td>104.6%</td></tr>
<tr><td>scrypt-jane SSE2 32bit </td><td>ChaCha20/8 </td><td>24.2m</td><td> 98.3m</td><td>6500.7m</td><td>104.9%</td></tr>
<tr><td>*Reference SSE2 64bit* </td><td>Salsa20/8 </td><td>32.9m</td><td>135.2m</td><td>8881.6m</td><td>105.5%</td></tr>
<tr><td>*Reference SSE2 32bit* </td><td>Salsa20/8 </td><td>33.0m</td><td>134.4m</td><td>8885.2m</td><td>105.2%</td></tr>
</tbody>
</table>

* scrypt-jane Salsa6420/8-SSSE3 is ~1.80x faster than reference Salsa20/8-SSE2 for High Volume, but drops to 1.73x faster for 'Non-Interactive' instead of remaining constant
* scrypt-jane ChaCha20/8-SSSE3 is ~1.67x faster than reference Salsa20/8-SSE2
* scrypt-jane Salsa20/8-SSE2 is ~1.48x faster than reference Salsa20/8-SSE2

# Performance (on a slightly noisy E3-1270 3.4GHz)

All versions compiled with gcc 4.4.7, -O3. Sorted from fastest to slowest.

<table>
<thead><tr><th>Implementation</th><th>Algo</th><th>High Volume</th><th>Interactive</th><th>Non-Interactive</th><th>Scaling</th></tr></thead>
<tbody>
<tr><td>scrypt-jane AVX 64bit </td><td>Salsa6420/8 </td><td>11.8m</td><td> 52.5m</td><td>3848.6m</td><td>127.4%</td></tr>
<tr><td>scrypt-jane SSSE3 64bit </td><td>Salsa6420/8 </td><td>13.3m</td><td> 57.9m</td><td>4176.6m</td><td>122.7%</td></tr>
<tr><td>scrypt-jane SSE2 64bit </td><td>Salsa6420/8 </td><td>14.2m</td><td> 61.1m</td><td>4382.4m</td><td>120.6%</td></tr>
<tr><td>scrypt-jane AVX 64bit </td><td>ChaCha20/8 </td><td>18.0m</td><td> 77.4m</td><td>5396.8m</td><td>117.1%</td></tr>
<tr><td>scrypt-jane AVX 32bit </td><td>ChaCha20/8 </td><td>18.3m</td><td> 82.1m</td><td>5421.8m</td><td>115.7%</td></tr>
<tr><td>scrypt-jane SSSE3 64bit </td><td>ChaCha20/8 </td><td>19.0m</td><td> 81.3m</td><td>5600.7m</td><td>115.1%</td></tr>
<tr><td>scrypt-jane AVX 64bit </td><td>Salsa20/8 </td><td>19.0m</td><td> 81.2m</td><td>5610.6m</td><td>115.3%</td></tr>
<tr><td>scrypt-jane AVX 32bit </td><td>Salsa20/8 </td><td>19.0m</td><td> 81.3m</td><td>5621.6m</td><td>115.6%</td></tr>
<tr><td>scrypt-jane SSSE3 32bit </td><td>ChaCha20/8 </td><td>19.1m</td><td> 81.8m</td><td>5621.6m</td><td>115.0%</td></tr>
<tr><td>scrypt-jane SSE2 64bit </td><td>Salsa20/8 </td><td>19.5m</td><td> 83.8m</td><td>5772.9m</td><td>115.6%</td></tr>
<tr><td>scrypt-jane SSE2 32bit </td><td>Salsa20/8 </td><td>19.6m</td><td> 84.0m</td><td>5793.9m</td><td>115.5%</td></tr>
<tr><td>*Reference SSE2/AVX 64bit* </td><td>Salsa20/8 </td><td>21.5m</td><td> 90.4m</td><td>6147.1m</td><td>111.7%</td></tr>
<tr><td>*Reference SSE2/AVX 32bit* </td><td>Salsa20/8 </td><td>22.3m</td><td> 94.0m</td><td>6267.7m</td><td>110.0%</td></tr>
<tr><td>scrypt-jane SSE2 64bit </td><td>ChaCha20/8 </td><td>23.1m</td><td> 97.7m</td><td>6670.0m</td><td>112.8%</td></tr>
<tr><td>scrypt-jane SSE2 32bit </td><td>ChaCha20/8 </td><td>23.3m</td><td> 98.4m</td><td>6728.7m</td><td>112.8%</td></tr>
<tr><td>*Reference SSE2 64bit* </td><td>Salsa20/8 </td><td>30.4m</td><td>125.6m</td><td>8139.4m</td><td>104.6%</td></tr>
<tr><td>*Reference SSE2 32bit* </td><td>Salsa20/8 </td><td>30.0m</td><td>124.5m</td><td>8469.3m</td><td>110.3%</td></tr>
</tbody>
</table>

* scrypt-jane Salsa6420/8-AVX is 1.60x - 1.82x faster than reference Salsa20/8-SSE2/AVX
* scrypt-jane ChaCha20/8-AVX is 1.13x - 1.19x faster than reference Salsa20/8-SSE2/AVX
* scrypt-jane Salsa20/8-AVX is 1.09x - 1.13x faster than reference Salsa20/8-SSE2/AVX


# Building

[gcc,icc,clang] scrypt-jane.c -O3 -[m32,m64] -DSCRYPT_MIX -DSCRYPT_HASH -c

where SCRYPT_MIX is one of

* SCRYPT_SALSA
* SCRYPT_SALSA64 (no optimized 32-bit implementation)
* SCRYPT_CHACHA

and SCRYPT_HASH is one of

* SCRYPT_SHA256
* SCRYPT_SHA512
* SCRYPT_BLAKE256
* SCRYPT_BLAKE512
* SCRYPT_SKEIN512
* SCRYPT_KECCAK256
* SCRYPT_KECCAK512

e.g.

gcc scrypt-jane.c -O3 -DSCRYPT_CHACHA -DSCRYPT_BLAKE512 -c
gcc example.c scrypt-jane.o -o example

clang *may* need "-no-integrated-as" as some? versions don't support ".intel_syntax"

# Using

#include "scrypt-jane.h"

scrypt(password, password_len, salt, salt_len, Nfactor, pfactor, rfactor, out, want_bytes);

## scrypt parameters

* Nfactor: Increases CPU & Memory Hardness
* rfactor: Increases Memory Hardness
* pfactor: Increases CPU Hardness

In scrypt terms

* N = (1 << (Nfactor + 1)), which controls how many times to mix each chunk, and how many temporary chunks are used. Increasing N increases both CPU time and memory used.
* r = (1 << rfactor), which controls how many blocks are in a chunk (i.e., 2 * r blocks are in a chunk). Increasing r increases how much memory is used.
* p = (1 << pfactor), which controls how many passes to perform over the set of N chunks. Increasing p increases CPU time used.

I chose to use the log2 of each parameter as it is the common way to communicate settings (e.g. 2^20, not 1048576).
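
To make the mapping from log2 factors to scrypt terms concrete, here is a small sketch that expands the three factors and estimates the scratch memory. The struct and function names are hypothetical, and the `128 * r * N` memory estimate assumes a mix function with 64-byte blocks (Salsa20/8 or ChaCha20/8); it does not apply as-is to the 128-byte-block Salsa6420/8.

```c
#include <stdint.h>

/* Hypothetical helper type, not part of scrypt-jane.h. */
typedef struct {
    uint64_t N;             /* N = 1 << (Nfactor + 1) */
    uint32_t r;             /* r = 1 << rfactor */
    uint32_t p;             /* p = 1 << pfactor */
    uint64_t scratch_bytes; /* 128 * r * N, assuming 64-byte mix blocks */
} scrypt_params;

/* Expand the log2-style factors into classic scrypt parameters. */
static scrypt_params scrypt_params_from_factors(unsigned Nfactor,
                                                unsigned rfactor,
                                                unsigned pfactor) {
    scrypt_params out;
    out.N = (uint64_t)1 << (Nfactor + 1);
    out.r = (uint32_t)1 << rfactor;
    out.p = (uint32_t)1 << pfactor;
    out.scratch_bytes = 128u * (uint64_t)out.r * out.N;
    return out;
}
```

With `Nfactor=11, rfactor=3, pfactor=0` this yields N=4096, r=8, p=1 and 4194304 bytes of scratch memory, matching the 'High Volume' setting used in the benchmarks; `Nfactor=19` gives N=1048576 and 1073741824 bytes, matching 'Non-Interactive'.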

# License

Public Domain, or MIT
28 changes: 28 additions & 0 deletions scrypt-jane/code/scrypt-conf.h
@@ -0,0 +1,28 @@
/*
pick the best algo at runtime or compile time?
----------------------------------------------
SCRYPT_CHOOSE_COMPILETIME (gcc only!)
SCRYPT_CHOOSE_RUNTIME
*/
#define SCRYPT_CHOOSE_RUNTIME


/*
hash function to use
-------------------------------
SCRYPT_BLAKE256
SCRYPT_BLAKE512
SCRYPT_SHA256
SCRYPT_SHA512
SCRYPT_SKEIN512
*/
//#define SCRYPT_SHA256


/*
block mixer to use
-----------------------------
SCRYPT_CHACHA
SCRYPT_SALSA
*/
//#define SCRYPT_SALSA
Original file line number Diff line number Diff line change
@@ -81,17 +81,21 @@ scrypt_getROMix() {
#if defined(SCRYPT_TEST_SPEED)
static size_t
available_implementations() {
+size_t cpuflags = detect_cpu();
size_t flags = 0;

#if defined(SCRYPT_CHACHA_AVX)
-flags |= cpu_avx;
+if (cpuflags & cpu_avx)
+flags |= cpu_avx;
#endif

#if defined(SCRYPT_CHACHA_SSSE3)
-flags |= cpu_ssse3;
+if (cpuflags & cpu_ssse3)
+flags |= cpu_ssse3;
#endif

#if defined(SCRYPT_CHACHA_SSE2)
+if (cpuflags & cpu_sse2)
flags |= cpu_sse2;
#endif

File renamed without changes.