The Beast

The Beast is an experimental, flexible, declarative toolkit that reads machine-readable data from various sources and transforms it into Follow the Money (FTM) entities.
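To make the target format concrete, here is a minimal sketch of the kind of transformation involved, written in plain Python. It does not use The Beast's mapping format or API. The function name `record_to_entity`, the input field names `full_name` and `dob`, and the id hashing scheme are all assumptions made for illustration. The output follows the general FTM entity shape: an `id`, a `schema`, and `properties` whose values are lists of strings.

```python
import hashlib


def record_to_entity(record: dict) -> dict:
    """Turn a flat source record into an FTM-style Person dict (illustrative only)."""
    name = record["full_name"].strip()
    # FTM property values are lists of strings, even for a single value.
    properties = {"name": [name]}
    if record.get("dob"):
        properties["birthDate"] = [record["dob"]]
    # Deterministic id derived from the name; this hashing scheme is an assumption.
    entity_id = hashlib.sha1(name.lower().encode("utf-8")).hexdigest()
    return {"id": entity_id, "schema": "Person", "properties": properties}


# Hypothetical input record, not taken from the project.
entity = record_to_entity({"full_name": " Jane Doe ", "dob": "1980-01-31"})
```

In The Beast itself this step is driven by a declarative mapping file rather than hand-written functions like this one.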

~~Do not rely on this one until it is out of alpha. Everything is very volatile.~~

The Beast is currently in beta and is quite stable. We foresee some changes to the mapping format to make it more flexible, but we are implementing them slowly and cautiously.

The Beast is battle-tested. Complete documentation is available here.

Current status

High priority

Ingest from databases (MongoDB, PostgreSQL) using SQLAlchemy or Peewee
Tests for the database ingest
Basic CLI
Signals on exceptions and a policy for incorrectly parsed entity values (drop, drop all, drop entity, reraise)
Tests for the signals
Stats collector (number of signals of each type, number of invalid entities, etc)
Packaging (partially done in packaging_and_spark_integration branch)
Documentation (@legless, your notes will be very valuable)
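The signals, policy and stats collector items above are still planned, so the following is only a sketch of one way such a policy could behave, not the project's design. The `Policy` enum, the `apply_policy` function, the stats key names, and the reading of each option are assumptions: "drop" removes only the bad value, "drop all" removes every value of the affected property, "drop entity" discards the whole entity, and "reraise" propagates the exception. A `collections.Counter` stands in for the stats collector.

```python
from collections import Counter
from enum import Enum


class Policy(Enum):
    DROP_VALUE = "drop"
    DROP_PROPERTY = "drop all"
    DROP_ENTITY = "drop entity"
    RERAISE = "reraise"


def apply_policy(properties, parsers, policy, stats):
    """Parse every value; on failure, count it and act according to `policy`."""
    result = {}
    for prop, values in properties.items():
        parsed, failed = [], False
        for value in values:
            try:
                parsed.append(parsers[prop](value))
            except ValueError:
                stats[f"invalid:{prop}"] += 1
                if policy is Policy.RERAISE:
                    raise
                if policy is Policy.DROP_ENTITY:
                    stats["invalid_entities"] += 1
                    return None
                failed = True  # DROP_VALUE and DROP_PROPERTY keep going
        if failed and policy is Policy.DROP_PROPERTY:
            continue  # discard every value of this property
        result[prop] = parsed
    return result


stats = Counter()
raw = {"name": ["Jane Doe"], "age": ["41", "n/a"]}  # hypothetical input
parsers = {"name": str, "age": int}
kept = apply_policy(raw, parsers, Policy.DROP_VALUE, stats)
```

Keeping the counters in the same pass as the policy decision means the number of invalid values and dropped entities is available without re-reading the source.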

Low priority

Advanced ingest routines: regex validation to discard values that do not pass the test?
Tests for the resolver wrappers
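The regex validation idea above could look roughly like the sketch below. It is an illustration, not an existing feature: the `PATTERNS` table, the `filter_values` function, and the date pattern are hypothetical. Values that do not fully match the pattern for their property are discarded, and properties without a pattern pass through unchanged.

```python
import re

# Hypothetical per-property patterns; here a year, year-month, or full ISO date.
PATTERNS = {"birthDate": re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")}


def filter_values(prop, values):
    """Keep only the values that fully match the pattern registered for `prop`."""
    pattern = PATTERNS.get(prop)
    if pattern is None:
        return list(values)  # no pattern registered: nothing to validate
    return [v for v in values if pattern.fullmatch(v)]


clean = filter_values("birthDate", ["1980-01-31", "31.01.1980", "1980"])
```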

Done

Running tests

pip install -r requirements.txt
python -m pytest

Run using Docker

The /bin/ directory contains scripts to run Beast inside a Docker container.

Use /bin/run data/mapping.yaml to run Beast with the selected mapping. Note: the mapping and source file(s) must be inside the Beast root directory or one of its subdirectories, e.g. ./data/mapping.yaml. You can't point Beast to a file outside its root directory.
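If you script around /bin/run, a small pre-flight check can catch a mapping path that lies outside the root directory before the container starts. This is a sketch using only the standard library. The helper name `is_inside_root` and the example root `/tmp/beast` are assumptions, and the check is not part of The Beast.

```python
from pathlib import Path


def is_inside_root(candidate: str, root: str) -> bool:
    """Return True if `candidate`, taken relative to `root`, stays inside `root`."""
    root_path = Path(root).resolve()
    # Joining with an absolute candidate discards root_path, so absolute
    # paths outside the root are rejected as well.
    candidate_path = (root_path / candidate).resolve()
    return candidate_path == root_path or root_path in candidate_path.parents


ok = is_inside_root("data/mapping.yaml", "/tmp/beast")
escaped = is_inside_root("../outside/mapping.yaml", "/tmp/beast")
```

Resolving both paths first means that `..` segments cannot be used to step out of the root unnoticed.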

Use /bin/tests to run tests.

Use /bin/black to format the source files with Black before submitting a pull request.
