Project replicating the stored procedure ABCS.AON_AROAS_DASH.create_optimization_report_test_ip_during_campaign. It uses a modular architecture to handle dependencies separately, interactively creating documentation of the assets used while replicating the original pipeline workflow.
**General architecture (abbreviated)**
```mermaid
graph TD
    A[Start] --> B(Load Config)
    B --> C(Establish Connection)
    C --> D(Run Executor)
    D --> E[End]
```

Components:
- The entry point is in charge of calling `connector.py` to establish the connection to Snowflake, `loader.py` to load the configuration file, and `executor.py` to run the pipeline.
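A minimal sketch of how that wiring might look; the function names (`load_profile`, `establish_connection`, `run_executor`) are assumptions, and the connection step is stubbed out rather than opening a real Snowflake session:

```python
# Hypothetical sketch of the entry-point wiring; function names are
# assumptions, and the Snowflake connection is stubbed.
import json
from pathlib import Path


def load_profile(path: str) -> dict:
    """loader.py role: read a JSON profile and return it as a dictionary."""
    return json.loads(Path(path).read_text())


def establish_connection(config: dict):
    """connector.py role: would open a Snowflake session; stubbed here."""
    return ("session-stub", "cursor-stub")


def run_executor(config: dict, session, cursor) -> dict:
    """executor.py role: would loop over config['tasks']; stubbed here."""
    return {"tasks_run": len(config.get("tasks", []))}


if __name__ == "__main__":
    cfg = {"tasks": ["get_campaign", "smooth_values"]}
    session, cursor = establish_connection(cfg)
    print(run_executor(cfg, session, cursor))
```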
- `profiles/` stores the different configurations in JSON format; each configuration contains the tasks in execution order and the attributes to be set to run the pipeline. `loader.py` loads these files and returns the respective dictionary.
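A sketch of what a `profiles/` entry might contain and how it could be parsed into a dictionary; the key names (`tasks`, `attributes`) and values are assumptions, not the project's actual schema:

```python
# Hypothetical profile contents; key names are assumptions.
import json

example_profile = """
{
    "tasks": ["get_campaign", "create_base_tables", "smooth_values"],
    "attributes": {"project_id": "ABCS", "tag_campaign_id": "12345"}
}
"""


def load_config(raw: str) -> dict:
    """Parse a profile: tasks in execution order plus run attributes."""
    config = json.loads(raw)
    assert isinstance(config["tasks"], list)
    assert isinstance(config["attributes"], dict)
    return config


config = load_config(example_profile)
print(config["tasks"][0])  # first task to execute
```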
- `connector.py` establishes the connection to Snowflake, providing the cursor and the session.
- This directory contains several components whose main task is to execute the pipeline and log the intermediate results.
- `stage_registry.py` is a dictionary mapping each task name to its respective function.
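Such a registry can be sketched as a plain dictionary of callables; the task names and stage signatures below are assumptions for illustration:

```python
# Hypothetical stage registry; task names and signatures are assumptions.
def get_campaign(config: dict, context: dict) -> dict:
    context["campaign"] = config["attributes"]["tag_campaign_id"]
    return context


def smooth_values(config: dict, context: dict) -> dict:
    context["smoothed"] = True
    return context


STAGE_REGISTRY = {
    "get_campaign": get_campaign,
    "smooth_values": smooth_values,
}

# The executor resolves each task name to its function through the registry.
fn = STAGE_REGISTRY["get_campaign"]
```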
- `database_utils.py` contains auxiliary functions that update the config, execute queries, and log each completed task with its timestamp, the tables created, and the tables read from.
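A sketch of a `database_utils.py`-style log helper; the exact field layout and separators are assumptions based on the log format described below:

```python
# Hypothetical log helper; field layout is an assumption.
from datetime import datetime, timezone


def log_task(task_name: str, query: str, created: list, read_from: list) -> str:
    """Build one log line: timestamp, task name, query prefix, tables."""
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return (f"{ts} | {task_name} | {query[:70]} | "
            f"created={created} | reads={read_from}")


line = log_task(
    "create_base_tables",
    "CREATE TABLE ABCS_12345_base AS SELECT * FROM raw_events",
    created=["ABCS_12345_base (persistent)"],
    read_from=["raw_events"],
)
print(line)
```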
- `executor.py` loops through the tasks in the config dictionary, passing each task its config and the context that is updated along the pipeline.
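The executor loop can be sketched as follows, threading a context dictionary through the stages; function and key names are assumptions:

```python
# Hypothetical executor loop; names are assumptions.
def run(config: dict, registry: dict) -> dict:
    context: dict = {}
    for task_name in config["tasks"]:
        stage_fn = registry[task_name]
        # Each stage receives the config and the context accumulated so far,
        # and returns the updated context.
        context = stage_fn(config, context)
    return context


registry = {
    "a": lambda cfg, ctx: {**ctx, "a_done": True},
    "b": lambda cfg, ctx: {**ctx, "b_done": True},
}
result = run({"tasks": ["a", "b"]}, registry)
print(result)  # both stages contributed to the context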
- `stages/` contains the stage functions, which are of two types: functions in `procedures.py` that execute processes, and query-returning functions, in `create.py` to create tables and in `get.py` to fetch information.
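The two query-returning styles can be sketched as simple string builders; the table, column, and function names here are assumptions, not the project's actual queries:

```python
# Hypothetical query builders; names and SQL are assumptions.
def create_base_table(table_name: str, source: str) -> str:
    """create.py style: return a query that creates a table."""
    return f"CREATE OR REPLACE TABLE {table_name} AS SELECT * FROM {source}"


def get_campaign_info(campaign_id: str) -> str:
    """get.py style: return a query that fetches information."""
    return f"SELECT * FROM campaigns WHERE tag_campaign_id = '{campaign_id}'"


print(create_base_table("ABCS_12345_base", "raw_events"))
print(get_campaign_info("12345"))
```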
The task-log format includes the timestamp of the task, then the task name, the first 70 characters of the query, the created tables (specifying which are persistent and which temporary), and the tables the query uses in its execution. This format is applied to each task, with the exception of the first task (get campaign) and the last task (smooth values).
The final line of the log file prints the `ran_id` identifier; it can also be extracted from the last six characters of each logged persistent table name.
- All persistent table names are identified with the project id + tag_campaign id. Additionally, so that the project allows parallel, non-conflicting simultaneous runs of the same project and campaign, a unique identifier is appended to each persistent table's name. This identifier is printed at the end of the log file so that the respective tables can be located.
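The naming scheme above can be sketched as follows; the suffix generation via `uuid` and the underscore-separated concatenation are assumptions, chosen only so that the run identifier (the `ran_id` in the log) is recoverable from the last six characters of a table name:

```python
# Hypothetical naming scheme; suffix generation and separators are assumptions.
import uuid


def make_run_id() -> str:
    """Unique identifier allowing parallel runs of the same project/campaign."""
    return uuid.uuid4().hex[-6:]


def persistent_table_name(project_id: str, tag_campaign_id: str, run_id: str) -> str:
    """project id + tag_campaign id + unique run suffix."""
    return f"{project_id}_{tag_campaign_id}_{run_id}"


def run_id_from_table(table_name: str) -> str:
    """Recover the run identifier from the last six characters of the name."""
    return table_name[-6:]


run_id = make_run_id()
table = persistent_table_name("ABCS", "12345", run_id)
assert run_id_from_table(table) == run_id
```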
Maintaining a dictionary of data assets that keeps a hierarchy inherited from the degree of manipulation a table contains.
Modularizing functions/query generation by different tasks: ** tables