diff --git a/README.md b/README.md
index 5e47840..57118bc 100644
--- a/README.md
+++ b/README.md
@@ -56,6 +56,15 @@ This repository serves as a collection of things learned by people working in AG
 
 - [Basic SQLite3](sqlite3/basic-sqlite3.md)
 
+### Data Warehouse
+
+- [Logical Design for Data Warehouse](data-warehouse/logical-design.md)
+- [Physical Design for Data Warehouse](data-warehouse/physical-design.md)
+
+### ETL
+
+- [Basic ETL](etl/basic-etl.md)
+
 ## Contributing
 
 If you want to share your learnings for today, please check out CONTRIBUTING.md
diff --git a/data-warehouse/logical-design.md b/data-warehouse/logical-design.md
new file mode 100644
index 0000000..ab50b26
--- /dev/null
+++ b/data-warehouse/logical-design.md
@@ -0,0 +1,34 @@
+# Data Warehouse: Logical Design
+
+**Date: June 29, 2016**
+
+### Logical Design
+
+Logical design identifies the logical relationships between objects.
+
+#### Factors to consider
+
+- data content
+- relationships within the data
+- data warehouse environment
+- data transformation requirements
+- frequency of refresh
+
+#### Components
+
+- entity
+- attribute
+- relationship
+
+#### Output
+
+- should present entities and attributes as fact tables and dimensions
+- should model how data moves from the source systems to subject-oriented information in the warehouse
+
+### References
+- [Oracle9i Data Warehousing Guide](https://docs.oracle.com/cd/B10501_01/server.920/a96520/toc.htm)
+- [Intricity101 videos](https://www.youtube.com/user/Intricity101/videos)
+
+### Author
+
+Almer Mendoza
diff --git a/data-warehouse/physical-design.md b/data-warehouse/physical-design.md
new file mode 100644
index 0000000..873c73c
--- /dev/null
+++ b/data-warehouse/physical-design.md
@@ -0,0 +1,49 @@
+# Data Warehouse: Physical Design
+
+**Date: June 29, 2016**
+
+### Physical Design
+
+Physical design takes the most effective ways of storing and retrieving data into consideration.
+
+#### Structures
+
+###### Tablespaces
+
+Tablespaces are containers for physical design structures.
+
+###### Tables and partitioned tables
+
+Tables and partitioned tables are containers of raw data. They are the basic unit of data storage.
+
+###### Data Segment Compression
+
+Compresses stored data to save disk space, which can also speed up query execution and so reduce the time spent on queries.
+
+###### Views
+
+Present data from one or more tables (or other views) in a tailored form, without storing the data themselves.
+
+###### Integrity Constraints
+
+Add rules to data manipulation so that invalid information cannot be stored.
+
+###### Dimensions
+
+Schema objects that define hierarchical relationships between columns or sets of columns.
+
+###### Materialized Views
+
+Store the results of calculations and summaries in advance to avoid repeating expensive aggregate operations at query time.
+
+###### Indexes and Partitioned Indexes
+
+Optional structures that speed up access to table rows; like tables, indexes can be partitioned. Data warehouses usually use bitmap indexes, which store one bit per row to signify whether that row belongs to a given category.
+
+### References
+- [Oracle9i Data Warehousing Guide](https://docs.oracle.com/cd/B10501_01/server.920/a96520/toc.htm)
+- [Intricity101 videos](https://www.youtube.com/user/Intricity101/videos)
+
+### Author
+
+Almer Mendoza
diff --git a/etl/basic-etl.md b/etl/basic-etl.md
new file mode 100644
index 0000000..2488386
--- /dev/null
+++ b/etl/basic-etl.md
@@ -0,0 +1,24 @@
+# ETL - Basic
+
+**Date: (June 29, 2016)**
+
+ETL, or *Extract - Transform - Load*, is a process used in data warehousing. It is used when:
+- You want to aggregate data from different sources into one collection (the warehouse).
+- You want to select and arrange only the information pertinent to a particular kind of analysis, and give each group of users its own view of the data (a data mart).
+- You want to clean the data and make sense of it.
+
+ETL can be broken down into three major steps:
+
+1. Extract
+    - Gathers data from different data sources (CSV files, databases, etc.). Data can also come from other data warehouses.
+    - It should be designed to avoid negative effects on the source systems, such as degraded performance, slower response times, or any kind of data locking.
+
+2. Transform
+    - Makes the extracted data usable.
+    - This includes mapping the data, matching rows, enhancing data, summarizing data, etc.
+    - Transformation also includes standardizing data (such as currency and time formats) and handling encoding.
+
+3. Load
+    - Takes the prepared data and stores it in the data warehouse, a database, or a data mart.
+
+Source: [ETL Tutorial | Extract Transform and Load](https://www.youtube.com/watch?v=WZw0OTgCBOY)
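
The extract, transform, and load steps described in `etl/basic-etl.md` can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample CSV data, the `orders` table, the exchange rates in `RATES_TO_USD`, and the use of an in-memory SQLite database as a stand-in for the warehouse are all hypothetical and do not come from the notes or the referenced tutorial.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source data; in practice this would be a CSV file or a source database.
RAW_CSV = """order_id,order_date,amount,currency
1,29/06/2016,100.00,USD
2,30/06/2016,4700.00,PHP
"""

# Made-up exchange rates, used only to illustrate currency standardization.
RATES_TO_USD = {"USD": 1.0, "PHP": 0.02}


def extract(text):
    """Extract: read rows from the source without modifying it."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize the date and currency formats."""
    cleaned = []
    for row in rows:
        # Convert DD/MM/YYYY to ISO 8601 (YYYY-MM-DD).
        date = datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat()
        # Convert every amount to a single currency.
        amount_usd = round(float(row["amount"]) * RATES_TO_USD[row["currency"]], 2)
        cleaned.append((int(row["order_id"]), date, amount_usd))
    return cleaned


def load(rows, conn):
    """Load: store the prepared rows in the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three steps in order and return the connection to the target database."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    load(transform(extract(text)), conn)
    return conn
```

Calling `run_etl()` returns a connection whose `orders` table holds the standardized rows. Keeping each step in its own function mirrors the three-step breakdown, so a different source or target can be swapped in without touching the transformation logic.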