From ec8f77ac7ddabc28cf42c7d7c4cde5acffe905ab Mon Sep 17 00:00:00 2001
From: Matthew Marcos
Date: Wed, 29 Jun 2016 14:35:59 +0800
Subject: [PATCH 1/4] ETL basic

---
 README.md        |  4 ++++
 etl/etl-basic.md | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)
 create mode 100644 etl/etl-basic.md

diff --git a/README.md b/README.md
index 5e47840..ca1fdc4 100644
--- a/README.md
+++ b/README.md
@@ -56,6 +56,10 @@ This repository serves as a collection of things learned by people working in AG
 
 - [Basic SQLite3](sqlite3/basic-sqlite3.md)
 
+### ETL
+
+- [Basic SQLite3](etl/basic-etl.md)
+
 ## Contributing
 
 If you want to share your learnings for today, please check out CONTRIBUTING.md
diff --git a/etl/etl-basic.md b/etl/etl-basic.md
new file mode 100644
index 0000000..eb9e20a
--- /dev/null
+++ b/etl/etl-basic.md
@@ -0,0 +1,21 @@
+# ETL - Basic
+
+**Date: June 29, 2016**
+
+ETL or *Extract - Transform - Load* is a process done in data warehousing. It is used when:
+- You want to aggregate data from different sources into one collection (the warehouse).
+- You want to select and arrange only the information pertinent to a particular kind of analysis, and give each group of users its own view of the data (a data mart).
+- You want to clean the data and make sense of it.
+
+ETL can be broken down into three major steps:
+1. Extract
+   - Gathering data from different sources (CSV files, databases, etc.). Data can also come from other data warehouses.
+   - It should be designed to avoid negative effects on the source systems, such as degraded performance or response time, or data locking.
+2. Transform
+   - Making the extracted data usable.
+   - This includes mapping the data, matching rows, enriching data, summarizing data, etc.
+   - Transformation also includes standardizing data (such as currency and time formats) and handling character encoding.
+3. Load
+   - Storing the prepared data in the data warehouse, database, or data mart.
+
+Source: [ETL Tutorial | Extract Transform and Load](https://www.youtube.com/watch?v=WZw0OTgCBOY)

From 7007b9761538eeb2df2f9de9d93ce0d38d6803a7 Mon Sep 17 00:00:00 2001
From: Matthew Marcos
Date: Wed, 29 Jun 2016 14:37:53 +0800
Subject: [PATCH 2/4] Fixed link name for ETL Basic

---
 README.md                          | 2 +-
 etl/{etl-basic.md => basic-etl.md} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename etl/{etl-basic.md => basic-etl.md} (100%)

diff --git a/README.md b/README.md
index ca1fdc4..f0dd4d1 100644
--- a/README.md
+++ b/README.md
@@ -58,7 +58,7 @@ This repository serves as a collection of things learned by people working in AG
 
 ### ETL
 
-- [Basic SQLite3](etl/basic-etl.md)
+- [Basic ETL](etl/basic-etl.md)
 
 ## Contributing
 
diff --git a/etl/etl-basic.md b/etl/basic-etl.md
similarity index 100%
rename from etl/etl-basic.md
rename to etl/basic-etl.md

From 089ac6c1623feaed64c9aaca0bf6b754acaa6b0d Mon Sep 17 00:00:00 2001
From: Matthew Marcos
Date: Wed, 29 Jun 2016 14:41:06 +0800
Subject: [PATCH 3/4] Update formatting

---
 etl/basic-etl.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/etl/basic-etl.md b/etl/basic-etl.md
index eb9e20a..2488386 100644
--- a/etl/basic-etl.md
+++ b/etl/basic-etl.md
@@ -8,13 +8,16 @@ ETL or *Extract - Transform - Load* is a process done in data warehousing. It is
 - You want to clean the data and make sense of it.
 
 ETL can be broken down into three major steps:
+
 1. Extract
    - Gathering data from different sources (CSV files, databases, etc.). Data can also come from other data warehouses.
    - It should be designed to avoid negative effects on the source systems, such as degraded performance or response time, or data locking.
+
 2. Transform
    - Making the extracted data usable.
    - This includes mapping the data, matching rows, enriching data, summarizing data, etc.
    - Transformation also includes standardizing data (such as currency and time formats) and handling character encoding.
+
 3. Load
    - Storing the prepared data in the data warehouse, database, or data mart.
 

From 136133f7cde0355cfd26e7ba8220d747018d1b97 Mon Sep 17 00:00:00 2001
From: Almer Mendoza
Date: Wed, 29 Jun 2016 17:43:01 +0800
Subject: [PATCH 4/4] Added TIL for Physical and Logical Design of Data Warehouse

---
 README.md                         |  5 ++++
 data-warehouse/logical-design.md  | 34 +++++++++++++++++++++
 data-warehouse/physical-design.md | 49 +++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+)
 create mode 100644 data-warehouse/logical-design.md
 create mode 100644 data-warehouse/physical-design.md

diff --git a/README.md b/README.md
index f0dd4d1..57118bc 100644
--- a/README.md
+++ b/README.md
@@ -56,6 +56,11 @@ This repository serves as a collection of things learned by people working in AG
 
 - [Basic SQLite3](sqlite3/basic-sqlite3.md)
 
+### Data Warehouse
+
+- [Logical Design for Data Warehouse](data-warehouse/logical-design.md)
+- [Physical Design for Data Warehouse](data-warehouse/physical-design.md)
+
 ### ETL
 
 - [Basic ETL](etl/basic-etl.md)
diff --git a/data-warehouse/logical-design.md b/data-warehouse/logical-design.md
new file mode 100644
index 0000000..ab50b26
--- /dev/null
+++ b/data-warehouse/logical-design.md
@@ -0,0 +1,34 @@
+# Data Warehouse: Logical Design
+
+**Date: June 29, 2016**
+
+### Logical Design
+
+Identifying the logical relationships between objects.
+
+#### Factors to consider
+
+- data content
+- relationships of data
+- data warehouse environment
+- data transformation requirements
+- frequency of refresh
+
+#### Components
+
+- entity
+- attribute
+- relationship
+
+#### Output
+
+- should present entities and attributes as fact tables and dimensions
+- should model the data from its sources into subject-oriented information in the target warehouse
+
+### References
+- [Oracle9i Data Warehousing Guide](https://docs.oracle.com/cd/B10501_01/server.920/a96520/toc.htm)
+- [Intricity101 videos](https://www.youtube.com/user/Intricity101/videos)
+
+### Author
+
+Almer Mendoza
diff --git a/data-warehouse/physical-design.md b/data-warehouse/physical-design.md
new file mode 100644
index 0000000..873c73c
--- /dev/null
+++ b/data-warehouse/physical-design.md
@@ -0,0 +1,49 @@
+# Data Warehouse: Physical Design
+
+**Date: June 29, 2016**
+
+### Physical Design
+
+Deciding the most effective way to store and retrieve the objects identified in the logical design.
+
+#### Structures
+
+###### Tablespaces
+
+Tablespaces are containers for the physical design structures.
+
+###### Tables and partitioned tables
+
+Tables and partitioned tables are containers for raw data. Tables are the basic unit of data storage; partitioning splits a large table into smaller, more manageable pieces.
+
+###### Data Segment Compression
+
+Compresses the data stored in table segments, which saves disk space and can also reduce the time spent executing queries.
+
+###### Views
+
+Presents a tailored view of the data in one or more tables. A view stores only its query, not the data itself.
+
+###### Integrity Constraints
+
+Adds rules on data manipulation so that invalid information is rejected.
+
+###### Dimensions
+
+Schema objects that define hierarchical relationships between columns (for example, a day rolls up to a month, which rolls up to a year).
+
+###### Materialized Views
+
+Precomputes and stores summaries (such as aggregates and joins) so that expensive aggregate operations are not repeated for every query.
+
+###### Indexes and Partitioned Indexes
+
+Indexes speed up access to table rows and, like tables, can be partitioned. Data warehouses commonly use bitmap indexes, which keep one bitmap per distinct column value; each bit signifies whether the corresponding row has that value.
+
+### References
+- [Oracle9i Data Warehousing Guide](https://docs.oracle.com/cd/B10501_01/server.920/a96520/toc.htm)
+- [Intricity101 videos](https://www.youtube.com/user/Intricity101/videos)
+
+### Author
+
+Almer Mendoza
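
The Extract, Transform and Load steps from `etl/basic-etl.md`, loaded into the fact/dimension layout that the logical-design notes describe, can be sketched end to end with Python's standard `csv` and `sqlite3` modules. This is a minimal illustration under stated assumptions, not the authors' method: the CSV columns, the table names (`dim_region`, `fact_sales`, `sales_by_region`) and the exchange rates are all hypothetical, and because SQLite has no materialized views, an ordinary precomputed summary table stands in for one.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source extract; in practice this could be a file, an API or another database.
SOURCE_CSV = """order_id,order_date,region,amount,currency
1,06/28/2016,North,100.00,USD
2,06/28/2016,South,50.00,EUR
3,06/29/2016,North,25.50,USD
"""

# Hypothetical fixed exchange rates used to standardize currency during Transform.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}


def extract(text):
    """Extract: read the raw rows as-is, without touching the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize dates to ISO 8601 and amounts to integer USD cents."""
    out = []
    for row in rows:
        date = datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat()
        cents = round(float(row["amount"]) * RATES_TO_USD[row["currency"]] * 100)
        out.append((int(row["order_id"]), date, row["region"].strip(), cents))
    return out


def load(conn, rows):
    """Load: write into a tiny star schema (one dimension table, one fact table)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS dim_region (
            region_id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS fact_sales (
            order_id INTEGER PRIMARY KEY,
            sale_date TEXT NOT NULL,
            region_id INTEGER NOT NULL REFERENCES dim_region(region_id),
            amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0)  -- integrity constraint
        );
        CREATE INDEX IF NOT EXISTS idx_sales_date ON fact_sales(sale_date);
    """)
    for order_id, date, region, cents in rows:
        # Reuse the dimension row if the region already exists.
        conn.execute("INSERT OR IGNORE INTO dim_region(name) VALUES (?)", (region,))
        (region_id,) = conn.execute(
            "SELECT region_id FROM dim_region WHERE name = ?", (region,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (order_id, date, region_id, cents),
        )
    # SQLite has no materialized views; a precomputed summary table stands in for one.
    conn.executescript("""
        DROP TABLE IF EXISTS sales_by_region;
        CREATE TABLE sales_by_region AS
            SELECT d.name AS region, SUM(f.amount_cents) AS total_cents
            FROM fact_sales f JOIN dim_region d ON d.region_id = f.region_id
            GROUP BY d.name;
    """)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SOURCE_CSV)))
    for region, total in conn.execute("SELECT * FROM sales_by_region ORDER BY region"):
        print(region, total / 100)
```

Run as a script, it prints the total sales per region read from the summary table. Storing amounts as integer cents is just one way to keep the standardized currency values exact; any consistent unit would serve the same purpose.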