Binary file added .DS_Store
Binary file not shown.
8 changes: 8 additions & 0 deletions README.md
@@ -8,6 +8,14 @@ This repository serves as a collection of things learned by people working in AG

- [Ambari Managed Ranger Will Auto-Generate Services](apache-ranger/ambari-managed-ranger-auto-generate-services.md)

### Celery

- [Getting Started with Celery](celery/about-celery.md)

### Data Warehouse

- [Data Warehouse Overview](data-warehouse/about-data-warehouse.md)

### Hadoop: HDFS

- [What is HDFS](hadoop-hdfs/basic-hdfs.md)
36 changes: 36 additions & 0 deletions celery/about-celery.md
@@ -0,0 +1,36 @@
# Getting Started with Celery

**Date: June 9, 2016**

Celery is a distributed task queue: clients submit units of work (tasks), and worker processes, which can be spread across several servers, pick them up and execute them.

#### Brokers
Celery uses a message broker to pass messages between clients and workers. Several brokers are available:
- RabbitMQ
- Redis
- Using a database
  - Using SQLAlchemy
  - Using the Django database
- Other brokers
  - Amazon SQS
  - MongoDB
  - IronMQ
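
To make the client, broker, and worker roles concrete, here is a minimal standard-library sketch. It is not Celery's implementation: an in-process `queue.Queue` stands in for a real broker such as RabbitMQ or Redis, and the names `broker`, `results`, `add`, and `worker` are hypothetical.

```python
import queue
import threading

# Stand-in for a message broker (in-process only, for illustration).
broker = queue.Queue()
results = {}


def add(x, y):
    return x + y


def worker():
    """Pull task messages off the broker until a None sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            break
        task_id, func, args = message
        results[task_id] = func(*args)


# Client side: publish two task messages, then a stop signal.
broker.put(("task-1", add, (2, 3)))
broker.put(("task-2", add, (10, 20)))
broker.put(None)

# Worker side: a single thread consumes the messages in order.
thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(results)  # {'task-1': 5, 'task-2': 30}
```

The client never calls `add` itself; it only places a message on the queue, which is the decoupling a real broker provides across processes and machines.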

#### Installing Celery
```
pip install celery
```

#### Creating a Celery instance
```
from celery import Celery
app = Celery('module_name', backend='redis://localhost', broker='redis://localhost')
```
The first argument to `Celery` is the name of the current module. The `backend` argument specifies the result backend, where Celery stores task states and return values, and the `broker` argument specifies the URL of the message broker.

#### Running a worker server
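```
celery -A tasks worker --loglevel=info
```
The `-A tasks` option names the module that contains the Celery instance (here assumed to be a file named `tasks.py`), and `--loglevel=info` sets the verbosity of the worker's log output.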

#### References:
- [First Steps with Celery](http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html)
30 changes: 30 additions & 0 deletions data-warehouse/about-data-warehouse.md
@@ -0,0 +1,30 @@
# Data Warehouse Overview

**Date: June 29, 2016**

A data warehouse is a relational database designed for query and analysis rather than for transaction processing. It relies on an ETL (Extract-Transform-Load) process, OLAP (Online Analytical Processing) engines, and other tools that help manage the data.

#### Characteristics
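- Subject-Oriented
  - The warehouse is defined by subject matter (for example, sales), so it holds the data needed to analyze that subject
- Integrated
  - Data from different sources is stored in a consistent format, which resolves naming conflicts and inconsistent units of measure
- Nonvolatile
  - Once data has entered the warehouse, it should not change
- Time Variant
  - The warehouse stores historical data so that changes over time can be analyzed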
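
The "Integrated" and "Time Variant" characteristics can be illustrated with a tiny ETL sketch using Python's built-in `sqlite3` module. The source rows, field names, and the `sales` table are hypothetical, and a real warehouse would use dedicated ETL tooling rather than a script like this.

```python
import sqlite3
from datetime import date

# Extract: hypothetical rows from two source systems with inconsistent formats.
source_a = [{"customer": "alice", "amount": "10.50", "currency": "usd"}]
source_b = [{"cust_name": "BOB ", "amt": 7, "cur": "USD"}]


def transform(row):
    """Map either source layout onto one consistent (name, amount, currency) format."""
    name = (row.get("customer") or row.get("cust_name")).strip().title()
    amount = float(row.get("amount", row.get("amt")))
    currency = (row.get("currency") or row.get("cur")).upper()
    return name, amount, currency


# Load: rows are only ever appended, each stamped with its load date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, amount REAL, currency TEXT, load_date TEXT)"
)
load_date = date(2016, 6, 29).isoformat()
rows = [transform(r) + (load_date,) for r in source_a + source_b]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

loaded = conn.execute(
    "SELECT customer, amount, currency FROM sales ORDER BY customer"
).fetchall()
print(loaded)  # [('Alice', 10.5, 'USD'), ('Bob', 7.0, 'USD')]
```

Keeping the load date on every row and never updating rows in place is one simple way to preserve history, in line with the nonvolatile and time-variant properties listed above.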

#### Requirements
- Workload
  - Designed to accommodate ad hoc queries
- Data Modification
  - Updated on a regular basis by the ETL process
- Schema Design
  - Uses a denormalized or partially denormalized schema to optimize query performance
- Typical Operations
  - A typical data warehouse query scans thousands or millions of rows
- Historical Data
  - Stores many months or years of data to support historical analysis
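
As an illustration of the schema and query style described above, the following sketch builds a very small star-style schema in SQLite and runs an ad hoc aggregate over it. The table names `fact_sales` and `dim_product` and the sample figures are hypothetical, and a star schema is only one example of a partially denormalized design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, sale_year INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO fact_sales VALUES
    (1, 2014, 100.0), (1, 2015, 150.0), (1, 2015, 50.0),
    (2, 2015, 80.0), (2, 2016, 120.0);
""")

# Ad hoc analytical query: scan the fact table and aggregate by category and year.
query = """
SELECT p.category, f.sale_year, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY p.category, f.sale_year
ORDER BY p.category, f.sale_year
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

In a real warehouse the fact table would hold millions of rows spanning several years, which is why such queries are scan-heavy and why the schema is shaped for reads rather than for frequent updates.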

#### References:
- [Oracle Documentation](https://docs.oracle.com/cd/B10501_01/server.920/a96520/concept.htm#50413)