Efficiency on large datasets

As for caching data, a compromise between a huge, inefficient in-memory DataFrame and a SQL database might be HDF5 (via PyTables). It provides hierarchical data storage on disk, which frees up memory while still allowing queries through pandas. Please do some research.
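
To make the idea concrete, here is a minimal sketch of caching chunks on disk and reading back only the rows you need. It assumes pandas with the PyTables package (`tables`) installed; the helper names `cache_to_hdf` and `query_hdf`, the `"data"` key, and the column names in the usage note are hypothetical and not part of this project.

```python
import pandas as pd


def cache_to_hdf(frames, path, key="data"):
    """Append an iterable of DataFrames to a single on-disk HDF5 table.

    Hypothetical helper: only one chunk needs to be in memory at a time.
    """
    with pd.HDFStore(path, mode="w") as store:
        for frame in frames:
            # append() writes in the queryable "table" format;
            # data_columns=True lets every column be used in a where clause.
            store.append(key, frame, data_columns=True)


def query_hdf(path, where, key="data"):
    """Load only the rows matching `where` instead of the whole dataset."""
    with pd.HDFStore(path, mode="r") as store:
        return store.select(key, where=where)
```

For example, after `cache_to_hdf(chunks, "cache.h5")`, a call such as `query_hdf("cache.h5", "(sensor == 2) & (value > 0.9)")` would pull just the matching rows into memory, assuming the cached frames have `sensor` and `value` columns.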

Here are a few links to get you going:
[Fast Data Mining](https://ep2013.europython.eu/media/conference/slides/fast-data-mining-with-pytables-and-pandas.pdf)
[HDF5 Pandas cookbook](http://pandas.pydata.org/pandas-docs/dev/cookbook.html#cookbook-hdf)
[Pytables](http://www.pytables.org/moin)
[Large Datasets](http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas)

