bigData

* This is a beta version, and is still under construction *

March 8th 2023: Still debugging this package. For some reason the functions are not actually downloading with the package pull on external machines. I've worked through many possible solutions with git hub experts and we still can't quite get them to pull. I'll likely just rebuild this near-from-scratch and see if that remedies the problem.

################################################################################

bigData

################################################################################

About

################################################################################

This package is designed to simplify data loading and processing in places where other data processing types are memory intensive. In many cases, but not always command line tools can be just as fast (or nearly) as many C++ data parsers, but updating C++ libraries across operating systems can fail, even temporarily at times. I created this list of functions to do my large data work at ORNL, but now it's being converted into a package to manage data using common commandline functions and tools. This is also my first R package release, so it's a work and learning process.

About half of the tools use commandline bash, awk, head-cut, etc, processes to simplify installation and lessen the backend installation requirements. Also, exceptionally large files or files larger than memory can be parsed and filtered iteratively using temporary files. Many common data preview or profiling techniques are replicated with R functions and command line queries, an example would be the summary() function, but without actually loading the file to memory. Much of the ongoing development of this package's backend will be parallelized, likely with GNU parallel, rather than internal parallelization to prevent any non-concurence processes, and perfect file/memory object recapitulation.

The second half of tools are within R functions that can work much the same as other packages such as dplyr, etc. However, I find these functions much more user friendly, taking a one-liner long command, and making it much simpler in function form.

The easiest and preferred way to use this package is to create a directory or R project to operate in. Like the Raster package it uses lots of temporary files to process data. Unlike many temporary file packages, sometimes temp files may not get deleted between sessions and slowly fills your hard drive, or require specific temporary file removal commands [which is just good habit]. Within set directories/projects means you can see and delete any residual temporary files, and/or archive them if needed later.

################################################################################

Caveats

################################################################################

Current form is Unix/Linux or Windows with bash installed

Name		Name	Last commit message	Last commit date
Latest commit History 95 Commits
R		R
man		man
tests		tests
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md
bigData.Rproj		bigData.Rproj
openjdk.jdk		openjdk.jdk

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

* This is a beta version, and is still under construction *

bigData

About

Caveats

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

***** This is a beta version, and is still under construction *****

bigData

About

Caveats

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

* This is a beta version, and is still under construction *

Packages