Data Tools with Twitter Ingestion Server, Twitter GeoTagger and AsterixDB Ingestion Server. #807
baiqiushi wants to merge 17 commits into ISG-ICS:master
…at any client can subscribe to get realtime ingestion stream
NOT ready to be merged
Codecov Report
@@ Coverage Diff @@
## master #807 +/- ##
=======================================
Coverage 63.91% 63.91%
=======================================
Files 75 75
Lines 4076 4076
Branches 355 355
=======================================
Hits 2605 2605
Misses 1471 1471
…will allow skipping printing those tweets that cannot be geotagged; Move TwitterGeoTaggerTest to test folder;
…on output file rotation in TwitterIngestionServer; (2) add parameter for switching between general Twitter and TwitterMap output format in AsterixDBIngestionDriver; (3) Fix the issue of the unexpected end of file for output gzip files in TwitterIngestionServer;
…does not wait for the WebsocketClient to long live waiting for tweets from the Proxy server; (2) fix the bug in AsterixDBAdapterForTwitterMap that the schema should be initialized in the constructor;
… TwitterGeoTagger.
…nsafe issue in AsterixDBAdapterForTwitterMap and AsterixDBAdapterForTwitter.
Data Tools
Data Tools is a new module consisting of 3 components that support data preparation for the TwitterMap application.
Twitter Ingestion Server
Twitter Ingestion Server is a daemon service that ingests real-time tweets from the Twitter Filter Stream API into local gzip files, rotating the output file daily.
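To illustrate the daily rotation described above, here is a minimal sketch of a writer that appends one tweet per line to a gzip file named after the current date and switches files when the date changes. The class name `DailyGzipWriter`, the `prefix_YYYY-MM-DD.gz` naming scheme, and passing the date in as a parameter are all assumptions made for this example; they are not taken from the code in this PR.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: writes one tweet per line into a per-day gzip file. */
class DailyGzipWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private LocalDate currentDate; // date of the file currently open, or null
    private OutputStream out;      // gzip stream for currentDate, or null

    DailyGzipWriter(Path dir, String prefix) {
        this.dir = dir;
        this.prefix = prefix;
    }

    /** Assumed naming scheme: <prefix>_<ISO date>.gz, e.g. tweets_2019-05-01.gz. */
    static Path fileFor(Path dir, String prefix, LocalDate date) {
        return dir.resolve(prefix + "_" + date + ".gz");
    }

    /** Writes one tweet; rotates to a new file when the date changes. */
    synchronized void write(String tweetJson, LocalDate date) throws IOException {
        if (!date.equals(currentDate)) {
            close(); // finish the previous day's file before opening a new one
            Files.createDirectories(dir);
            out = new GZIPOutputStream(Files.newOutputStream(
                    fileFor(dir, prefix, date),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND));
            currentDate = date;
        }
        out.write((tweetJson + "\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Closing the stream writes the gzip trailer, so the file is complete. */
    @Override
    public synchronized void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
            currentDate = null;
        }
    }
}
```

A caller would typically pass `LocalDate.now()` for each incoming tweet and close the writer on shutdown. Closing the `GZIPOutputStream` before rotating matters because a gzip file without its trailer is reported as truncated by most readers.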
It is also a lightweight HTTP server with 3 endpoints:
/stats - HTTP GET endpoint that returns current ingestion status information in JSON format.
/proxy - WebSocket endpoint that pushes real-time tweets to any connected client.
/ - HTTP GET endpoint that returns an index.html as an example page demonstrating the usage of the above two endpoints.
Twitter GeoTagger
Twitter GeoTagger is a Java program that geotags Twitter JSON with
{stateID, stateName, countyID, countyName, cityID, cityName}. It has 2 modes,
tagOneTweet that can be called from other programs;
AsterixDB Ingestion Server
TBD.