- Crawl news from multiple suppliers:
  - SkyNews (technology news): http://feeds.skynews.com/feeds/rss/technology.xml
  - IT News (IT news): https://www.itnews.com.au/RSS/rss.ashx
- Store news on disk as an RSS file named article.rss
- Also store news in the DB using MySQL; for the table structure, refer to the shadow-news-entity project
- The supplier crawlers can be run from an IDE or from shell scripts
- This project is built on DDD architecture and TDD programming. It is organized into layers such as Domain, Infrastructure, interface, application, and repository.
- This project uses JPA and EclipseLink in the infrastructure layer to access the MySQL DB
- ROME is used as a library for working with RSS files
- Java 8 is the main language level
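
The layering described above can be sketched roughly as follows. This is a minimal illustration only, not the project's actual code: NewsItem, NewsRepository, InMemoryNewsRepository, and StoreNewsService are hypothetical names, and the in-memory repository stands in for the JPA/EclipseLink implementation so the example runs without MySQL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Domain layer: a hypothetical entity describing one crawled article.
final class NewsItem {
    final String title;
    final String link;

    NewsItem(String title, String link) {
        this.title = title;
        this.link = link;
    }
}

// Repository contract: callers depend on this interface, not on MySQL or JPA.
interface NewsRepository {
    void save(NewsItem item);

    List<NewsItem> findAll();
}

// Infrastructure layer: an in-memory stand-in (assumption) for a JPA-backed repository.
final class InMemoryNewsRepository implements NewsRepository {
    private final List<NewsItem> items = new ArrayList<>();

    @Override
    public void save(NewsItem item) {
        items.add(item);
    }

    @Override
    public List<NewsItem> findAll() {
        // Return a read-only copy so callers cannot modify the stored list.
        return Collections.unmodifiableList(new ArrayList<>(items));
    }
}

// Application layer: a use case that only knows the repository contract.
final class StoreNewsService {
    private final NewsRepository repository;

    StoreNewsService(NewsRepository repository) {
        this.repository = repository;
    }

    // Saves every crawled item and returns how many were stored.
    int store(List<NewsItem> crawled) {
        for (NewsItem item : crawled) {
            repository.save(item);
        }
        return crawled.size();
    }
}
```

Because the service depends only on the interface, a test can pass in the in-memory repository, which is one way the TDD approach mentioned above can be supported without a running database.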
If there are any problems, please feel free to contact me or create a new pull request
- Email: trungvu.inside@gmail.com
- git clone git@github.com:chariot9/shadow-news-crawler.git
- Execute: mvn clean install
- Run the main method in NewsBootstrap, passing parameters to the main class such as: /data/news/skynews 1 20170524 20170528
- Meaning of each parameter:
  - args[0]: folder in which to store the news file on disk
  - args[1]: supplier ID; SkyNews is 1 and ITNews is 2
  - args[2]: start date; news published from this date is retrieved
  - args[3]: end date; news published up to this date is retrieved
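
As an illustration of how these four parameters could be read, here is a minimal sketch. It is not the actual NewsBootstrap code: the CrawlArgs class and its methods are hypothetical, and the yyyyMMdd date format is an assumption inferred from the example values 20170524 and 20170528.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical holder for the four command-line parameters (not the project's class).
final class CrawlArgs {
    final Path outputDir;   // args[0]: folder for the news file
    final int supplierId;   // args[1]: 1 = SkyNews, 2 = ITNews
    final LocalDate from;   // args[2]: start of the published-date range
    final LocalDate to;     // args[3]: end of the published-date range

    private CrawlArgs(Path outputDir, int supplierId, LocalDate from, LocalDate to) {
        this.outputDir = outputDir;
        this.supplierId = supplierId;
        this.from = from;
        this.to = to;
    }

    static CrawlArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                    "expected 4 arguments: <folder> <supplierId> <from> <to>");
        }
        Path dir = Paths.get(args[0]);
        int supplierId = Integer.parseInt(args[1]);
        if (supplierId != 1 && supplierId != 2) {
            throw new IllegalArgumentException("unknown supplier ID: " + supplierId);
        }
        // Assumption: dates use yyyyMMdd, which BASIC_ISO_DATE parses.
        LocalDate from = LocalDate.parse(args[2], DateTimeFormatter.BASIC_ISO_DATE);
        LocalDate to = LocalDate.parse(args[3], DateTimeFormatter.BASIC_ISO_DATE);
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("end date is before start date");
        }
        return new CrawlArgs(dir, supplierId, from, to);
    }

    // Maps the supplier ID to the supplier name listed above.
    String supplierName() {
        return supplierId == 1 ? "SkyNews" : "ITNews";
    }
}
```

Validating the arguments up front like this means a malformed date or an unknown supplier ID fails immediately with a clear message rather than partway through a crawl.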
To set up the necessary environment and build the jar file, run the following shell script in the project folder:
./shell/release
To run the program, execute the following command:
cd /data/shell/shadow/execute && ./diamond_exe.sh
Created by Trung, Yokohama Japan 2017