32 commits
16d6933  start work on making a data chunk (georgejhunt, Jul 1, 2016)
202a294  add python to generate csv (georgejhunt, Jul 1, 2016)
552b05e  lots of progress (georgejhunt, Jul 2, 2016)
c3327be  fixes for first demo (georgejhunt, Jul 3, 2016)
2358182  remove the typos in ansible upstream (georgejhunt, Jul 3, 2016)
d1d04dd  Merge branch 'upstream' of https://github.com/georgejhunt/xsce into u… (georgejhunt, Jul 3, 2016)
bf8908e  give apache permissions (georgejhunt, Jul 3, 2016)
9868b72  refinements (georgejhunt, Jul 3, 2016)
80c742b  Create README.md (georgejhunt, Jul 3, 2016)
a1e60f9  Update README.md (georgejhunt, Jul 3, 2016)
23bae63  Update README.md (georgejhunt, Jul 3, 2016)
76d56ef  Update README.md (georgejhunt, Jul 3, 2016)
d83b559  Update README.md (georgejhunt, Jul 3, 2016)
4e40d1f  add source email to the zips dir (georgejhunt, Jul 4, 2016)
a797fee  use content_base for /library (georgejhunt, Jul 4, 2016)
3f3203e  Update README.md (georgejhunt, Jul 7, 2016)
e5a4e4c  move password to a file -- make it readable by apache, but not world (georgejhunt, Jul 8, 2016)
1e7e9dc  cosmetic changes to upstream description (georgejhunt, Jul 8, 2016)
b9a76d8  Update README.md (georgejhunt, Jul 8, 2016)
21684a0  use our own special token in log, and systemd to start and stop (georgejhunt, Jul 10, 2016)
8064614  Merge branch 'upstream' of https://github.com/georgejhunt/xsce into u… (georgejhunt, Jul 10, 2016)
7008862  try to get the daemon working (georgejhunt, Jul 10, 2016)
e4eaa73  do not print ticks (georgejhunt, Jul 10, 2016)
8eaf266  change upstream => reports in roles (georgejhunt, Jul 10, 2016)
e87be7b  change upstream to reports in all (georgejhunt, Jul 10, 2016)
fa87257  remember 8-mgmt-tools/meta (georgejhunt, Jul 10, 2016)
81e29d4  fix stuff (georgejhunt, Jul 10, 2016)
b89cbcf  stop =>stopped (georgejhunt, Jul 10, 2016)
68e269a  refinement, test, until it works -- acpower (georgejhunt, Jul 11, 2016)
6fd282d  Merge branch 'upstream' of https://github.com/georgejhunt/xsce into u… (georgejhunt, Jul 11, 2016)
6f2556d  fix timestamp, rem acrecord (georgejhunt, Jul 13, 2016)
05e0d85  Merge branch 'upstream' of https://github.com/georgejhunt/xsce into u… (georgejhunt, Jul 13, 2016)
1 change: 1 addition & 0 deletions installer/livecd/release-6.1/F22/ks.cfg
@@ -243,7 +243,8 @@ TARGET=/usr/share/anaconda/interactive-defaults.ks
# could change to sed out firstboot
echo "%include $ADDKS" >> $TARGET

echo "" > /etc/resolv.conf
echo "" > /etc/hosts

%end

1 change: 1 addition & 0 deletions installer/livecd/release-6.1/F23/ks.cfg
@@ -258,6 +258,7 @@ TARGET=/usr/share/anaconda/interactive-defaults.ks

echo "%include $ADDKS" >> $TARGET
echo "" > /etc/resolv.conf
echo "" > /etc/hosts

%end

1 change: 1 addition & 0 deletions roles/8-mgmt-tools/meta/main.yml
@@ -8,3 +8,4 @@ dependencies:
- { role: phpmyadmin, tags: ['services','phpmyadmin','tools'], when: phpmyadmin_install }
- { role: awstats, tags: ['services','awstats','tools'], when: awstats_install }
- { role: teamviewer, tags: ['services','teamviewer','tools'], when: teamviewer_install }
- { role: reports, tags: ['reports'], when: reports_install }
48 changes: 48 additions & 0 deletions roles/reports/README.md
@@ -0,0 +1,48 @@
# Note for Upstream data path
**Objective**
* To provide feedback to Wikimedia about how many times a health-related Android wiki reader has been downloaded from an XSCE server onto a smart phone.

**Notes to myself**

This need surfaced at the Wikimedia conference in late June 2016. After a discussion on the XSCE weekly call, I cobbled together pieces of the larger solution that evolved from that discussion. The thrust of the discussion was:

* Health workers either have, or could be provided with, smart phones, which have internet connectivity at least part of the time.
* So XSCE servers could create, on demand from a smart phone browser, a zip file containing the usage information wanted by the creators of the health Wikipedia content.
* Email is part of the normal usage of almost every smart phone, and is ideally suited to intermittent internet connectivity.
* There will need to be a cloud-based presence which gathers and consolidates the email from individual deployments.

**The Initial Solution**

A combination of bash and python scripts creates the zipped smart phone download. Perhaps the weakest link in this strategy is that it relies on a human to learn how to download the zip file from the XSCE server, attach it to an email, and send it off (to xscenet@gmail.com).

* reports.wsgi -- A browser request to a URL on the XSCE server (http://wikihealth.lan/data) invokes this python program, which in turn calls a bash script that generates the zipped information package. The completed zip file is returned in the response to the browser.
* mkchunk -- A bash script which generates the requested zipped information. In doing its job, it calls another python script (sift.py), which parses the apache log files and selects the web server downloads that are of interest.
* sift.py -- Takes advantage of apache log parsing functions to provide flexible and easy access to the desired information. The logs are searched for any downloads of files in /library/content/wikimed (a minimal sketch of this filtering follows the list).
* harvest.py -- Runs in the cloud, and uses python imap libraries to access the gmail mail server and fetch the uploaded email and their attached zip files. It unzips and consolidates the information, and makes it available at http://xscenet.net/analytics.
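sift.py itself is not shown in this diff; as orientation only, here is a minimal sketch of the kind of filtering it performs per the assumptions below (the log path, regex, and field names here are mine, not the shipped code):

```
#!/bin/env python
# Minimal sketch (NOT the shipped sift.py): pull successful .apk downloads
# out of an apache combined-format access log. Path and format are assumptions.
import re

LOG = '/var/log/httpd/access_log'   # assumed log location
# combined log: host ident user [time] "METHOD url PROTO" status size ...
PATTERN = re.compile(r'\[(?P<time>[^\]]+)\] "GET (?P<url>\S+\.apk) \S+" 200 ')

for line in open(LOG):
    hit = PATTERN.search(line)
    if hit and hit.group('url').startswith('/library/content/wikimed'):
        print hit.group('time'), hit.group('url')
```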

**How to install Upstream on XSCE - Release-6.1**
* Follow the normal install instructions at https://github.com/XSCE/xsce/wiki/XSCE-Installation.
* Then install and enable the "reports" software:
```
echo "reports_install: True" >> /opt/schoolserver/xsce/vars/local_vars.yml
echo "reports_enabled: True" >> /opt/schoolserver/xsce/vars/local_vars.yml
cd /opt/schoolserver/xsce
./runtags reports
```

**Assumptions:**

* The number of downloaded APKs will be less than 10,000 on a per-station basis. Until proven to be a problem, each upload will contain a full data set -- zip deflate is very good at substituting a token for repeating text chunks. (Estimated record size = 30 bytes, so 10,000 records imply roughly 300K of raw data before compression.)
* Logrotate on the servers will make apache logs disappear, so a json representation of a python dictionary will be used to accumulate and preserve data.
* The apache logs will be searched for ".apk", GET, and status 200 (success); matching records are added to the data set.
* Date-time + downloaded URL will be the dictionary key.
* All available apache logs will be scanned every time a zip file is generated, and each record "of interest" will be checked against the dictionary. (A minimal sketch of this accumulate-and-dedup step follows this list.)
* Output records in the csv have the following fields: date-time, week number, URL, UUID of server. (The week number permits quick-and-dirty trend charts.)
* Initially, during the debug phase, the full apache logs will be included in the zip file.
* There is nothing proprietary about the uploaded data, so it does not need to be encrypted, nor the download protected with a password.
* Currently, the xscenet@gmail.com account is accessed via a password which is visible on github. I need to generate an SSL public/private key pair and put it in place for the cloud to use.
* The email address of the data collector is known at the time of harvesting email from the gmail server. It is captured and presented somewhere in the cloud presentation.
* The zip file can easily contain additional information -- at this point it includes
* uptime
* vnstat
* apache logs
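
To make the data model concrete, here is a minimal sketch of the accumulate-and-dedup step described above (the store path follows the role's folder layout; the sample record values are invented):

```
#!/bin/env python
# Minimal sketch of the dedup/accumulate model: a json-backed dictionary
# keyed by date-time + URL, emitted as csv. Sample record values are invented.
import json
import os

STORE = '/library/reports/data/downloads'   # assumed path per the role's layout

downloads = {}
if os.path.isfile(STORE):
    with open(STORE) as strm:
        downloads = json.load(strm)

# one parsed log hit (invented example values)
record = {'time': '2016-07-10T12:00:00', 'week': '28',
          'url': '/library/content/wikimed/app.apk', 'uuid': 'abc-123'}
key = record['time'] + record['url']   # date-time + URL dedups re-scanned logs
if key not in downloads:
    downloads[key] = record

with open(STORE, 'w') as outfile:
    json.dump(downloads, outfile)

for key in sorted(downloads):
    d = downloads[key]
    print '%s,%s,%s,%s' % (d['time'], d['week'], d['url'], d['uuid'])
```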
3 changes: 3 additions & 0 deletions roles/reports/defaults/main.yml
@@ -0,0 +1,3 @@
reports_install: False
reports_enabled: False
acpower_enabled: False
71 changes: 71 additions & 0 deletions roles/reports/tasks/main.yml
@@ -0,0 +1,71 @@
- name: Make folders and set permissions
file: path={{ item }}
owner=apache
group=root
mode=0755
state=directory
with_items:
- "{{ content_base }}/reports/html/raw_data"
- "{{ content_base }}/reports/html/zips"
- "{{ content_base }}/reports/history"
- "{{ content_base }}/reports/staging"
- "{{ content_base }}/reports/data"

- name: Put files where they belong
template: src={{ item.src }}
dest={{ item.dest }}
owner=root
group={{ item.group }}
mode={{ item.mode }}
with_items:
- { src: 'reports.conf', dest: '/etc/httpd/conf.d/', group: "root" , mode: '0644' }
- { src: 'reports.wsgi', dest: '{{ content_base }}/reports', group: "root" , mode: '0644' }
- { src: 'sift.py', dest: '{{ content_base }}/reports', group: "apache" , mode: '0755' }
- { src: 'harvest.py', dest: '{{ content_base }}/reports', group: "apache" , mode: '0755' }
- { src: 'mkchunk', dest: '{{ content_base }}/reports', group: "apache" , mode: '0755' }
- { src: 'acpower.service', dest: '/etc/systemd/system/', group: "root" , mode: '0644' }
- { src: 'heartbeat', dest: '{{ content_base }}/reports', group: "root" , mode: '0755' }
- { src: 'analyze-power', dest: '{{ content_base }}/reports', group: "root" , mode: '0755' }
- { src: 'xsce-acpower-init', dest: '/usr/libexec/', group: "root" , mode: '0755' }
- { src: 'acrecord.py', dest: '{{ content_base }}/reports', group: "root" , mode: '0755' }
when: reports_install

- name: Enable downloading of the data chunk
template: src='reports.conf'
mode=0644
dest=/etc/httpd/conf.d/
when: reports_enabled

- name: remove config file to disable
file: path=/etc/httpd/conf.d/reports.conf
state=absent
when: not reports_enabled

- name: Start periodic logging of acpower
service: name=acpower
state=started
enabled=yes
when: acpower_enabled

- name: Stop periodic logging of acpower
service: name=acpower
state=stopped
enabled=no
when: not acpower_enabled

- name: add reports to service list
ini_file: dest='{{ service_filelist }}'
section=reports
option='{{ item.option }}'
value='{{ item.value }}'
with_items:
- option: name
value: Upstream
- option: description
value: '"Upstream is a method of communicating usage information from the XSCE server to a centralized data collection location. It uses a smart phone to download a zipped directory of information while offline, and then later, when connected to the internet, emails that package, as an attachment, reports"'
- option: enabled
value: "{{ reports_enabled }}"
- option: installed
value: "{{ reports_install }}"
- option: acpower_enabled
value: "{{ acpower_enabled }}"
12 changes: 12 additions & 0 deletions roles/reports/templates/acpower.service
@@ -0,0 +1,12 @@
[Unit]
Description=Puts heartbeat, startup, shutdown records into log
After=syslog.target

[Service]
PIDFile=/var/run/acpower.pid
ExecStart=/usr/libexec/xsce-acpower-init start
# docs say systemd sends SIGTERM
#ExecStop=/usr/bin/kill -s SIGINT $MAINPID

[Install]
WantedBy=multi-user.target
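The heartbeat and xsce-acpower-init scripts that this unit launches are not shown in this diff. As a hypothetical sketch only, a writer like the following would produce the root-tagged xsce_startup/xsce_tick records that analyze-power parses (the 60-second interval is an assumption):

```
#!/bin/env python
# Hypothetical heartbeat writer -- NOT the shipped script. It emits the
# xsce_* tokens that analyze-power greps for. Run as root so syslog tags
# each line "root:", which analyze-power checks in nibbles[4].
import subprocess
import time

subprocess.call(["logger", "xsce_startup"])   # one record at service start
while True:
    time.sleep(60)                            # assumed interval
    subprocess.call(["logger", "xsce_tick"])  # periodic heartbeat record
```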
104 changes: 104 additions & 0 deletions roles/reports/templates/analyze-power
@@ -0,0 +1,104 @@
#!/bin/env python
# read the system logs, sifting for power/heartbeat records we want to save

import sys
from os import path
import os
import datetime
from pprint import pprint
import glob
import json
from dateutil.tz import *
from dateutil.parser import *

LOC='/library/reports'

def tstamp(dtime):
'''return a UNIX style seconds since 1970 for datetime input'''
epoch = datetime.datetime(1970, 1, 1,tzinfo=tzutc())
newdtime = dtime.astimezone(tzutc())
since_epoch_delta = newdtime - epoch
return since_epoch_delta.total_seconds()

# fetch the dictionary of previous downloads if it exists
if path.isfile(path.join(LOC,"data","downloads")):
strm = open(path.join(LOC,"data","downloads"),"r")
downloads = json.load(strm)
else: downloads = {}
added = 0

# get the UUID of this machine
with open("/etc/xsce/uuid", "r") as infile:
uuid=infile.read()

filedata = []
# traverse the system logs (/var/log/messages) and get the date in the first record
# The following will need to change for Fedora
for fn in glob.glob('/var/log/messages*'):
for line in open(fn, 'r'):
datestr = line[0:15]
try:
dt= parse(datestr)
except:
continue
dt = dt.replace(tzinfo=tzlocal())
timestamp = tstamp(dt)
filedata.append( (timestamp,fn) )
#print( line[0:14], dt.strftime("%y%m%d"))
break


#pprint(filedata)
# traverse the system logs in chronological order
started = False
for ts,fn in sorted(filedata):
for line in open(fn, 'r'):

datestr = line[0:15]
try:
dt= parse(datestr)
except:
continue
dt = dt.replace(tzinfo=tzlocal())
timestamp = tstamp(dt)
if not started:
last_heartbeat = timestamp
last_powerdown_timestamp = timestamp
last_powerup = timestamp
started = True
# look for a start up record in the log
nibbles = line.split()
#pprint(nibbles)
if nibbles[4] != 'root:': continue
if nibbles[5] == 'xsce_startup':
last_powerup = timestamp
print timestamp, "startup. Downtime:", timestamp - last_heartbeat,line[0:14]
"""
if last_heartbeat == last_powerdown_timestamp:
print 'normal shutdown'
else:
print 'power interruption shutdown'
"""
elif nibbles[5] == "xsce_tick":
#print timestamp,'tick',line[0:14]
last_heartbeat = timestamp
elif nibbles[5] == 'xsce_shutdown' :

print timestamp, 'Normal system shutdown',line[0:14],"Powered seconds: ",timestamp - last_powerup
last_heartbeat = timestamp
last_powerdown_timestamp = timestamp

# now store away the accumulated data

with open(path.join(LOC,"data","downloads"),"w") as outfile:
json.dump(downloads, outfile)

# now create the final csv file
outfile = open(path.join(LOC,"staging","downloads_csv"),'w')

for key in sorted(downloads):
outfile.write("%s,%s,%s,%s,\n" % (downloads[key]["time"],\
downloads[key]["week"],\
downloads[key]["url"], uuid.rstrip(), ))

# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 background=light
5 changes: 5 additions & 0 deletions roles/reports/templates/cp2git
@@ -0,0 +1,5 @@
#!/bin/bash -x
# copy scripts to the reports role in xsce git
for f in $(find /library/reports -maxdepth 1 -type f); do
cp "$f" /opt/schoolserver/xsce/roles/reports/templates/
done
124 changes: 124 additions & 0 deletions roles/reports/templates/harvest.py
@@ -0,0 +1,124 @@
#!/bin/env python
# fetch email from xscenet@gmail.com, unzip it into raw_data
import glob
import imaplib
import email
import os
import zipfile
import json
from time import sleep

# go get the password for interaction with xscenet@gmail.com
with open("/root/.xscenet_gmail",'r') as passwd:
credential = passwd.read()
upenv = "{{ content_base }}/reports"
zips_dir = os.path.join(upenv,"html","zips")
raw_dir = os.path.join(upenv,"html","raw_data")
m = imaplib.IMAP4_SSL('imap.gmail.com')
m.login('xscenet@gmail.com', credential)
# declare location of the dictionary with all our download data
# -- the key for dictionary is datetime+download_url
download_data = os.path.join(upenv,"downloads.json")

def merge_data(filename=None):
if filename == None:
print("no filename in merge_data")
return
# fetch the dictionary of previous downloads if it exists
if os.path.isfile(download_data):
with open(download_data,"r") as strm:
downloads = json.load(strm)
else: downloads = {}
added = 0

for line in open(filename, 'r'):
data_chunks = line.split(',')
# put the data in the dictionary
key = data_chunks[0] + data_chunks[2]
if not key in downloads:
downloads[key] = {"time": data_chunks[0],
"week": data_chunks[1],
"url": data_chunks[2],
"uuid": data_chunks[3],
}
added += 1
else:
continue
print("added records to data store: %s" % added)

# now store away the accumulated data
with open(os.path.join(download_data),"w") as outfile:
json.dump(downloads, outfile)


m.select("[Gmail]/All Mail")

resp, items = m.search(None, "(ALL)")
items = items[0].split()

for emailid in items:
resp, data = m.fetch(emailid, "(RFC822)")
email_body = data[0][1]
mail = email.message_from_string(email_body)
temp = m.store(emailid,'+FLAGS', '\\Seen')
m.expunge()

if mail.get_content_maintype() != 'multipart':
continue

# print "["+mail["From"]+"] :" + mail["Subject"]

for part in mail.walk():
if part.get_content_maintype() == 'multipart':
continue
if part.get('Content-Disposition') is None:
continue

filename = part.get_filename()
original_zip_dir = filename[:-4]
# insert the sender's address into the zip file name
s=mail["From"]
sender = s[s.find("<")+1:s.find(">")]
filename = filename[:-4] + "-" + sender + '.zip'
att_path = os.path.join(zips_dir, filename)

if not os.path.isfile(att_path) :
print("writing: ",att_path)
fp = open(att_path, 'wb')
fp.write(part.get_payload(decode=True))
fp.close()

# go through the zip files, and expand them if not already expanded
for zf in glob.glob(zips_dir+"/*"):
# get the directory name we want in raw directory
raw_base = os.path.join(raw_dir,original_zip_dir)
if not os.path.isdir(raw_base):
with zipfile.ZipFile(zf,"r") as cpzip:
cpzip.extractall(raw_dir)
# now merge the data in the downloads.csv with our data_store at
csv = os.path.join(raw_base,"downloads_csv")
breakout = 0
while True:
if os.path.isfile(csv):
break
sleep(.2)
breakout += 1
if breakout > 10:
break
if breakout > 10:
raise RuntimeError("failed to find %s" % csv)
merge_data(csv)


# regenerate the publicly visible merged data from all reporters
with open(download_data,"r") as strm:
downloads = json.load(strm)
with open(os.path.join(upenv,"html","downloads.csv.txt"),"w") as outfile:
for dl in sorted(downloads.keys()):

outfile.write("%s,%s,%s,%s\n" % (downloads[dl]["time"],\
downloads[dl]["week"],\
downloads[dl]["url"],\
downloads[dl]["uuid"],\
))