32 commits
16d6933  start work on making a data chunk (georgejhunt, Jul 1, 2016)
202a294  add python to generate csv (georgejhunt, Jul 1, 2016)
552b05e  lots of progress (georgejhunt, Jul 2, 2016)
c3327be  fixes for first demo (georgejhunt, Jul 3, 2016)
2358182  remove the typos in ansible upstream (georgejhunt, Jul 3, 2016)
d1d04dd  Merge branch 'upstream' of https://github.com/georgejhunt/xsce into u… (georgejhunt, Jul 3, 2016)
bf8908e  give apache permissions (georgejhunt, Jul 3, 2016)
9868b72  refinements (georgejhunt, Jul 3, 2016)
80c742b  Create README.md (georgejhunt, Jul 3, 2016)
a1e60f9  Update README.md (georgejhunt, Jul 3, 2016)
23bae63  Update README.md (georgejhunt, Jul 3, 2016)
76d56ef  Update README.md (georgejhunt, Jul 3, 2016)
d83b559  Update README.md (georgejhunt, Jul 3, 2016)
4e40d1f  add source email to the zips dir (georgejhunt, Jul 4, 2016)
a797fee  use content_base for /library (georgejhunt, Jul 4, 2016)
3f3203e  Update README.md (georgejhunt, Jul 7, 2016)
e5a4e4c  move password to a file -- make it readable by apache, but not world (georgejhunt, Jul 8, 2016)
1e7e9dc  cosmetic changes to upstream description (georgejhunt, Jul 8, 2016)
b9a76d8  Update README.md (georgejhunt, Jul 8, 2016)
21684a0  use our own special token in log, and systemd to start and stop (georgejhunt, Jul 10, 2016)
8064614  Merge branch 'upstream' of https://github.com/georgejhunt/xsce into u… (georgejhunt, Jul 10, 2016)
7008862  try to get the daemon working (georgejhunt, Jul 10, 2016)
e4eaa73  do not print ticks (georgejhunt, Jul 10, 2016)
8eaf266  change upstream => reports in roles (georgejhunt, Jul 10, 2016)
e87be7b  change upstream to reports in all (georgejhunt, Jul 10, 2016)
fa87257  remember 8-mgmt-tools/meta (georgejhunt, Jul 10, 2016)
81e29d4  fix stuff (georgejhunt, Jul 10, 2016)
b89cbcf  stop =>stopped (georgejhunt, Jul 10, 2016)
68e269a  refinement, test, until it works -- acpower (georgejhunt, Jul 11, 2016)
6fd282d  Merge branch 'upstream' of https://github.com/georgejhunt/xsce into u… (georgejhunt, Jul 11, 2016)
6f2556d  fix timestamp, rem acrecord (georgejhunt, Jul 13, 2016)
05e0d85  Merge branch 'upstream' of https://github.com/georgejhunt/xsce into u… (georgejhunt, Jul 13, 2016)
1 change: 1 addition & 0 deletions installer/livecd/release-6.1/F22/ks.cfg
@@ -243,7 +243,8 @@ TARGET=/usr/share/anaconda/interactive-defaults.ks
# could change to sed out firstboot
echo "%include $ADDKS" >> $TARGET

echo "" > /etc/resolv.conf
echo "" > /etc/hosts

%end

1 change: 1 addition & 0 deletions installer/livecd/release-6.1/F23/ks.cfg
@@ -258,6 +258,7 @@ TARGET=/usr/share/anaconda/interactive-defaults.ks

echo "%include $ADDKS" >> $TARGET
echo "" > /etc/resolv.conf
echo "" > /etc/hosts

%end

1 change: 1 addition & 0 deletions roles/8-mgmt-tools/meta/main.yml
@@ -8,3 +8,4 @@ dependencies:
- { role: phpmyadmin, tags: ['services','phpmyadmin','tools'], when: phpmyadmin_install }
- { role: awstats, tags: ['services','awstats','tools'], when: awstats_install }
- { role: teamviewer, tags: ['services','teamviewer','tools'], when: teamviewer_install }
- { role: reports, tags: ['reports'], when: reports_install }
48 changes: 48 additions & 0 deletions roles/reports/README.md
@@ -0,0 +1,48 @@
# Note for Upstream data path
**Objective**
* To provide feedback to Wikimedia about how many times a health-related Android wiki reader has been downloaded from an XSCE server onto a smart phone.

**Notes to myself**

This need surfaced at the Wikimedia conference in late June 2016. After a discussion on the XSCE weekly call, I cobbled together pieces of the larger solution that evolved from that discussion. The thrust of the discussion was:

* Health workers either have, or could be provided with, smart phones, which have internet connectivity at least part of the time.
* So XSCE servers could create, on demand from a smart phone browser, a zip file containing the usage information wanted by the creators of the health Wikipedia content.
* Email is part of the normal usage of almost every smart phone, and is ideally suited to intermittent internet connectivity.
* There will need to be a cloud-based presence which gathers and consolidates the email from individual deployments.

**The Initial Solution**

A combination of bash and python scripts creates the zipped smart phone download. Perhaps the weakest link in this strategy is that it relies on a human to learn how to download the zip file from the XSCE server, attach it to an email, and send it off (to xscenet@gmail.com).

* reports.wsgi -- A browser request to a URL on the XSCE server (http://wikihealth.lan/data) invokes this python program, which in turn calls a bash script that generates the zipped information package. The completed zip file is returned in the response to the browser.
* mkchunk -- A bash script which generates the requested zipped information. In doing its job, it calls another python script (sift.py), which parses the apache log files and selects the web server downloads that are of interest.
* sift.py -- Takes advantage of apache log parsing functions to provide flexible and easy access to the desired information. The logs are searched for any downloads of files in /library/content/wikimed (a minimal sketch of this filtering follows the list).
* harvest.py -- Runs in the cloud, and uses python imap libraries to access the gmail mail server and fetch the uploaded email and their attached zip files. It unzips and consolidates the information, and makes it available at http://xscenet.net/analytics.
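sift.py itself is not shown in this diff; as orientation only, here is a minimal sketch of the kind of filtering it performs per the assumptions below (the log path, regex, and field names here are mine, not the shipped code):

```
#!/bin/env python
# Minimal sketch (NOT the shipped sift.py): pull successful .apk downloads
# out of an apache combined-format access log. Path and format are assumptions.
import re

LOG = '/var/log/httpd/access_log'   # assumed log location
# combined log: host ident user [time] "METHOD url PROTO" status size ...
PATTERN = re.compile(r'\[(?P<time>[^\]]+)\] "GET (?P<url>\S+\.apk) \S+" 200 ')

for line in open(LOG):
    hit = PATTERN.search(line)
    if hit and hit.group('url').startswith('/library/content/wikimed'):
        print hit.group('time'), hit.group('url')
```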

**How to install Upstream on XSCE - Release-6.1**
* Follow the normal install instructions at https://github.com/XSCE/xsce/wiki/XSCE-Installation.
* Then install and enable the "reports" software:
```
echo "reports_install: True" >> /opt/schoolserver/xsce/vars/local_vars.yml
echo "reports_enabled: True" >> /opt/schoolserver/xsce/vars/local_vars.yml
cd /opt/schoolserver/xsce
./runtags reports
```

**Assumptions:**

* The number of downloaded APKs will be less than 10,000 on a per-station basis. Until proven to be a problem, each upload will contain a full data set -- zip deflate is very good at substituting a token for repeating text chunks. (Estimated record size = 30 bytes, so 10,000 records imply roughly 300K of raw data before compression.)
* Logrotate on the servers will make apache logs disappear, so a json representation of a python dictionary will be used to accumulate and preserve data.
* The apache logs will be searched for ".apk", GET, and status 200 (success); matching records are added to the data set.
* Date-time + downloaded URL will be the dictionary key.
* All available apache logs will be scanned every time a zip file is generated, and each record "of interest" will be checked against the dictionary. (A minimal sketch of this accumulate-and-dedup step follows this list.)
* Output records in the csv have the following fields: date-time, week number, URL, UUID of server. (The week number permits quick-and-dirty trend charts.)
* Initially, during the debug phase, the full apache logs will be included in the zip file.
* There is nothing proprietary about the uploaded data, so it does not need to be encrypted, nor the download protected with a password.
* Currently, the xscenet@gmail.com account is accessed via a password which is visible on github. I need to generate an SSL public/private key pair and put it in place for the cloud to use.
* The email address of the data collector is known at the time of harvesting email from the gmail server. It is captured and presented somewhere in the cloud presentation.
* The zip file can easily contain additional information -- at this point it includes
* uptime
* vnstat
* apache logs
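
To make the data model concrete, here is a minimal sketch of the accumulate-and-dedup step described above (the store path follows the role's folder layout; the sample record values are invented):

```
#!/bin/env python
# Minimal sketch of the dedup/accumulate model: a json-backed dictionary
# keyed by date-time + URL, emitted as csv. Sample record values are invented.
import json
import os

STORE = '/library/reports/data/downloads'   # assumed path per the role's layout

downloads = {}
if os.path.isfile(STORE):
    with open(STORE) as strm:
        downloads = json.load(strm)

# one parsed log hit (invented example values)
record = {'time': '2016-07-10T12:00:00', 'week': '28',
          'url': '/library/content/wikimed/app.apk', 'uuid': 'abc-123'}
key = record['time'] + record['url']   # date-time + URL dedups re-scanned logs
if key not in downloads:
    downloads[key] = record

with open(STORE, 'w') as outfile:
    json.dump(downloads, outfile)

for key in sorted(downloads):
    d = downloads[key]
    print '%s,%s,%s,%s' % (d['time'], d['week'], d['url'], d['uuid'])
```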
3 changes: 3 additions & 0 deletions roles/reports/defaults/main.yml
@@ -0,0 +1,3 @@
reports_install: False
reports_enabled: False
acpower_enabled: False
71 changes: 71 additions & 0 deletions roles/reports/tasks/main.yml
@@ -0,0 +1,71 @@
- name: Make folders and set permissions
file: path={{ item }}
owner=apache
group=root
mode=0755
state=directory
with_items:
- "{{ content_base }}/reports/html/raw_data"
- "{{ content_base }}/reports/html/zips"
- "{{ content_base }}/reports/history"
- "{{ content_base }}/reports/staging"
- "{{ content_base }}/reports/data"

- name: Put files where they belong
template: src={{ item.src }}
dest={{ item.dest }}
owner=root
group={{ item.group }}
mode={{ item.mode }}
with_items:
- { src: 'reports.conf', dest: '/etc/httpd/conf.d/', group: "root" , mode: '0644' }
- { src: 'reports.wsgi', dest: '{{ content_base }}/reports', group: "root" , mode: '0644' }
- { src: 'sift.py', dest: '{{ content_base }}/reports', group: "apache" , mode: '0755' }
- { src: 'harvest.py', dest: '{{ content_base }}/reports', group: "apache" , mode: '0755' }
- { src: 'mkchunk', dest: '{{ content_base }}/reports', group: "apache" , mode: '0755' }
- { src: 'acpower.service', dest: '/etc/systemd/system/', group: "root" , mode: '0644' }
- { src: 'heartbeat', dest: '{{ content_base }}/reports', group: "root" , mode: '0755' }
- { src: 'analyze-power', dest: '{{ content_base }}/reports', group: "root" , mode: '0755' }
- { src: 'xsce-acpower-init', dest: '/usr/libexec/', group: "root" , mode: '0755' }
- { src: 'acrecord.py', dest: '{{ content_base }}/reports', group: "root" , mode: '0755' }
when: reports_install

- name: Enable downloading of the data chunk
template: src='reports.conf'
mode=0644
dest=/etc/httpd/conf.d/
when: reports_enabled

- name: remove config file to disable
file: path=/etc/httpd/conf.d/reports.conf
state=absent
when: not reports_enabled

- name: Start periodic logging of acpower
service: name=acpower
state=started
enabled=yes
when: acpower_enabled

- name: Stop periodic logging of acpower
service: name=acpower
state=stopped
enabled=no
when: not acpower_enabled

- name: add reports to service list
ini_file: dest='{{ service_filelist }}'
section=reports
option='{{ item.option }}'
value='{{ item.value }}'
with_items:
- option: name
value: Upstream
- option: description
value: '"Upstream is a method of communicating usage information from the XSCE server to a centralized data collection location. It uses a smart phone to download a zipped directory of information while offline, and then later, when connected to the internet, emails that package, as an attachment, reports"'
- option: enabled
value: "{{ reports_enabled }}"
- option: installed
value: "{{ reports_install }}"
- option: acpower_enabled
value: "{{ acpower_enabled }}"
12 changes: 12 additions & 0 deletions roles/reports/templates/acpower.service
@@ -0,0 +1,12 @@
[Unit]
Description=Puts heartbeat, startup, shutdown records into log
After=syslog.target

[Service]
PIDFile=/var/run/acpower.pid
ExecStart=/usr/libexec/xsce-acpower-init start
# docs say systemd sends SIGTERM
#ExecStop=/usr/bin/kill -s SIGINT $MAINPID

[Install]
WantedBy=multi-user.target
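The heartbeat and xsce-acpower-init scripts that this unit launches are not shown in this diff. As a hypothetical sketch only, a writer like the following would produce the root-tagged xsce_startup/xsce_tick records that analyze-power parses (the 60-second interval is an assumption):

```
#!/bin/env python
# Hypothetical heartbeat writer -- NOT the shipped script. It emits the
# xsce_* tokens that analyze-power greps for. Run as root so syslog tags
# each line "root:", which analyze-power checks in nibbles[4].
import subprocess
import time

subprocess.call(["logger", "xsce_startup"])   # one record at service start
while True:
    time.sleep(60)                            # assumed interval
    subprocess.call(["logger", "xsce_tick"])  # periodic heartbeat record
```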
104 changes: 104 additions & 0 deletions roles/reports/templates/analyze-power
@@ -0,0 +1,104 @@
#!/bin/env python
# read the system logs, sifting for power/heartbeat records we want to save

import sys
from os import path
import os
import datetime
from pprint import pprint
import glob
import json
from dateutil.tz import *
from dateutil.parser import *

LOC='/library/reports'

def tstamp(dtime):
'''return a UNIX style seconds since 1970 for datetime input'''
epoch = datetime.datetime(1970, 1, 1,tzinfo=tzutc())
newdtime = dtime.astimezone(tzutc())
since_epoch_delta = newdtime - epoch
return since_epoch_delta.total_seconds()

# fetch the dictionary of previous downloads if it exists
if path.isfile(path.join(LOC,"data","downloads")):
strm = open(path.join(LOC,"data","downloads"),"r")
downloads = json.load(strm)
else: downloads = {}
added = 0

# get the UUID of this machine
with open("/etc/xsce/uuid", "r") as infile:
uuid=infile.read()

filedata = []
# traverse the system logs (/var/log/messages) and get the date in the first record
# The following will need to change for Fedora
for fn in glob.glob('/var/log/messages*'):
for line in open(fn, 'r'):
datestr = line[0:15]
try:
dt= parse(datestr)
except:
continue
dt = dt.replace(tzinfo=tzlocal())
timestamp = tstamp(dt)
filedata.append( (timestamp,fn) )
#print( line[0:14], dt.strftime("%y%m%d"))
break


#pprint(filedata)
# traverse the system logs in chronological order
started = False
for ts,fn in sorted(filedata):
for line in open(fn, 'r'):

datestr = line[0:15]
try:
dt= parse(datestr)
except:
continue
dt = dt.replace(tzinfo=tzlocal())
timestamp = tstamp(dt)
if not started:
last_heartbeat = timestamp
last_powerdown_timestamp = timestamp
last_powerup = timestamp
started = True
# look for a start up record in the log
nibbles = line.split()
#pprint(nibbles)
if nibbles[4] != 'root:': continue
if nibbles[5] == 'xsce_startup':
last_powerup = timestamp
print timestamp, "startup. Downtime:", timestamp - last_heartbeat,line[0:14]
"""
if last_heartbeat == last_powerdown_timestamp:
print 'normal shutdown'
else:
print 'power interruption shutdown'
"""
elif nibbles[5] == "xsce_tick":
#print timestamp,'tick',line[0:14]
last_heartbeat = timestamp
elif nibbles[5] == 'xsce_shutdown' :

print timestamp, 'Normal system shutdown',line[0:14],"Powered seconds: ",timestamp - last_powerup
last_heartbeat = timestamp
last_powerdown_timestamp = timestamp

# now store away the accumulated data

with open(path.join(LOC,"data","downloads"),"w") as outfile:
json.dump(downloads, outfile)

# now create the final csv file
outfile = open(path.join(LOC,"staging","downloads_csv"),'w')

for key in sorted(downloads):
outfile.write("%s,%s,%s,%s,\n" % (downloads[key]["time"],\
downloads[key]["week"],\
downloads[key]["url"], uuid.rstrip(), ))

# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 background=light
5 changes: 5 additions & 0 deletions roles/reports/templates/cp2git
@@ -0,0 +1,5 @@
#!/bin/bash -x
# copy scripts to the reports role in xsce git
for f in $(find /library/reports -maxdepth 1 -type f); do
cp "$f" /opt/schoolserver/xsce/roles/reports/templates/
done
124 changes: 124 additions & 0 deletions roles/reports/templates/harvest.py
@@ -0,0 +1,124 @@
#!/bin/env python
# fetch email from xscenet@gmail.com, unzip it into raw_data
import glob
import imaplib
import email
import os
import zipfile
import json
from time import sleep

# go get the password for interaction with xscenet@gmail.com
with open("/root/.xscenet_gmail",'r') as passwd:
credential = passwd.read()
upenv = "{{ content_base }}/reports"
zips_dir = os.path.join(upenv,"html","zips")
raw_dir = os.path.join(upenv,"html","raw_data")
m = imaplib.IMAP4_SSL('imap.gmail.com')
m.login('xscenet@gmail.com', credential)
# declare location of the dictionary with all our download data
# -- the key for dictionary is datetime+download_url
download_data = os.path.join(upenv,"downloads.json")

def merge_data(filename=None):
if filename == None:
print("no filename in merge_data")
return
# fetch the dictionary of previous downloads if it exists
if os.path.isfile(download_data):
with open(download_data,"r") as strm:
downloads = json.load(strm)
else: downloads = {}
added = 0

for line in open(filename, 'r'):
data_chunks = line.split(',')
# put the data in the dictionary
key = data_chunks[0] + data_chunks[2]
if not key in downloads:
downloads[key] = {"time": data_chunks[0],
"week": data_chunks[1],
"url": data_chunks[2],
"uuid": data_chunks[3],
}
added += 1
else:
continue
print("added records to data store: %s" % added)

# now store away the accumulated data
with open(os.path.join(download_data),"w") as outfile:
json.dump(downloads, outfile)


m.select("[Gmail]/All Mail")

resp, items = m.search(None, "(ALL)")
items = items[0].split()

for emailid in items:
resp, data = m.fetch(emailid, "(RFC822)")
email_body = data[0][1]
mail = email.message_from_string(email_body)
temp = m.store(emailid,'+FLAGS', '\\Seen')
m.expunge()

if mail.get_content_maintype() != 'multipart':
continue

# print "["+mail["From"]+"] :" + mail["Subject"]

for part in mail.walk():
if part.get_content_maintype() == 'multipart':
continue
if part.get('Content-Disposition') is None:
continue

filename = part.get_filename()
original_zip_dir = filename[:-4]
# insert the sender's address into the zip file name
s=mail["From"]
sender = s[s.find("<")+1:s.find(">")]
filename = filename[:-4] + "-" + sender + '.zip'
att_path = os.path.join(zips_dir, filename)

if not os.path.isfile(att_path) :
print("writing: ",att_path)
fp = open(att_path, 'wb')
fp.write(part.get_payload(decode=True))
fp.close()

# go through the zip files, and expand them if not already expanded
for zf in glob.glob(zips_dir+"/*"):
# get the directory name we want in raw directory
raw_base = os.path.join(raw_dir,original_zip_dir)
if not os.path.isdir(raw_base):
with zipfile.ZipFile(zf,"r") as cpzip:
cpzip.extractall(raw_dir)
# now merge the data in the downloads.csv with our data_store at
csv = os.path.join(raw_base,"downloads_csv")
breakout = 0
while True:
if os.path.isfile(csv):
break
sleep(.2)
breakout += 1
if breakout > 10:
break
if breakout > 10:
raise RuntimeError("failed to find %s" % csv)
merge_data(csv)


# regenerate the publicly visible merged data from all reporters
with open(download_data,"r") as strm:
downloads = json.load(strm)
with open(os.path.join(upenv,"html","downloads.csv.txt"),"w") as outfile:
for dl in sorted(downloads.keys()):

outfile.write("%s,%s,%s,%s\n" % (downloads[dl]["time"],\
downloads[dl]["week"],\
downloads[dl]["url"],\
downloads[dl]["uuid"],\
))