
Example GUI apps



Towards a full-fledged Audio ML GUI

Base case

Let's start with dead-simple functionality:

  • select local waveform (from .wav file) only
  • get a visualization of "model" outputs (the standard timing table or, if time permits, a graph).

We'll use a fixed-size chunker (chk_size=2048 & chk_step=2048) as our chunker and std(chk) as our "model". The output corresponds to a series of audio "volumes". Note that functionally, there's no model here; just a function whose input is a chunk of the waveform and whose output is a number.
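
To make this concrete, here is a minimal sketch of the base case in Python, assuming numpy and soundfile are available to read the .wav file; the function names (e.g. fixed_step_chunker) and the file path are illustrative, not an existing API.

```python
import numpy as np
import soundfile as sf  # assumption: soundfile is used to read the .wav file

def fixed_step_chunker(wf, chk_size=2048, chk_step=2048):
    """Yield fixed-size chunks of the waveform (non-overlapping when chk_step == chk_size)."""
    for i in range(0, len(wf) - chk_size + 1, chk_step):
        yield wf[i:i + chk_size]

def chk_func(chk):
    """Our 'model': the standard deviation of the chunk, a proxy for volume."""
    return np.std(chk)

wf, sr = sf.read('some_sound.wav')  # select local waveform (placeholder path)
outputs = [chk_func(chk) for chk in fixed_step_chunker(wf)]  # the "model" outputs to visualize
```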

This base case is a tiny instance of a more general pattern. Read more about this in the "Discussion about the more general GUI space" section below.

Choosing a model

The essential part of the base case above is a "waveform function" that gets us from the waveform to a time series of numbers corresponding to the model outputs on a series of chunks of the input waveform. This function was the composition of a chunker, which segmented the audio, and a "chunk function", which was applied to each chunk.

Here, we'll add the ability to select a "waveform function" from a list of options (a minimal sketch follows the list below).

  • select
    • local waveform (from .wav file) only
    • waveform function
  • visualize output
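
As a sketch of what "choosing a model" could look like under the hood (assuming the same numpy-based setup as the previous sketch; all names are illustrative): a registry of named chunk functions the GUI can offer, and a maker that composes the chunker with the user's choice.

```python
import numpy as np

chk_funcs = {  # options offered to the user
    'volume': np.std,
    'max_amplitude': lambda chk: np.max(np.abs(chk)),
    'zero_crossings': lambda chk: int(np.sum(np.diff(np.sign(chk)) != 0)),
}

def mk_waveform_func(chk_func, chk_size=2048, chk_step=2048):
    """Make a waveform -> time-series function from a chunker and a chunk function."""
    def waveform_func(wf):
        return [chk_func(wf[i:i + chk_size])
                for i in range(0, len(wf) - chk_size + 1, chk_step)]
    return waveform_func

waveform_func = mk_waveform_func(chk_funcs['volume'])  # the user's selection
```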

Auto Audio Anomaly ML

ML stands for "Machine Learning". Learning means parametrizing the "model" (function) automatically based on data.

We don't have any of that above, so let's include it now:

  • select local waveform (from .wav file) only
  • visualize output of anomaly scores for the waveform, with a model learned from this waveform
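
A minimal sketch of what "learned from this waveform" could mean in the simplest case, reusing wf and waveform_func from the sketches above: fit the distribution of the chunk features on the input itself, then score each chunk by how far it falls from that distribution. Illustrative only; a real system would plug in a proper outlier-detection learner.

```python
import numpy as np

def fit_anomaly_model(feature_values):
    """'Learn' a model: here, just the mean and standard deviation of the features."""
    mu, sigma = np.mean(feature_values), np.std(feature_values)
    return lambda x: abs(x - mu) / (sigma or 1.0)  # anomaly score: distance in std units

features = waveform_func(wf)  # e.g. volumes, from the previous sketches
anomaly_score = fit_anomaly_model(features)  # model learned from this waveform
scores = [anomaly_score(x) for x in features]  # visualize these
```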

Uploading Audio

This is an orthogonal feature we could introduce at any point of our incremental journey: the ability to "upload" not one, but several waveforms, which can then be selected, as a separate concern, to perform analytics on. "Upload" is in quotes because this could mean actually making a copy of some file(s) in a separate (local or remote) system, or it could simply mean recording a reference to the file(s) for easy selection later.

  • upload audio (.wav files only, for now)
  • view the collection of "uploaded" wav files (for example, a list of file names, or paths)
  • select single item (file) and view information and/or graph about it

Note: This uploading audio GUI is one instance of a very general and useful problem pattern: Store CRUD.

Note: The "viewing information" part of this use case can be generalized as "apply a function to an item" (key and/or value).
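
A sketch of the Store CRUD pattern behind "uploading", using a plain dict as a stand-in for whatever key-value store (local folder, S3 bucket, etc.) actually backs it; the function names are illustrative.

```python
uploads = {}  # key -> wav bytes (or a reference/path)

def upload(store, name, wav_bytes):
    store[name] = wav_bytes            # Create / Update

def list_uploads(store):
    return list(store)                 # Read (keys): the collection of "uploaded" wav files

def view_info(store, name, info_func=len):
    return info_func(store[name])      # "apply function to item", e.g. number of bytes

def delete(store, name):
    del store[name]                    # Delete
```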

Parametrized Audio Anomaly ML

  • upload audio (only .wav files for now)
  • view waveforms (select and view information and/or graph)
  • learn model (select, or use defaults for, the chunker, featurizer, and learner, and name the learned model)
  • apply model
  • make (i.e. parametrize) chunkers, featurizers, and learners
  • visualize the results (the standard timing table or, if time permits, a graph).
  • system (web app) returns to the UI the three most anomalous segments of that wave file (sketched below)

Create an audio data collection, build and save models, run models on audio data, and view results on a “playable” graph.
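
A sketch of the last two steps ("apply model" and returning the three most anomalous segments), reusing the chunking parameters and the scores list from the earlier sketches; illustrative only.

```python
import numpy as np

chk_size = chk_step = 2048
top3 = np.argsort(scores)[-3:][::-1]  # indices of the three highest anomaly scores
segments = [(int(i * chk_step), int(i * chk_step + chk_size)) for i in top3]
# (start_sample, end_sample) pairs the UI can display and/or play
```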

Acquiring data

Live streams

Slabs: Set up a live data read, specifying which live data sources to tap into as well as how to process them (visualize, store, etc.).

(Note: A general slabs interface can use much of the same “pipeline maker” GUI components.)

Stored data

KeyFilterAndTrans: Given the specification of a source of files (or blobs of sorts -- such as folder(s), zip file(s), an s3 bucket, etc.), the user is shown the list of keys (relative file paths), which they can then filter and transform using some GUI mapping of dol.filt_iter and dol.wrap_kvs(key_of_id, id_of_key), viewing the results live. Large numbers of keys should be handled gracefully. One instance of this would be regular expressions to match (and filter in) keys.
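
A sketch of what such a GUI could drive underneath, assuming the dol package is installed; exact call forms of filt_iter and wrap_kvs may differ slightly between dol versions, and the folder path is a placeholder.

```python
from dol import Files, filt_iter, wrap_kvs

s = Files('/path/to/audio/folder')                   # keys are relative file paths
s = filt_iter(s, filt=lambda k: k.endswith('.wav'))  # filter: keep only .wav keys
s = wrap_kvs(                                        # transform: hide the .wav suffix
    s,
    key_of_id=lambda _id: _id[:-len('.wav')],        # what the user sees
    id_of_key=lambda key: key + '.wav',              # what the backend needs
)
list(s)  # the (filtered, transformed) keys, shown live in the GUI
```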

WfTrans: Create “waveform readers” that allow the user to make audio collections and read audio from various sources and formats.

WfStore: Explore an audio collection. This includes viewing metadata (names, tags, duration, time intervals, etc.) as well as reading data items (e.g. a visual representation of the audio, hearing the audio, etc.).

(Generalization: create and explore data collections.)

Annotations

LiveAnnotator: Create and edit tag collections and use these to “annotate time”, live. Example: Choose a tag collection and see buttons for each tag. When a button is clicked, it flashes, indicating that an annotation interval has opened. If it’s clicked again, the annotation interval is closed and the button stops flashing. These (interval, tag) pairs (or (bt, tt, tag) triples) are saved in some way.

(Note: Some contexts might need mutually exclusive tags, some may allow overlapping. Some contexts might need to accommodate “punctual” annotations (not an interval, but simply some “marker” of a specific (single) timestamp).)
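
A minimal sketch of the LiveAnnotator toggle logic described above: clicking a tag button opens an interval, clicking it again closes it, and the (bt, tt, tag) triple is saved. The names and the time source are illustrative.

```python
import time

open_intervals = {}  # tag -> bt (open time) of the currently open interval
annotations = []     # saved (bt, tt, tag) triples

def on_tag_button_click(tag, now=time.time):
    if tag not in open_intervals:
        open_intervals[tag] = now()    # open an interval (button starts flashing)
    else:
        bt = open_intervals.pop(tag)   # close it (button stops flashing)
        annotations.append((bt, now(), tag))
```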

TagTriggeredRecording: Similar to LiveAnnotator, but buttons trigger audio (or other live stream) recording. Overlapping annotations can easily be supported. Punctual annotations would probably use a buffer to record audio in an interval surrounding the marker time.

StoredDataAnnotator: Find stored time-indexed data, and "view" (includes playing) the multi-dimensional data in such a way that intervals (including whole "files" or "sessions") can be annotated.

Making and running complex processes

ProcessParametrizer: User can choose from a set of “processes” (pipelines, DAGs), view a process’s structure, and set both functional nodes and input nodes to make a ready-to-use (or half-baked) runnable process.

IntelligentCells: User can choose & parametrize components, drop them in a “box”, and get a fully structured process.

DagMaker: User can choose & parametrize components, and connect them to make whatever structured process they want.

ExplicitCache: User can run processes on saved data, and results are saved. Example: Model training and model testing.

AutoCache: User can specify inner nodes of a process that should cache their results, and these stored intermediate results are automatically reused to avoid recalculation. Example: Train-test cycles.
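
A sketch of the AutoCache idea: wrap the inner nodes whose results should be cached so that repeated calls with the same inputs reuse the stored intermediate results. cached_node is illustrative, not an existing API.

```python
import functools

def cached_node(func, cache=None):
    """Wrap a process node so repeated calls with the same arguments reuse stored results."""
    cache = {} if cache is None else cache  # could also be a persistent (e.g. dol-backed) store
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)       # compute once, store the intermediate result
        return cache[args]
    return wrapper

# e.g. featurize = cached_node(featurize); train = cached_node(train)
```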

Discussion about the more general GUI space

Note that the simple base case we started this document with is an instance of a more general use case that can be carried out in many different ways. Let's call this the "Get Audio Analytics" story, defined as:

  • select audio source
  • select analytics component (or use default)
  • get results of applying analytics component to audio

Audio source

  • single (persisted) waveform (formats often encountered: WAV, PCM, CSV)
  • live from sensor (microphone/accelerometer)
  • multiple (persisted) waveforms
    • Various sources
      • Folder (single or multiple)
      • Zip file (single or multiple)
      • remote folder or zip file (server filesystem or other blob storage -- e.g. AWS S3)
      • DB
    • Various formats

Analytics component

  • Lower level acoustic ones, useful for diagnosing raw data acquisition (e.g. volume, spectra, and other acoustic features)
  • Higher level ones (e.g. anomaly and event detection scores)
  • Default component versus component "pushed" to device for particular use case and context
  • Analytics components can be "engineered" (e.g. physics/expert models) or "learned" from data (ML)
    • That taps into a whole other large set of user stories we won't go into now
  • Customers may or may not have the ability to make their own analytics components

Get results

  • Ability to acquire the output of analytics components
    • internally, to be able to do visualizations
    • externally, to enable the customer to use this output (e.g. notifications, decision system, etc.)
  • Visualize analytics
    • different approaches needed for light data versus heavy (requires aggregation, filtering, and/or backend support)
    • multi-channel time-series graphs
      • different styles: Line, bar, vline, heatmap (possibly zoomable)
    • without time component
      • splatters (tsne, umap)
      • density maps (based on tsne/umap stats)
      • chordal graphs (for detection confusion matrices)
      • tradeoff graphs (e.g. precision/recall or other classification metric pairs)
    • ability to use visualizations for further exploration and annotation is key.
      • create and annotate segment of time-series graph (e.g. dataset/context segmentation and train/test splits)
      • create group channels and segments (big or small)
