Releases · NaegleLab/KSTAR

09 Jan 00:40

srcrowl

v1.1.0

335a335

v1.1.0: Proteome updates, reducing user burden, and streamlining pregeneration (Latest)

Primary Goals of this Release

Update the phosphoproteome and network files to match the most up-to-date SwissProt proteome
Tweak how pregenerated random experiments are stored and handled during activity calculation, including restructuring the network directory to allow for multiple networks with different names
Lower the user burden by automatically loading required data and providing master functions that replace multi-line workflows
Provide better tools for determining thresholds to use prior to activity calculation

Full Changelog: v1.0.4...v1.1.0

Summary of Major Changes

Phosphoproteome Update

We updated the previous reference files to have position and peptide information matching the current UniProt (as of November 2025). These files have been uploaded to FigShare and are now the default files loaded with KSTAR.
We also updated the underlying information in the KSTAR networks to match the current proteome (same weighted network, but updated site positions).
To make sure the correct reference files are used, we added a unique reference hash stored in a json file in the RESOURCE_FILES directory, which must match the hash of the network used. This ensures that the network is built on the same reference phosphoproteome.
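
The hash check described above can be illustrated with a minimal sketch. The release notes do not say how the hash is computed or what the json file looks like, so the SHA-256 digest, the "reference_hash" key, and both function names below are assumptions made for illustration only, not KSTAR's actual implementation.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reference_matches_network(reference_file: Path, network_meta: Path) -> bool:
    """Compare the reference file's hash with the hash recorded for the network.

    The "reference_hash" key is a hypothetical name used only in this sketch.
    """
    recorded = json.loads(network_meta.read_text())["reference_hash"]
    return file_sha256(reference_file) == recorded


# Demo with temporary files standing in for real resource files.
with tempfile.TemporaryDirectory() as tmp:
    reference = Path(tmp) / "reference.fasta"
    reference.write_text(">P00533\nMRPSGTAGAALLALLAALCPASRA\n")
    meta = Path(tmp) / "network_info.json"
    meta.write_text(json.dumps({"reference_hash": file_sha256(reference)}))

    hash_ok_before = reference_matches_network(reference, meta)
    # Simulate a proteome update that changes the reference sequence.
    reference.write_text(">P00533\nMRPSGTAGAALLALLAALCPASRV\n")
    hash_ok_after = reference_matches_network(reference, meta)

print(hash_ok_before, hash_ok_after)
```

Recording the hash next to the network means a mismatch can be detected before any activities are calculated, rather than surfacing later as silently shifted site positions.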

Pregeneration Updates

Default pregenerated random experiments now live in the same directory as the corresponding KSTAR networks, in a 'RANDOM_ACTIVITIES' folder, and are always read from that location
Added a global configuration parameter indicating where custom random activities (as opposed to the default activities shipped with the KSTAR resource files) can be saved
Rather than using the 'directories.txt' file previously used to indicate where the network directories are, we now use a .json file that the user can update using the update_configuration() function. This includes changing how pregenerated experiments are handled.
Fixed issues with saving and using custom pregenerated random activities, which were not being recognized in v1.0
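
As a rough illustration of a json-backed configuration like the one described above, the sketch below merges user changes over defaults and writes them back to disk. The key names, default values, and the load_settings() and update_settings() helpers are all hypothetical; they are not the keys or signature of KSTAR's update_configuration().

```python
import json
import tempfile
from pathlib import Path

# Hypothetical defaults; the real keys in KSTAR's configuration file may differ.
DEFAULTS = {
    "network_directory": "./NETWORKS",
    "use_pregenerated": True,
    "save_random_experiments": False,
    "custom_random_activities_directory": None,
}


def load_settings(path: Path) -> dict:
    """Read settings from JSON, falling back to defaults for missing keys."""
    stored = json.loads(path.read_text()) if path.exists() else {}
    return {**DEFAULTS, **stored}


def update_settings(path: Path, **changes) -> dict:
    """Validate, merge, and persist changes, returning the new settings."""
    unknown = set(changes) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown configuration keys: {sorted(unknown)}")
    settings = {**load_settings(path), **changes}
    path.write_text(json.dumps(settings, indent=2))
    return settings


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "settings.json"
    update_settings(config_path, custom_random_activities_directory="./my_random")
    reloaded = load_settings(config_path)

print(reloaded["custom_random_activities_directory"])
```

Rejecting unknown keys up front is a design choice of this sketch: a misspelled parameter fails loudly instead of being written to the file and ignored.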

Lowering the user burden

Rather than having the user load the networks and create a log file, we now perform these actions automatically when initializing the ExpMapper and KinaseActivity classes. All the user needs to provide is an output directory and the name of the run.
Provided a master dotplot function which automatically stitches together the different components of the dotplot (clusters, context, evidence size, and the actual dotplot). This does not require the user to create their own subplots. Users can also directly apply this function from the KinaseActivity class.
The three key master functions (enrichment_analysis(), randomized_analysis(), and MannWhitney_analysis()) are combined into a single master function, run_kstar_analysis().

Thresholding decisions

The KinaseActivity.test_threshold() function has been updated to also calculate the similarity of evidence between columns as well as how many data columns are lost at the provided threshold
To visualize the impact of different thresholds, we've added a new function called KinaseActivity.test_threshold_range(), which produces plots of the evidence size and similarity across multiple different thresholds.
For ease of selection, we have also added a KinaseActivity.recommend_threshold() function, which suggests a threshold that balances the total number of sites used as evidence against the overlap between sample columns.
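
To make the trade-off behind these functions concrete, here is a minimal sketch that reports evidence size, similarity between columns, and lost columns at several thresholds. It is not the KSTAR implementation: the toy data, the use of the Jaccard index as the similarity measure, the ">=" comparison, and the function names are all assumptions.

```python
from itertools import combinations

# Toy quantification values: sample column -> {site: value}. Made-up data.
data = {
    "sample_A": {"s1": 0.2, "s2": 1.5, "s3": 2.4, "s4": 0.9},
    "sample_B": {"s1": 1.1, "s2": 1.7, "s3": 0.3, "s4": 2.2},
    "sample_C": {"s1": 0.1, "s2": 0.4, "s3": 0.8, "s4": 0.6},
}


def evidence_sets(data, threshold):
    """Sites at or above the threshold in each column."""
    return {
        col: {site for site, value in values.items() if value >= threshold}
        for col, values in data.items()
    }


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize_threshold(data, threshold):
    """Evidence sizes, mean pairwise Jaccard similarity, and columns left empty."""
    evidence = evidence_sets(data, threshold)
    pairs = list(combinations(evidence.values(), 2))
    mean_sim = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "sizes": {col: len(sites) for col, sites in evidence.items()},
        "mean_jaccard": mean_sim,
        "columns_lost": [col for col, sites in evidence.items() if not sites],
    }


for t in (0.5, 1.0, 2.0):
    print(t, summarize_threshold(data, t))
```

In this toy data, raising the threshold shrinks the evidence sets and eventually leaves sample_C with no evidence at all, which is the kind of loss the updated test_threshold() is described as reporting.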

Configuration changes

Added a function to see total memory usage by KSTAR's resource files (config.get_package_memory())
Added function to see the available networks in the default network directory (config.get_available_networks())
Created a .json file to store desired configuration parameters, including the network directory location, whether to use pregenerated experiments and save random experiments by default, and where to save custom random activities. This file replaces the old 'directories.txt' file.

Other changes

Added a new ExperimentMapper.save_experiment() function to the mapper class, which will save the mapped experiment, as well as additional information about the success of mapping
While still in the testing phase, we have added a new module called dataset_processing() intended to help users process their datasets for use with KSTAR, mainly by formatting peptide sequences and converting between IDs (such as converting gene names to UniProt IDs).
We added a new class to the plot module, called KSTAR_PDF, which generates a three-page PDF summarizing the results from the KSTAR run. This is intended to be a first pass that users can look at to get a quick idea of what their data looks like.
FDR calculations are now based on 150 comparisons, rather than 100. The fundamental way this is calculated remains unchanged.
Removed use of pickles
Removed dependency on biopython, as previously this was only used to read in fasta files.

For more changes, see the full changelog

14 May 13:32

levicuster

v1.0.4

a8474fe

v1.0.4: KSTAR update to allow use of pre-generated experiments

Updated KSTAR to allow use of pre-generated activity lists while maintaining existing functionality.
Added variables to the config module: USE_PREGEN_DATA, SAVE_NEW_PRECOMPUTE, PREGENERATED_EXPERIMENTS_PATH, NETWORK_HASH_Y, NETWORK_HASH_ST, DIRECTORY_FOR_SAVE_PRECOMPUTE
Added install_network_files() function to config module. This automatically installs the network files when the user runs config.install_network_files() from the tutorial.
Added a hash id to the prune module. This gives each network its own unique hash id that is stored in the run_information.txt.
Added instance variables to the calculate module: min_dataset_size_for_pregenerated, max_diff_from_pregenerated, random_activities_list, compendia_distribution, data_columns_from_scratch, use_pregen_data, save_new_precompute, pregenerated_experiments_path, directory_for_save_precompute, network_hash
Added new functions to the calculate module:

calculate_random_enrichment: Generates random experiments matching real data's compendia distribution, calculates kinase activities for each using hypergeometric tests, and aggregates results into a DataFrame.
calculate_random_activities: Controls random experiment pipeline - decides whether to use pre-generated data or create new experiments, then processes datasets individually or in batch.
calculate_random_activity_singleExperiment2: Handles a single random experiment in multiprocessing mode - builds experiment matching real data's compendia distribution and calculates its activity.
add_pregenerated_to_random_enrichment: Combines pre-generated activities with newly calculated ones, ensures proper ordering, and updates the master random_enrichment DataFrame.
load_pregenerated_random_activities: Finds and loads pre-computed activity files based on dataset characteristics and renames columns to match current experiment.
save_new_precomputed_random_enrichment: Saves random activity results in an organized directory structure for future reuse.
network_check_for_pregeneration: Verifies if pre-generated data exists for the current network by checking hash directories and metadata.
check_file_sizes_for_pregenerated: Locates pre-generated files matching current dataset characteristics and returns their sizes.
get_compendia_distribution: Calculates percentage of sites in each compendia class (0-2) per dataset.
get_run_information_content: Reads metadata from RUN_INFORMATION.txt in the appropriate network directory.
parse_network_information: Extracts structured configuration data from a RUN_INFORMATION.txt file.
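
The hypergeometric test mentioned for calculate_random_enrichment can be sketched in a few lines. This is a generic one-sided enrichment test under assumed inputs, not the code in the calculate module; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of sites in the network
    successes:  sites connected to the kinase of interest
    draws:      sites used as evidence in the (random) experiment
    observed:   evidence sites that are connected to the kinase
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers: 10 sites in the network, 4 linked to the kinase,
# 3 sites observed in the experiment, 2 of them linked to the kinase.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
print(round(p_value, 4))
```

Repeating such a test for every kinase in every random experiment would yield a table of p-values per kinase and experiment, in the spirit of the aggregated DataFrame the changelog describes.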

08 Feb 04:56

srcrowl

v0.5.3

2aae081

v0.5.3: Bug fixes and minor updates for pandas v2

Fix aggregation so that it does not throw an error from non-numeric columns
Throw error if binarizing data does not output any evidence
Fixed issue where evidence columns were incorrectly removed if no quantification was greater than 1
Various updates for pandas v2
Minor fixes to plotting code
Remove setuptools as requirement, as it's no longer used

08 Feb 03:31

srcrowl

v0.5.0

a8328d4

v0.5.0 Addition of new features for post hoc analysis

Updates/changes:

Renamed modules whose names no longer reflected their use: normalize -> random_experiments, validate -> analysis
Completely removed normalization functions from the first iterations of KSTAR that are no longer in use
Added a catch to the pruning procedure so that the code does not stop if a kinase has no remaining edges; instead, it keeps the kinase with fewer edges and records the error in the log.

New features:

New functions in the pruning module intended to guide users to the best parameter values for their purposes and to check whether those parameter values are actually feasible.
In addition to binarizing experiments by a threshold, you can now instead provide the desired number of phosphorylation sites to use for each sample, and KSTAR will grab that number of sites with the greatest abundance (or the least if greater = False)
New function in the KinaseActivity class, called test_threshold, intended to make it easier to check how a threshold value impacts the number of sites used across all samples
Can add the number of phosphorylation sites used for each sample to a dotplot using the evidence_size() function in the DotPlot class
Added a new submodule in the analysis module, called coverage, which is for exploring the coverage (number of sites with connections in the network) of the phosphoproteome and phosphoproteomic experiments by KSTAR networks (or other kinase-substrate networks)
Added a new submodule in the analysis module, called interactions, which is intended to contain functions for determining what active kinases are interacting with in the sample. Currently, it contains two functions for outputting the phosphorylation sites that contributed most to a kinase's activity prediction, based on the number of different networks in which they are predicted to interact.
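
Selecting a fixed number of sites per sample, as described in the features above, can be sketched as a simple ranking. The function name and toy values are hypothetical, and details such as tie-breaking and missing values are not specified in these notes; here ties are resolved by the order in which sites appear.

```python
def select_top_sites(values, n, greater=True):
    """Return the n sites with the largest values (smallest if greater=False).

    values maps site identifiers to abundance; ties keep insertion order
    because Python's sort is stable.
    """
    ranked = sorted(values, key=values.get, reverse=greater)
    return set(ranked[:n])


# Made-up abundances for a single sample.
sample = {"s1": 0.2, "s2": 1.5, "s3": 2.4, "s4": 0.9}

top_two = select_top_sites(sample, 2)
bottom_two = select_top_sites(sample, 2, greater=False)
print(sorted(top_two), sorted(bottom_two))
```

Compared with a fixed threshold, this approach gives every sample the same evidence size, at the cost of using a different effective cutoff in each sample.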

13 Oct 23:51

srcrowl

v0.4.2

c84152d

v0.4.2 Bug fixes and improving use of command line for pruning

Updated the previous release to fix bugs and expand the number of parameters that can be passed to the pruning.py script via the command line

12 Sep 21:47

srcrowl

v0.4.0

a13d5df

v0.4.0 Pruning Generalization and Reducing Memory Burden

In this release, two major updates were made:

Redundant steps were removed during the random experiment generation and activity calculation steps to reduce memory burden
Additional parameters were added to the pruning class to allow the user to input different site accession and number columns (if different from those used in NetworKIN). The goal is to make pruning usable with any kinase-substrate network.

18 Jan 17:17

srcrowl

v0.3.2

7a428a1

v0.3.2 Streamlining Fixes

Small fixes for errors in pruning.py and other fixes to the previous release. Functionally identical to v0.3.1.

09 Dec 17:02

srcrowl

v0.3.1

eebcc09

v0.3.1 Streamlining the Pipeline

The primary change of this release was to remove the normalization pipeline, which generated normalized p-values based on the random experiments, and instead focus on Mann-Whitney-generated p-values (as these work better). Other changes include:

Added PROCESSES parameter to the pruning functions, as was done with activity calculation
Updated plotting functions to fix various visualization errors
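
For readers unfamiliar with the Mann-Whitney test referenced above, the sketch below compares a set of "real" p-values against "random" ones. These notes do not describe how KSTAR sets up the comparison, so the one-sided direction, the normal approximation (without tie or continuity correction), the function name, and the example values are all assumptions.

```python
from math import erf, sqrt


def mann_whitney_less(x, y):
    """One-sided Mann-Whitney U test that x tends to be smaller than y.

    Uses midranks for ties and a plain normal approximation, so it is
    only a rough sketch for small samples.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2  # pairs where x exceeds y (ties count half)
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * (1 + erf(z / sqrt(2)))  # P(Z <= z): a small U gives a small p
    return u, p


# Made-up p-values: "real" ones are consistently smaller than "random" ones.
real_pvalues = [0.01, 0.02, 0.03]
random_pvalues = [0.2, 0.4, 0.5, 0.6]
u_stat, p_less = mann_whitney_less(real_pvalues, random_pvalues)
print(u_stat, p_less < 0.05)
```

Because the test is rank-based, it asks only whether the real p-values sit lower than the random ones, without assuming any particular distribution for either set.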

16 Jul 21:23

srcrowl

v0.2.1

da777f0

v0.2.1 Config Update 2

Made the following adjustments to KSTAR configuration:
- create_network_pickles() will only generate new pickles if it does not find them in the network directory
- config.PROCESSES was removed. Instead of setting a config variable, the number of processes to run in parallel is set through function parameters.

12 Jul 20:57

srcrowl

v0.2.0

1341a3c

v0.2.0 Configuration Update

Configuration files updated so that the source code does not require editing and all setup can easily be performed within Python
