feat(skills): introduce owasp-ml #1227
`SKILL.md` (new file, 45 lines)
---
name: owasp-ml
description: OWASP Machine Learning Top 10 (2023) vulnerability knowledge base for identifying, assessing, and remediating security risks in machine learning systems - Brought to you by microsoft/hve-core.
license: CC-BY-SA-4.0
user-invocable: false
metadata:
  authors: "OWASP Machine Learning Security Project"
  spec_version: "1.0"
  framework_revision: "1.0.0"
  last_updated: "2026-02-16"
  skill_based_on: "https://github.com/chris-buckley/agnostic-prompt-standard"
  content_based_on: "https://owasp.org/www-project-machine-learning-security-top-10/"
---

# OWASP ML Top 10 — Skill Entry

This `SKILL.md` is the **entrypoint** for the OWASP ML Top 10 skill.

The skill encodes the **OWASP Machine Learning Security Top 10** as structured, machine-readable references that an agent can query to identify, assess, and remediate machine learning security risks.

## Normative references (ML Top 10)

1. [00 Vulnerability Index](references/00-vulnerability-index.md)
2. [01 Input Manipulation Attack](references/01-input-manipulation-attack.md)
3. [02 Data Poisoning Attack](references/02-data-poisoning-attack.md)
4. [03 Model Inversion Attack](references/03-model-inversion-attack.md)
5. [04 Membership Inference Attack](references/04-membership-inference-attack.md)
6. [05 Model Theft](references/05-model-theft.md)
7. [06 AI Supply Chain Attacks](references/06-ai-supply-chain-attacks.md)
8. [07 Transfer Learning Attack](references/07-transfer-learning-attack.md)
9. [08 Model Skewing](references/08-model-skewing.md)
10. [09 Output Integrity Attack](references/09-output-integrity-attack.md)
11. [10 Model Poisoning](references/10-model-poisoning.md)

## Skill layout

* `SKILL.md` — this file (skill entrypoint).
* `references/` — the ML Top 10 normative documents.
  * `00-vulnerability-index.md` — index of all vulnerability identifiers, categories, and cross-references.
> **Collaborator:** RI-03 — "index" vs "master index" (Low). All three sibling OWASP skills say (…)
>
> Suggested change: (…)
  * `01` through `10` — one document per vulnerability, aligned with OWASP ML Security Top 10 numbering.

---
> **Collaborator:** RI-01 — Missing Third-Party Attribution (High). All sibling OWASP skills (…). Since the content is derived from the OWASP ML Security Top 10, proper attribution is required by the CC-BY-SA-4.0 license.
>
> Suggested addition (before the closing …):
>
> ```markdown
> ## Third-Party Attribution
>
> Copyright © OWASP Foundation.
> OWASP® Machine Learning Security Top 10 (2023) content is derived from works by the
> OWASP Foundation, licensed under CC BY-SA 4.0
> (<https://creativecommons.org/licenses/by-sa/4.0/>).
> Source: <https://owasp.org/www-project-machine-learning-security-top-10/>
> Modifications: Vulnerability descriptions restructured into agent-consumable reference
> documents with added detection and remediation guidance.
> OWASP® is a registered trademark of the OWASP Foundation. Use does not imply endorsement.
> ```
*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*
`references/00-vulnerability-index.md` (new file, 73 lines)
---
title: OWASP ML Top 10 Vulnerability Index
description: Index of OWASP Machine Learning Security Top 10 (2023) vulnerability identifiers, categories, and cross-references
---

# 00 Vulnerability Index

This document provides the index for the OWASP Machine Learning Security Top 10 vulnerabilities. Each entry includes its identifier, title, and primary category.

## Vulnerability catalog

| ID | Title | Category |
|---|---|---|
| ML01:2023 | Input Manipulation Attack | Input Security |
| ML02:2023 | Data Poisoning Attack | Data Integrity |
| ML03:2023 | Model Inversion Attack | Privacy |
| ML04:2023 | Membership Inference Attack | Privacy |
| ML05:2023 | Model Theft | Intellectual Property |
| ML06:2023 | AI Supply Chain Attacks | Supply Chain |
| ML07:2023 | Transfer Learning Attack | Model Integrity |
| ML08:2023 | Model Skewing | Data Integrity |
| ML09:2023 | Output Integrity Attack | Output Security |
| ML10:2023 | Model Poisoning | Model Integrity |
## Document structure

Each vulnerability document follows a consistent structure:

1. Description — what the vulnerability is and how it manifests in machine learning systems.
2. Risk — concrete consequences of exploitation and business impact.
3. Vulnerability checklist — indicators that the system is exposed.
4. Prevention controls — defensive measures and rectification steps.
5. Example attack scenarios — realistic exploitation narratives.
6. Detection guidance — signals and methods to identify exposure.
7. Remediation — immediate and long-term actions to contain and resolve.
## Category groupings

### Input Security

* ML01:2023 Input Manipulation Attack

### Data Integrity

* ML02:2023 Data Poisoning Attack
* ML08:2023 Model Skewing

### Privacy

* ML03:2023 Model Inversion Attack
* ML04:2023 Membership Inference Attack

### Intellectual Property

* ML05:2023 Model Theft

### Supply Chain

* ML06:2023 AI Supply Chain Attacks

### Model Integrity

* ML07:2023 Transfer Learning Attack
* ML10:2023 Model Poisoning

### Output Security

* ML09:2023 Output Integrity Attack

---

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*
`references/01-input-manipulation-attack.md` (new file, 95 lines)
---
title: 'ML01: Input Manipulation Attack'
description: OWASP ML Top 10 reference for input manipulation and adversarial attack vulnerabilities including crafted perturbations that cause incorrect model outputs
---

# 01 Input Manipulation Attack

Identifier: ML01:2023
Category: Input Security
## Description

Input Manipulation Attack is an umbrella term that includes Adversarial Attacks, a type of attack in which an attacker deliberately alters input data to mislead the model. The attacker crafts inputs with small, carefully designed perturbations that cause the model to produce incorrect outputs while appearing legitimate to human observers. This category affects any machine learning system that accepts external input, including image classifiers, intrusion detection systems, and natural language processing models.
## Risk

* Misclassification of inputs leading to security bypass or harm to the system.
* The manipulated input may not be noticeable to the naked eye, making the attack difficult to detect.
* Exploitation requires technical knowledge of deep learning and input processing techniques.
* Attackers with knowledge of the model's architecture can craft targeted perturbations.
* Cascading failures when misclassified inputs trigger downstream actions in automated pipelines.

## Vulnerability checklist

* The model lacks adversarial training and has not been exposed to adversarial examples during training.
* No input validation is performed to detect anomalies, unexpected values, or patterns.
* The model is not designed with robust architectures or defense mechanisms against manipulative inputs.
* Model predictions are consumed directly without downstream verification or confidence thresholding.
* No monitoring is in place to detect distribution shifts or anomalous input patterns at inference time.
* The model's architecture and parameters are accessible to potential attackers.
## Prevention controls

1. Train the model on adversarial examples to improve robustness against manipulated inputs.
2. Use models designed with robust architectures and activation functions that incorporate defense mechanisms against adversarial perturbations.
3. Implement input validation to check input data for anomalies such as unexpected values or patterns, and reject inputs that are likely to be malicious.
4. Apply confidence thresholding to flag or reject predictions below a confidence threshold (controls 3 and 4 are sketched after this list).
5. Use ensemble methods that combine multiple models to reduce the likelihood that a single adversarial perturbation fools all models.
6. Restrict access to model internals to prevent attackers from crafting targeted attacks.
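As a minimal illustration of controls 3 and 4, the sketch below range-checks an input and then gates the prediction on model confidence. The threshold value, input bounds, and the sklearn-style `predict_proba` interface are illustrative assumptions, not part of the OWASP text.

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.85      # hypothetical gate; tune per deployment
PIXEL_MIN, PIXEL_MAX = 0.0, 1.0  # assumed valid range for normalized image inputs

def validate_input(x: np.ndarray) -> bool:
    """Reject inputs containing non-finite or out-of-range values (control 3)."""
    return bool(np.all(np.isfinite(x)) and x.min() >= PIXEL_MIN and x.max() <= PIXEL_MAX)

def guarded_predict(model, x: np.ndarray):
    """Return a class label only when the input passes validation and the
    top-class probability clears the confidence threshold (control 4)."""
    if not validate_input(x):
        return None  # quarantine for review instead of classifying
    probs = model.predict_proba(x[np.newaxis, ...])[0]  # assumes an sklearn-style model
    top = int(np.argmax(probs))
    if probs[top] < CONFIDENCE_THRESHOLD:
        return None  # abstain rather than act on a low-confidence prediction
    return top
```

Returning `None` rather than a best-guess label keeps the abstain/accept decision explicit for downstream consumers.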
## Example attack scenarios

### Scenario A — Image classification bypass

A deep learning model is trained to classify images into categories such as dogs and cats. An attacker manipulates an image that is visually similar to a legitimate image of a cat but contains small, carefully crafted perturbations that cause the model to misclassify it as a dog. When the model is deployed in a real-world setting, the attacker uses the manipulated image to bypass security measures or cause harm to the system.

### Scenario B — Network intrusion detection evasion

A deep learning model is trained to detect intrusions in a network. An attacker manipulates network traffic by carefully crafting packets that evade the model's intrusion detection system. The attacker alters features of the network traffic such as the source IP address, destination IP address, or payload in a way that avoids detection. The attacker may hide their source IP address behind a proxy server or encrypt the payload. This leads to data theft, system compromise, or other forms of damage.
## Detection guidance

* Monitor input distributions at inference time for statistical anomalies or distribution shifts compared to training data (a minimal sketch follows this list).
* Log all inputs and outputs to detect patterns of adversarial probing.
* Implement anomaly detection on incoming data to flag inputs that deviate significantly from expected distributions.
* Compare model confidence scores over time to detect sudden drops or unusual prediction patterns.
* Use gradient-based detection methods to identify inputs that produce unusually large gradients.
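One lightweight way to implement the first bullet is a two-sample Kolmogorov–Smirnov test comparing a scalar input feature from recent inference traffic against a reference sample from training time. The feature choice, window size, and alert threshold below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

ALERT_P_VALUE = 0.01  # hypothetical significance level for drift alerts

def drift_detected(train_values: np.ndarray, recent_values: np.ndarray) -> bool:
    """Flag a distribution shift in one scalar feature (e.g., mean pixel
    intensity per request) between training data and live traffic."""
    _, p_value = ks_2samp(train_values, recent_values)
    return p_value < ALERT_P_VALUE  # low p-value: distributions likely differ

# Synthetic demonstration: a shifted inference window triggers the alert.
rng = np.random.default_rng(0)
baseline = rng.normal(0.50, 0.10, size=5000)  # stand-in for training-time stats
window = rng.normal(0.65, 0.10, size=500)     # recent, possibly adversarial, traffic
if drift_detected(baseline, window):
    print("Possible adversarial probing or drift; investigate recent inputs")
```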
## Remediation

* Retrain the model with adversarial examples incorporated into the training dataset (one way to generate them is sketched after this list).
* Deploy input validation filters to reject or quarantine suspicious inputs before inference.
* Implement robust model architectures that are inherently resistant to small perturbations.
* Add confidence-based gating to suppress low-confidence predictions.
* Restrict public access to model APIs and internals to limit attacker reconnaissance.
* Continuously monitor model performance in production for accuracy degradation that may indicate ongoing adversarial attacks.
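A common way to produce adversarial examples for retraining is the fast gradient sign method (FGSM). The PyTorch sketch below is a generic illustration under assumed conditions (a differentiable classifier, inputs normalized to [0, 1], and a hypothetical `epsilon` budget); it is not prescribed by the OWASP text.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 0.03) -> torch.Tensor:
    """Craft an FGSM adversarial example: perturb x in the direction that
    maximally increases the loss, bounded by epsilon per element."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step along the sign of the input gradient, then clamp to the valid range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```

Mixing such examples, with their correct labels, back into the training set is the adversarial-training step referenced in the first remediation bullet.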
---

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*
`references/02-data-poisoning-attack.md` (new file, 97 lines)
---
title: 'ML02: Data Poisoning Attack'
description: OWASP ML Top 10 reference for data poisoning vulnerabilities including training data manipulation, label corruption, and pipeline compromise
---

# 02 Data Poisoning Attack

Identifier: ML02:2023
Category: Data Integrity
## Description

Data poisoning attacks occur when an attacker manipulates the training data to cause the model to behave in an undesirable way. The attacker injects malicious data into the training dataset by compromising the data storage system, exploiting vulnerabilities in data collection pipelines, or corrupting the data labeling process. The poisoned data causes the model to learn incorrect patterns, leading to unreliable predictions when deployed. This attack is particularly dangerous because manipulated training data may be difficult to detect and can persist through multiple retraining cycles.
## Risk

* The model will make incorrect predictions based on the poisoned data, leading to false decisions and potentially serious consequences.
* The attack has moderate exploitability and is difficult to detect.
* Attackers who have access to the training data or the data collection pipeline can execute the attack.
* Lack of data validation and insufficient monitoring of the training data increase exposure.
* Poisoned data may persist across retraining cycles if not identified and removed.
## Vulnerability checklist

* Training data is not thoroughly validated or verified before use.
* No data integrity checks such as checksums or digital signatures are applied to training datasets.
* Training data is stored without encryption or secure transfer protocols.
* Training data is not separated from production data.
* Access controls do not restrict who can access or modify the training data.
* No anomaly detection is applied to training data to detect sudden distribution changes or labeling inconsistencies.
* Multiple independent data labelers are not used to cross-validate labeling accuracy.
* No separate validation set is used to verify model behavior after training.
## Prevention controls

1. Ensure that training data is thoroughly validated and verified before use by implementing data validation checks and employing multiple data labelers to validate labeling accuracy.
2. Store training data securely using encryption, secure data transfer protocols, and firewalls.
3. Separate training data from production data to reduce the risk of training data compromise.
4. Implement access controls to limit who can access the training data and when.
5. Regularly monitor the training data for anomalies and conduct audits to detect data tampering (one checksum-based approach is sketched after this list).
6. Validate the model using a separate validation set that was not used during training to detect poisoning attacks that may have affected the training data.
7. Train multiple models using different subsets of the training data and use an ensemble to make predictions, reducing the impact of poisoning attacks.
8. Use anomaly detection techniques to detect abnormal behavior in the training data, such as sudden changes in data distribution or data labeling.
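To make the tamper-detection in control 5 concrete, here is a sketch that records a SHA-256 digest for every dataset file in a manifest at ingestion time and verifies the manifest before each training run. The file paths and manifest format are assumptions for illustration only.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large datasets are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record a digest for every dataset file at ingestion time."""
    entries = {str(p): sha256_of(p) for p in sorted(data_dir.rglob("*")) if p.is_file()}
    manifest.write_text(json.dumps(entries, indent=2))

def verify_manifest(manifest: Path) -> list[str]:
    """Return the files whose contents changed since the manifest was written."""
    entries = json.loads(manifest.read_text())
    return [f for f, digest in entries.items() if sha256_of(Path(f)) != digest]

# Before each training run, refuse to proceed on any mismatch:
tampered = verify_manifest(Path("training-data.manifest.json"))
if tampered:
    raise RuntimeError(f"Possible training-data tampering detected: {tampered}")
```

Signing the manifest itself (for example with a detached signature) closes the loop, since an attacker who can rewrite the data could otherwise rewrite the digests too.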
## Example attack scenarios

### Scenario A — Poisoning a spam classifier

An attacker poisons the training data for a deep learning model that classifies emails as spam or not spam. The attacker injects maliciously labeled spam emails into the training dataset by compromising the data storage system, hacking into the network, or exploiting a vulnerability in the data storage software. The attacker also manipulates the data labeling process by falsifying labels or bribing data labelers to provide incorrect labels.

### Scenario B — Poisoning a network traffic classifier

An attacker poisons the training data for a deep learning model used to classify network traffic into categories such as email, web browsing, and video streaming. The attacker introduces a large number of examples of network traffic incorrectly labeled as a different type of traffic, causing the model to make incorrect traffic classifications when deployed. This leads to misallocation of network resources or degradation of network performance.
## Detection guidance

* Apply statistical analysis to training datasets to detect sudden distribution shifts or anomalous labeling patterns (a minimal sketch follows this list).
* Use holdout validation sets to compare model behavior against known-clean baselines.
* Monitor model accuracy over retraining cycles for unexpected degradation.
* Cross-validate data labels using multiple independent labelers or automated consistency checks.
* Audit data pipeline access logs for unauthorized modifications to training datasets.
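As one way to apply the first bullet, the sketch below compares class-label frequencies between a trusted baseline snapshot and the current training set with a chi-square goodness-of-fit test. The alert threshold is an illustrative assumption, and the test assumes the current snapshot introduces no new classes.

```python
from collections import Counter
from scipy.stats import chisquare

ALERT_P_VALUE = 0.01  # hypothetical significance level

def label_shift_detected(baseline_labels: list, current_labels: list) -> bool:
    """Flag a sudden change in label proportions between dataset snapshots."""
    classes = sorted(set(baseline_labels))  # assumes current adds no new classes
    base, cur = Counter(baseline_labels), Counter(current_labels)
    # Scale baseline counts to the current dataset size to form the
    # expected frequencies for the goodness-of-fit test.
    scale = sum(cur.values()) / sum(base.values())
    expected = [base[c] * scale for c in classes]
    observed = [cur[c] for c in classes]
    _, p_value = chisquare(f_obs=observed, f_exp=expected)
    return p_value < ALERT_P_VALUE

# Synthetic demonstration: a suspicious jump in "spam" labels trips the check.
baseline = ["spam"] * 500 + ["ham"] * 500
current = ["spam"] * 800 + ["ham"] * 200
if label_shift_detected(baseline, current):
    print("Label distribution shifted; audit recent data ingestion")
```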
## Remediation

* Remove identified poisoned data from the training dataset and retrain the model.
* Implement data provenance tracking to trace the origin of all training data.
* Enforce strict access controls and audit logging on data storage and labeling systems.
* Deploy anomaly detection on data ingestion pipelines to catch future poisoning attempts.
* Use ensemble models trained on different data subsets to reduce single-point-of-failure risk (see the sketch after this list).
* Conduct periodic audits of data labeling quality and consistency.
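Bagging is a straightforward way to realize the ensemble bullet: each base model trains on a different random subset, so a poisoned minority of samples influences only some ensemble members. The estimator choice and subset fraction below are illustrative assumptions.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 25 trees, each fit on a random 60% subset of the training data, so no
# single poisoned region of the dataset reaches every ensemble member.
ensemble = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=25,
    max_samples=0.6,
)
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_new)
```

Majority voting across the members then dilutes the effect of any single compromised subset.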
---

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*
> **Collaborator:** RI-04 — Technology signals may be too narrow (Medium). Other skill signal blocks have 3–5 entries for broader detection coverage. ML codebases commonly include framework imports (`torch`, `tensorflow`, `sklearn`, `keras`, `transformers`) and additional model formats (`.safetensors`). Adding at least one framework-import signal would improve the codebase profiler's detection accuracy.
>
> Consider expanding to: (…)
torch,tensorflow,sklearn,keras,transformers) and additional model formats (.safetensors). Adding at least one framework-import signal would improve the codebase profiler's detection accuracy.Consider expanding to: