diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/SKILL.md b/plugins/android-reverse-engineering/skills/android-reverse-engineering/SKILL.md index 6b31074..f804a65 100644 --- a/plugins/android-reverse-engineering/skills/android-reverse-engineering/SKILL.md +++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/SKILL.md @@ -24,6 +24,31 @@ If anything is missing, follow the installation instructions in `${CLAUDE_PLUGIN ## Workflow +### Phase 0: Fingerprint the App (recommended before anything else) + +Before installing tools or decompiling, run a fast triage to determine what +kind of app you are looking at. **Decompiling Java is mostly useless for +Flutter, React Native, Cordova/Capacitor, and Xamarin apps** — the real code +lives elsewhere. The fingerprint script tells you which. + +```bash +bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/fingerprint.sh +``` + +It prints, in one screen: + +- **Mobile framework** (Flutter / React Native / Cordova / Xamarin / Native Kotlin / etc.) with the file marker that triggered the verdict. +- **HTTP stack** (Retrofit, OkHttp, Ktor, Apollo, Volley) detected via DEX string scan — works even when class names are obfuscated. +- **DI / serialization** signals (Hilt, Dagger, Koin, kotlinx.serialization, Moshi, Gson, Jackson). +- **Obfuscation level** estimate based on root-level short-named packages. +- **Notable third-party SDKs** (AppsFlyer, Datadog, Sentry, Firebase, payment SDKs, support/chat SDKs, etc.). +- **Consolidated native libraries** across the base APK and all splits — XAPK split bundles often place `.so` files in `config..apk`, not in `base.apk`. +- **Recommended next step**, which differs by framework (e.g. for Flutter the script suggests `blutter` / `strings libapp.so` rather than jadx). + +If the fingerprint says the app is Flutter / RN / Cordova / Xamarin, **stop** +and switch to the framework-appropriate tooling. 
Phases 1–5 below assume a +native (Java/Kotlin) Android app. + ### Phase 1: Verify and Install Dependencies Before decompiling, confirm that the required tools are available — and install any that are missing. @@ -123,12 +148,45 @@ Navigate the decompiled output to understand the app's architecture. - Distinguish app code from third-party libraries - Look for packages named `api`, `network`, `data`, `repository`, `service`, `retrofit`, `http` — these are where API calls live -3. **Identify the architecture pattern**: +3. **Read every `BuildConfig.java`** — these are almost never obfuscated and frequently leak the highest-signal constants in the entire APK (base URLs, flavor names, build type, third-party API keys, feature flags): + ```bash + find /sources -name BuildConfig.java -exec grep -H '=' {} \; + ``` + Each Gradle module emits its own `BuildConfig`, so expect 1–N hits. Read all of them. + +4. **Identify the architecture pattern**: - MVP: look for `Presenter` classes - MVVM: look for `ViewModel` classes and `LiveData`/`StateFlow` - Clean Architecture: look for `domain`, `data`, `presentation` packages - This informs where to look for network calls in the next phases +### Phase 3.5: Recover Kotlin Class Names (only for obfuscated Kotlin apps) + +If Phase 0 reported moderate / high obfuscation **and** the app is Kotlin +(Compose / kotlin_module markers detected), run the metadata recovery +script before tracing call flows. R8 renames JVM symbols but usually leaves +Kotlin metadata strings in place, so original FQNs often leak through +`@DebugMetadata` and `@Metadata.d2`. 
+ +```bash +bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh \ + /sources /mapping +``` + +Then use the lookup helper instead of plain grep — every hit comes +annotated with the owning class's real name: + +```bash +bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/lookup-name.sh \ + /mapping --grep '"/api/' /sources +``` + +Typical recovery on a real-world Kotlin app: ~100% of `*Repository` / +`*ViewModel` / `*UseCase` / `*Impl` classes, ~80% of DTOs. + +See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/kotlin-name-recovery.md` +for the full technique and limitations. + ### Phase 4: Trace Call Flows Follow execution paths from user-facing entry points down to network calls. @@ -190,15 +248,32 @@ On Windows (PowerShell): & "${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/find-api-calls.ps1" /sources/ -Auth ``` -Then, for each discovered endpoint, read the surrounding source code to extract: -- HTTP method and path -- Base URL -- Path parameters, query parameters, request body -- Headers (especially authentication) -- Response type -- Where it's called from (the call chain from Phase 4) +Document the endpoints in **two tiers** — going deep on every endpoint is +prohibitively expensive on apps with 100+ paths, and most of them do not +warrant it. Always produce Tier 1; expand Tier 2 only for the endpoints +that matter. + +#### Tier 1 — flat inventory (always) -**Document each endpoint** using this format: +A single table covering every discovered endpoint. Aim for one line each; +if you cannot determine a column, write `?`. 
+ +| Host | Method | Path | Auth | Source file | +|------|--------|------|------|-------------| +| `api.example.com` | GET | `/v1/users/profile` | Bearer | `com/example/api/UserApi.java` | +| `api.example.com` | POST | `/v1/auth/login` | none | `com/example/api/AuthApi.java` | + +This table answers "what does the backend look like" in one screen and +takes ~5 minutes to produce from the `--paths` output even on a large app. + +#### Tier 2 — per-endpoint detail (only for high-value endpoints) + +Reserve the detailed format for the few endpoints that actually need it: + +- the entire authentication flow (login, refresh, logout, OTP/SMS, anonymous, registration) +- payment / checkout / order-creation endpoints +- anything the user explicitly asked about +- anything that looked unusual during the scan (custom signing, undocumented headers, etc.) ```markdown ### `METHOD /path` @@ -213,6 +288,10 @@ Then, for each discovered endpoint, read the surrounding source code to extract: - **Called from**: `LoginActivity → LoginViewModel → UserRepository → ApiService` ``` +As a default, do not produce Tier 2 entries for more than ~10 endpoints +unless the user explicitly asks for more — Tier 1 plus a Tier 2 deep dive +on auth + 1-2 key flows is what most consumers of this work actually want. + See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/api-extraction-patterns.md` for library-specific search patterns and the full documentation template. 
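The Tier 1 inventory described above can be bootstrapped mechanically. The sketch below is a hypothetical helper (`paths_to_tier1_rows` is not one of the shipped scripts). It assumes its input is the deduplicated list of quoted path literals, one per line, with any section headers already removed, and it emits a table skeleton with every undetermined column set to `?`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the skill's scripts.
# Reads quoted path literals (one per line, e.g. "/v1/users/profile") on
# stdin and prints a Tier 1 table skeleton. Columns that cannot be derived
# from a bare path are written as "?".
paths_to_tier1_rows() {
  printf '| Host | Method | Path | Auth | Source file |\n'
  printf '|------|--------|------|------|-------------|\n'
  # Strip the surrounding double quotes, deduplicate, emit one row per path.
  sed -E 's/^"//; s/"$//' | sort -u | while IFS= read -r path; do
    [ -n "$path" ] || continue
    printf '| ? | ? | `%s` | ? | ? |\n' "$path"
  done
}
```

Host, method, auth, and source file still have to be filled in by reading the call sites; the helper only saves the typing.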
## Output diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/api-extraction-patterns.md b/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/api-extraction-patterns.md index 5467eb1..4023139 100644 --- a/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/api-extraction-patterns.md +++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/api-extraction-patterns.md @@ -55,6 +55,65 @@ grep -rn 'Interceptor\|addInterceptor\|addNetworkInterceptor\|intercept(' source grep -rn '\.execute()\|\.enqueue(' sources/ ``` +## Ktor (Kotlin) + +Ktor is the dominant HTTP client in Kotlin Multiplatform and modern +Kotlin-only Android apps. Unlike Retrofit, Ktor does **not** use annotations +to declare endpoints — paths appear as plain string arguments to +`client.get(...)` / `client.post(...)`, often inside an extension function. + +```bash +# Calls +grep -rn '\b\(client\|httpClient\|HttpClient\)\.\(get\|post\|put\|delete\|patch\|head\|request\)\s*[<(]' sources/ + +# Default request / base URL configuration +grep -rn 'HttpRequestBuilder\|defaultRequest\s*{\|\burl\s*(\s*"\|URLBuilder' sources/ + +# Auth plugin (bearer / refresh) +grep -rn '\bbearer\s*{\|BearerTokens\s*(\|loadTokens\s*{\|refreshTokens\s*{' sources/ +``` + +Typical Ktor call (after decompile): + +```java +client.get("api/v1/users/profile") { + parameter("locale", "en-US"); +} +``` + +The base URL is usually applied via `defaultRequest { url { host = "..." } }` +in the client builder. Search for `host =` and `URLProtocol.HTTPS` references +to pin it down. + +**Note on obfuscation:** in heavily R8-shrunk apps the call site +`client.get("path")` is inlined to something like `aVar.a(dVar, "path")` +and the `client.(` regex misses it. The path string itself is **not** +obfuscated, however — fall back to the generic path-literal search +(`--paths`) for the endpoint inventory in those cases. 
Ktor library +internals (`BearerTokens`, `loadTokens`, `refreshTokens`, `URLProtocol`) +remain searchable because Ktor keeps these on its public API. + +Ktor's authentication plugin uses the +[`Auth { bearer { loadTokens { ... }; refreshTokens { ... } } }`](https://ktor.io/docs/auth.html) +DSL — bearer access tokens with automatic refresh. After R8, the DSL +lambdas appear as `Function2`/`Function3` impls referencing +`BearerTokens(...)` calls. + +## Apollo Kotlin (GraphQL) + +```bash +# Client setup +grep -rn 'ApolloClient\|\.serverUrl(\|HttpNetworkTransport' sources/ + +# Operations (queries / mutations / subscriptions) +grep -rn '\.query(\s*[A-Z]\|\.mutation(\s*[A-Z]\|\.subscription(\s*[A-Z]' sources/ +``` + +Apollo generates one class per operation under a generated package; once you +find the GraphQL endpoint URL via `ApolloClient.serverUrl("...")`, use the +operation classes themselves as the API documentation — each carries its +GraphQL document text in `OPERATION_DOCUMENT`. + ## Volley ```bash @@ -77,6 +136,25 @@ grep -rn 'loadUrl\|evaluateJavascript\|addJavascriptInterface\|WebViewClient\|sh WebView-based apps may load API endpoints via JavaScript bridges. Look for `@JavascriptInterface` annotated methods. +## Endpoint-Shaped Path Literals (obfuscation-resistant) + +When the HTTP client cannot be identified (custom abstraction, heavy +inlining, KMP shared module), or the call sites are obfuscated to +`a.b(c, "path")`, fall back to extracting the path string literals +themselves. R8 does not obfuscate string contents, so paths leak through. 
+ +```bash +# All quoted strings shaped like an API path, deduplicated +grep -rhoE '"(/[A-Za-z0-9_{}.\-]+(/[A-Za-z0-9_{}.\-]+)+/?|(api|v[0-9]+|graphql|users?|account|auth|sso|oauth|profile|cart|basket|order|product|inventory|search|category|address|location|delivery|payment|invoice|favo[u]?rites?)(/[A-Za-z0-9_{}.\-]+)+/?)"' sources/ \ + | grep -Ev '^"(image|video|audio|text|application|content)/|^"/(proc|sys|dev|tmp|etc)/' \ + | sort -u +``` + +The skill ships this as `find-api-calls.sh --paths`, which prints both a +deduplicated inventory and the full list of call sites. On real-world +Kotlin apps this single command typically produces 100–300 distinct +endpoint paths, which is the most useful first artifact for documentation. + ## Hardcoded URLs and Secrets ```bash diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/call-flow-analysis.md b/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/call-flow-analysis.md index 7669f62..fa8be66 100644 --- a/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/call-flow-analysis.md +++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/call-flow-analysis.md @@ -84,9 +84,9 @@ Look for: - Firebase/analytics initialization - Base URL configuration -## 5. Dependency Injection (Dagger / Hilt) +## 5. Dependency Injection -Modern Android apps use DI. Trace bindings to find implementations: +### Dagger / Hilt ```bash # Hilt modules @@ -102,10 +102,43 @@ grep -rn '@Component\|@Subcomponent' sources/ grep -rn '@Inject' sources/ ``` -To trace a call flow through DI: -1. Find where an interface is used (e.g., `ApiService` injected into a repository) -2. Find the `@Provides` or `@Binds` method that creates the implementation -3. Follow the implementation to the actual HTTP call +### Koin + +Koin is the dominant DI framework in Kotlin Multiplatform and a large +share of Kotlin-only Android apps. 
It uses a runtime DSL rather than +compile-time generated factories, so the search patterns are different: + +```bash +# Confirm Koin is actually wired up +grep -rn 'org\.koin\.' sources/ + +# DI module declarations +grep -rn 'fun [A-Za-z]\+Module\|module\s*{\|module(' sources/ + +# Bindings inside a module DSL +grep -rn 'single\s*[<{(]\|factory\s*[<{(]\|viewModel\s*[<{(]\|scoped\s*[<{(]\|singleOf\|factoryOf' sources/ + +# Resolution call-sites (where a binding is consumed) +grep -rn '\bget\s*<\|\binject\s*<\|by\s\+inject\b\|by\s\+viewModel\b\|getKoin' sources/ +``` + +After R8, every binding lambda becomes an anonymous +`Function2` impl. To find the binding for an +interface `Foo`, look for files that contain both a Koin import / module +DSL marker and a reference to `Foo`: + +```bash +grep -rln 'org\.koin\.core\.module' sources/ | xargs grep -l 'Foo' +``` + +### Trace through DI + +1. Find where an interface is used (e.g. `ApiService` injected into a + repository). +2. Find the `@Provides` / `@Binds` method (Hilt) **or** the + `single { ... }` / `factory { ... }` block (Koin) that creates the + implementation. +3. Follow the implementation to the actual HTTP call. ## 6. Find Constants and Configuration @@ -145,8 +178,9 @@ When code is obfuscated (ProGuard/R8): 1. **Start from strings**: Search for URLs, error messages, and known constants 2. **Start from framework classes**: Activities and Fragments are named in the manifest 3. **Follow library calls**: Retrofit `@GET`/`@POST` annotations are readable even when the interface class name is obfuscated -4. **Use `--deobf`**: jadx can generate readable replacement names +4. **Recover original Kotlin names from metadata**: `@DebugMetadata` and `@Metadata.d2` strings preserve the original FQNs even after R8 obfuscation. Run `scripts/recover-kotlin-names.sh` to build an `obf -> real` map (typically recovers 30-50% of classes — and almost 100% of `*Repository` / `*ViewModel` / `*Impl`). 
See [`kotlin-name-recovery.md`](./kotlin-name-recovery.md). This is the single highest-leverage step on any Kotlin app. 5. **Cross-reference**: If `class a` calls `Retrofit.create(b.class)`, then `b` is a Retrofit service interface +6. **`--deobf` is rarely enough on its own**: jadx's `--deobf` renames obfuscated symbols with synthetic placeholders (`p001a`, `C0123Foo`) — useful for disambiguation but it does **not** recover original names. Pair it with the metadata recovery above. ## 8. Tracing a Complete Call Flow: Example diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/kotlin-name-recovery.md b/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/kotlin-name-recovery.md new file mode 100644 index 0000000..d7d049d --- /dev/null +++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/kotlin-name-recovery.md @@ -0,0 +1,108 @@ +# Recovering Original Class Names from Kotlin Metadata + +When R8/ProGuard obfuscates a Kotlin app, JVM symbols are renamed but the +**Kotlin metadata strings usually survive**: the Kotlin runtime reads them +for reflection and coroutine stack traces, so most builds keep them. + +Two annotations leak the original fully-qualified names: + +## `@DebugMetadata` + +Generated for nearly every Kotlin coroutine `SuspendLambda` (i.e. almost +every `suspend` function in a modern app): + +```java +@DebugMetadata( + c = "com.example.feature.account.AccountRepositoryImpl$fetch$1", + f = "AccountRepositoryImpl.kt", + l = {42, 51}, + m = "invokeSuspend" +) +public final class a extends SuspendLambda implements Function2<...> { ... } +``` + +The `c =` field carries the original outer class FQN (with a `$` suffix +for inner / lambda scopes; strip everything from the first `$` onward to get the +declaring class). + +## `@Metadata.d2` + +Every Kotlin class carries a top-level `@Metadata` annotation. 
The `d2` +array lists internal class refs in JVM type-descriptor format +(`Lcom/example/Foo;`): + +```java +@Metadata(d1 = {"..."}, + d2 = {"...","Lcom/example/feature/account/AccountRepositoryImpl;","..."}) +public final class b implements ... { ... } +``` + +The first non-stdlib descriptor in `d2` is usually the file's primary +class. + +## How to mine them + +The skill ships two scripts: + +```bash +# Build a mapping from a decompiled sources directory: +bash scripts/recover-kotlin-names.sh /sources [mapping-dir] + +# Outputs: +# /mapping.tsv obf_fqn real_fqn file +# /mapping.json same data, JSON +# /by_package/ per-real-package index files + +# Query the mapping: +bash scripts/lookup-name.sh Repository # search +bash scripts/lookup-name.sh -o ab.cd # obf -> real +bash scripts/lookup-name.sh -p com.example.feature # list package +bash scripts/lookup-name.sh --grep '"api/' /sources + # ^ greps decompiled code and appends '// real.fqn' to each hit +``` + +## What you typically recover + +On a real-world obfuscated Kotlin app the script recovers **30 – 50 % of +classes** — but more importantly, **almost 100 % of the classes you +actually want to read**: + +| Class kind | Recovery rate | +|---------------------------|---------------| +| `*Repository` / `*Impl` | ~100 % | +| `*ViewModel` | ~100 % | +| `*UseCase` / `*Interactor`| ~100 % | +| Plain `data class` DTOs | ~80 % | +| Pure-Java helper classes | low (no Kotlin metadata) | +| Anonymous inner classes | sometimes recovered as the parent FQN | + +## Why `jadx --deobf` is not enough + +`--deobf` renames obfuscated identifiers using internal heuristics, but the +output is still synthetic (`p001a`, `C0123Foo`). It does **not** recover +the *original* names. Kotlin metadata recovery is the only reliable way to +map back to the names the developer actually wrote, and it costs essentially +nothing — just a regex pass over the decompiled sources. 
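As a rough illustration of that regex pass, the sketch below pulls the `c = "..."` value out of each `@DebugMetadata` annotation and cuts it at the first `$`. It is not the shipped `recover-kotlin-names.sh`: the function name `debug_metadata_fqns` is made up for the example, and it assumes the decompiler prints `c = "..."` on a single line and that GNU-style `\b` is available in `grep -E`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the skill's scripts.
# Prints "file<TAB>original.class.FQN" for every `c = "..."` value found in
# .java files under the given decompiled sources directory.
debug_metadata_fqns() {
  local src_dir="$1"
  # -o keeps only the matched `c = "..."` fragment; -H prefixes the file path.
  grep -rHoE --include='*.java' '\bc = "[A-Za-z0-9_.$]+"' "$src_dir" 2>/dev/null \
    | awk -F'c = "' '{
        file = $1; sub(/:$/, "", file)       # drop the ":" grep puts after the path
        fqn = $2;  sub(/[$"].*$/, "", fqn)   # cut at the first $ or the closing quote
        print file "\t" fqn
      }' \
    | sort -u
}
```

Called with the decompiled sources directory as its only argument, it prints one tab-separated file / class pair per line. The pattern is deliberately loose, so treat any hit outside a `@DebugMetadata` annotation as noise.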
+ +Run both: `--deobf` for fields/methods that have no metadata source, plus +the recovery script for class names. + +## Limitations + +- **Method names and field names** are not recovered. Kotlin metadata only + preserves class-level FQNs and a few signatures. For method names you + still need jadx-gui's interactive rename or pattern inference. +- **Pure-Java classes** carry no `@Metadata`, so they remain obfuscated. +- **Heavily inlined classes** (`@JvmInline value class`, top-level fun + files compiled into shared `*Kt.class` synthetic classes) sometimes show + up under the wrong filename — treat results as a strong hint, not gospel. + +## Reading flow with the mapping + +1. Run `recover-kotlin-names.sh` once after decompiling. +2. Use `lookup-name.sh --grep '' ` instead of plain `grep` + so every hit comes annotated with the real owning class. +3. When you hit an obfuscated FQN in code (e.g. `nq.e`), resolve it with + `lookup-name.sh -o nq.e` — you will often see siblings + (`nq.d`, `nq.f`, ...) that are the same class's split lambdas/inner + classes, which is useful context. diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/third_party_hosts.txt b/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/third_party_hosts.txt new file mode 100644 index 0000000..976636c --- /dev/null +++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/references/third_party_hosts.txt @@ -0,0 +1,122 @@ +# Third-party host denylist used by find-api-calls.sh --urls. +# +# Patterns are extended-regex hostname suffixes / fragments. A host is +# considered "third-party noise" if any pattern below matches anywhere +# in the hostname. Lines starting with '#' and blank lines are ignored. +# +# This list is intentionally conservative: when a pattern would hide a +# legitimate first-party host (e.g. 
an app may run its own *.s3.amazonaws.com +# bucket), keep the pattern but expect manual review of the bucketed output. + +# Google / Firebase / Play / Crashlytics +\.googleapis\.com$ +\.google\.com$ +\.gstatic\.com$ +\.googleusercontent\.com$ +\.googletagmanager\.com$ +\.googlesyndication\.com$ +\.firebaseio\.com$ +\.firebaseapp\.com$ +\.firebaseinstallations\.googleapis\.com$ +\.firebaseremoteconfig\.googleapis\.com$ +\.crashlytics\.com$ +\.app-measurement\.com$ + +# Apple / Microsoft / Adobe +\.apple\.com$ +\.icloud\.com$ +\.microsoft\.com$ +\.live\.com$ +\.office\.com$ +\.adobe\.com$ +ns\.adobe\.com + +# Meta +\.facebook\.com$ +\.fbcdn\.net$ +\.instagram\.com$ +\.whatsapp\.com$ + +# Other social / messaging / video +\.twitter\.com$ +\.x\.com$ +\.tiktok\.com$ +\.youtube\.com$ +\.youtu\.be$ +\.linkedin\.com$ +\.snapchat\.com$ +\.pinterest\.com$ +\.reddit\.com$ + +# Mobile attribution / analytics / observability +\.appsflyersdk\.com$ +\.appsflyer\.com$ +\.adjust\.com$ +\.branch\.io$ +\.amplitude\.com$ +\.segment\.com$ +\.mixpanel\.com$ +\.hotjar\.com$ +\.clarity\.ms$ +\.datadoghq\.(com|eu|us)$ +\.sentry\.io$ +\.bugsnag\.com$ +\.newrelic\.com$ +\.instabug\.com$ +\.embrace\.io$ +\.rollout\.io$ +\.launchdarkly\.com$ + +# Push / notifications +\.onesignal\.com$ +\.urbanairship\.com$ +\.airship\.com$ + +# Support / chat +\.zendesk\.com$ +\.intercom\.io$ +\.intercomcdn\.com$ +\.helpshift\.com$ +\.salesforce\.com$ +\.freshchat\.com$ +\.kustomerapp\.com$ + +# Payments +\.stripe\.com$ +\.braintreepayments\.com$ +\.braintreegateway\.com$ +\.payu\.com$ +\.payu\.in$ +\.paypal\.com$ +\.adyen\.com$ +\.checkout\.com$ +\.klarna\.com$ + +# Maps / location +\.mapbox\.com$ +\.openstreetmap\.org$ + +# Storage / CDN (often third-party even when the bucket name is app-specific) +\.s3\.amazonaws\.com$ +\.cloudfront\.net$ +\.akamaihd\.net$ +\.akamaized\.net$ +\.fastly\.net$ +\.cloudflare\.com$ +\.azureedge\.net$ + +# DNS / well-known infra +\.localhost$ +^localhost +^127\. 
+ +# Standards / RFCs / placeholders that show up as XML/XMP namespaces +\.w3\.org$ +\.w3c\.org$ +example\.(com|org|net)$ + +# Certificate authorities +\.sectigo\.com$ +\.entrust\.com$ +\.digicert\.com$ +\.letsencrypt\.org$ diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh b/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh index db52acf..22f5ed6 100755 --- a/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh +++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh @@ -14,8 +14,12 @@ Arguments: Options: --retrofit Search only for Retrofit annotations --okhttp Search only for OkHttp patterns + --ktor Search only for Ktor client patterns + --apollo Search only for Apollo (GraphQL) patterns --volley Search only for Volley patterns --urls Search only for hardcoded URLs + --paths Extract unique endpoint-shaped path string literals + (works on heavily obfuscated apps where call sites are inlined) --auth Search only for auth-related patterns --all Search all patterns (default) -h, --help Show this help message @@ -29,8 +33,11 @@ EOF SOURCE_DIR="" SEARCH_RETROFIT=false SEARCH_OKHTTP=false +SEARCH_KTOR=false +SEARCH_APOLLO=false SEARCH_VOLLEY=false SEARCH_URLS=false +SEARCH_PATHS=false SEARCH_AUTH=false SEARCH_ALL=true @@ -38,8 +45,11 @@ while [[ $# -gt 0 ]]; do case "$1" in --retrofit) SEARCH_RETROFIT=true; SEARCH_ALL=false; shift ;; --okhttp) SEARCH_OKHTTP=true; SEARCH_ALL=false; shift ;; + --ktor) SEARCH_KTOR=true; SEARCH_ALL=false; shift ;; + --apollo) SEARCH_APOLLO=true; SEARCH_ALL=false; shift ;; --volley) SEARCH_VOLLEY=true; SEARCH_ALL=false; shift ;; --urls) SEARCH_URLS=true; SEARCH_ALL=false; shift ;; + --paths) SEARCH_PATHS=true; SEARCH_ALL=false; shift ;; --auth) SEARCH_AUTH=true; SEARCH_ALL=false; shift ;; --all) SEARCH_ALL=true; shift ;; -h|--help) usage ;; @@ -72,6 
+82,58 @@ run_grep() { grep $GREP_OPTS -E "$pattern" "$SOURCE_DIR" 2>/dev/null || true } +# Print a one-screen summary FIRST so a reader knows what to expect from +# the long output that follows. Skipped when a single section flag was +# requested (the user wants raw matches, not an overview). One pass over +# the tree, counts bucketed by tag — running 8 separate greps was too slow. +if [[ "$SEARCH_ALL" == true ]]; then + section "Summary (counted in a single pass)" + declare -A H=( + [retrofit]=0 [okhttp]=0 [ktor]=0 [apollo]=0 [volley]=0 + [hilt]=0 [koin]=0 [bearer]=0 [hmac]=0 + ) + while IFS= read -r line; do + case "$line" in + *"@GET("*|*"@POST("*|*"@PUT("*|*"@DELETE("*|*"@PATCH("*|*"@HTTP("*) H[retrofit]=$((H[retrofit]+1));; + esac + case "$line" in + *"Request.Builder"*|*"HttpUrl"*|*".newCall("*) H[okhttp]=$((H[okhttp]+1));; + esac + case "$line" in + *"BearerTokens"*|*"defaultRequest {"*|*"client.get("*|*"client.post("*|*"httpClient.get("*|*"httpClient.post("*|*"HttpClient.get("*) H[ktor]=$((H[ktor]+1));; + esac + case "$line" in + *"ApolloClient"*|*".serverUrl("*) H[apollo]=$((H[apollo]+1));; + esac + case "$line" in + *"StringRequest"*|*"JsonObjectRequest"*|*"RequestQueue"*) H[volley]=$((H[volley]+1));; + esac + case "$line" in + *"@HiltAndroidApp"*|*"@AndroidEntryPoint"*|*"@HiltViewModel"*|*"@Provides"*|*"@Binds"*) H[hilt]=$((H[hilt]+1));; + esac + case "$line" in + *"org.koin."*|*"module {"*|*"single<"*|*"factory<"*|*"singleOf("*|*"factoryOf("*) H[koin]=$((H[koin]+1));; + esac + case "$line" in + *'"Bearer '*|*'"bearer '*|*"BearerTokens"*) H[bearer]=$((H[bearer]+1));; + esac + case "$line" in + *"HmacSHA"*|*'Mac.getInstance("Hmac'*) H[hmac]=$((H[hmac]+1));; + esac + done < <(grep -rEh --include='*.java' --include='*.kt' \ + '@(GET|POST|PUT|DELETE|PATCH|HTTP)\(|Request\.Builder|HttpUrl|\.newCall\(|BearerTokens|defaultRequest 
\{|client\.(get|post)\(|httpClient\.(get|post)\(|HttpClient\.get\(|ApolloClient|\.serverUrl\(|StringRequest|JsonObjectRequest|RequestQueue|@HiltAndroidApp|@AndroidEntryPoint|@HiltViewModel|@Provides|@Binds|org\.koin\.|module \{|single<|factory<|singleOf\(|factoryOf\(|"[Bb]earer |HmacSHA|Mac\.getInstance' \ + "$SOURCE_DIR" 2>/dev/null || true) + printf ' HTTP framework: Retrofit=%-5s OkHttp=%-5s Ktor=%-5s Apollo=%-5s Volley=%-5s\n' \ + "${H[retrofit]}" "${H[okhttp]}" "${H[ktor]}" "${H[apollo]}" "${H[volley]}" + printf ' DI framework: Hilt/Dagger=%-5s Koin=%-5s\n' \ + "${H[hilt]}" "${H[koin]}" + printf ' Auth signals: Bearer=%-5s HMAC/Sign=%-5s\n' \ + "${H[bearer]}" "${H[hmac]}" + echo + echo " Run with one of --retrofit / --okhttp / --ktor / --apollo / --volley /" + echo " --paths / --urls / --auth to inspect a single section." +fi + # --- Retrofit --- if [[ "$SEARCH_ALL" == true || "$SEARCH_RETROFIT" == true ]]; then section "Retrofit Annotations" @@ -90,16 +152,123 @@ if [[ "$SEARCH_ALL" == true || "$SEARCH_OKHTTP" == true ]]; then run_grep '(\.url\s*\(|\.addQueryParameter|\.addPathSegment|\.scheme\s*\(|\.host\s*\()' fi +# --- Ktor (Kotlin) --- +# Ktor doesn't use annotations. Endpoints appear as string args to +# client.get/post/etc., or are built via HttpRequestBuilder.url(...). Auth +# is configured via the bearer { loadTokens / refreshTokens } DSL. 
+if [[ "$SEARCH_ALL" == true || "$SEARCH_KTOR" == true ]]; then + section "Ktor — Client Calls" + run_grep '\b(client|httpClient|HttpClient)\.(get|post|put|delete|patch|head|request)\s*[<(]' + section "Ktor — Request Building / Default Request" + run_grep '(HttpRequestBuilder|defaultRequest\s*\{|\burl\s*\(\s*"|URLBuilder|URLProtocol)' + section "Ktor — Auth Plugin (Bearer / Refresh)" + run_grep '(\bbearer\s*\{|BearerTokens\s*\(|loadTokens\s*\{|refreshTokens\s*\{|\bAuth\s*\)\s*\{)' +fi + +# --- Apollo (GraphQL) --- +if [[ "$SEARCH_ALL" == true || "$SEARCH_APOLLO" == true ]]; then + section "Apollo — GraphQL Client" + run_grep '(ApolloClient|\.serverUrl\s*\(|\.subscriptionNetworkTransport|HttpNetworkTransport)' + section "Apollo — Operations" + run_grep '(\.query\s*\(\s*[A-Z]|\.mutation\s*\(\s*[A-Z]|\.subscription\s*\(\s*[A-Z])' +fi + # --- Volley --- if [[ "$SEARCH_ALL" == true || "$SEARCH_VOLLEY" == true ]]; then section "Volley Requests" run_grep '(StringRequest|JsonObjectRequest|JsonArrayRequest|ImageRequest|RequestQueue|Volley\.newRequestQueue)' fi +# --- Endpoint-shaped path literals --- +# Survives R8 obfuscation: even when call sites are inlined to a.b(c, "path"), +# the path strings themselves are not obfuscated. This produces a deduplicated +# inventory of likely API endpoints that other modes miss. +if [[ "$SEARCH_ALL" == true || "$SEARCH_PATHS" == true ]]; then + section "Endpoint-Shaped Path Literals (deduplicated)" + # Quoted strings that begin with / or / where the leading + # segment is a typical API root word. Cap segment count and length to keep + # the regex grounded. + # An endpoint-shaped string is one of: + # "/seg/seg..." — absolute path with >= 2 segments + # "api-root/seg/seg..." — relative path starting with a known + # API root keyword and containing >= 1 + # '/' followed by another segment + # Segments are URL-safe chars plus {} for path-template placeholders. 
+ SEG='[A-Za-z0-9_{}.\-]+' + ROOT='(api|v[0-9]+|graphql|rest|mobile|auth|oauth|sso|users?|account|session|token|register|signup|signin|logout|password|verify|otp|sms|profile|customer|cart|basket|order|checkout|payment|invoice|product|catalog|inventory|search|category|favo[u]?rites?|wishlist|address|location|delivery|shipping|review|feedback|notification|push|message|chat|track|event|stat[a-z]*|metric|config|settings?|feature|flag|banner|content|media|upload|download|file|image|video|live|stream|webhook|callback)' + PATHS_REGEX="\"(/${SEG}(/${SEG})+/?|${ROOT}(/${SEG})+/?)\"" + # Filter out frequent false positives (MIME types, /proc, /sys, /dev). + EXCLUDE='^"(image|video|audio|text|application|content|font|model|multipart|message)/|^"/(proc|sys|dev|tmp|etc|usr|var|opt)/' + # Print a flat unique list rather than file:line — this is the inventory. + grep -rhoE --include='*.java' --include='*.kt' "$PATHS_REGEX" "$SOURCE_DIR" 2>/dev/null \ + | grep -Ev "$EXCLUDE" \ + | sort -u + echo + section "Endpoint-Shaped Path Literals — call sites" + grep $GREP_OPTS -E "$PATHS_REGEX" "$SOURCE_DIR" 2>/dev/null \ + | grep -Ev ":[0-9]+:.*(${EXCLUDE//^/})" || true +fi + # --- Hardcoded URLs --- +# A loose grep for http(s)://... drowns in compression-dictionary garbage and +# in third-party SDK URLs (Google, Firebase, AppsFlyer, Datadog, ...). The +# strict regex requires a syntactically valid hostname and rejects strings +# containing whitespace, angle brackets, or non-printable bytes. Hosts are +# then bucketed into "first-party candidates" vs "third-party (denylist)". if [[ "$SEARCH_ALL" == true || "$SEARCH_URLS" == true ]]; then - section "Hardcoded URLs (http:// and https://)" - run_grep '"https?://[^"]+' + HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + DENYLIST="$HERE/../references/third_party_hosts.txt" + # Hostname must have at least one dot and end in a 2+ letter TLD. + STRICT_URL='https?://[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}(:[0-9]{1,5})?(/[^"<>[:space:]]*)?' 
+ + TMP="$(mktemp)" + trap 'rm -f "$TMP"' EXIT + grep -rhoE --include='*.java' --include='*.kt' "$STRICT_URL" "$SOURCE_DIR" 2>/dev/null \ + | sort -u > "$TMP" + + # Extract host: strip scheme, take part up to first ':' or '/'. + HOSTS_TMP="$(mktemp)" + sed -E 's#^https?://##; s#[/:].*$##' "$TMP" | sort -u > "$HOSTS_TMP" + + if [[ -f "$DENYLIST" ]]; then + # Build a single combined regex from the denylist (one line each). + DENY_REGEX="$(grep -vE '^\s*(#|$)' "$DENYLIST" | tr '\n' '|' | sed 's/|$//')" + THIRD_HOSTS=$(grep -E "$DENY_REGEX" "$HOSTS_TMP" || true) + FIRST_HOSTS=$(grep -vE "$DENY_REGEX" "$HOSTS_TMP" || true) + else + THIRD_HOSTS="" + FIRST_HOSTS=$(cat "$HOSTS_TMP") + fi + + section "Likely First-Party Hosts (frequency-sorted)" + if [[ -n "$FIRST_HOSTS" ]]; then + while IFS= read -r h; do + [[ -z "$h" ]] && continue + n=$(grep -cE "://${h//./\\.}([/:\"]|$)" "$TMP" || true) + printf ' %5d %s\n' "$n" "$h" + done <<< "$FIRST_HOSTS" | sort -rn -k1 + else + echo " (none — every URL matched the third-party denylist)" + fi + + section "Third-Party Hosts (denylist matches, collapsed)" + if [[ -n "$THIRD_HOSTS" ]]; then + echo "$THIRD_HOSTS" | sed 's/^/ /' + else + echo " (none)" + fi + + section "All First-Party URLs (full strings)" + if [[ -n "$FIRST_HOSTS" ]]; then + while IFS= read -r h; do + [[ -z "$h" ]] && continue + grep -E "://${h//./\\.}([/:\"]|$)" "$TMP" | sed 's/^/ /' + done <<< "$FIRST_HOSTS" + fi + + rm -f "$HOSTS_TMP" "$TMP" + trap - EXIT + section "HttpURLConnection" run_grep '(openConnection|setRequestMethod|HttpURLConnection|HttpsURLConnection)' section "WebView URLs" @@ -109,9 +278,27 @@ fi # --- Auth patterns --- if [[ "$SEARCH_ALL" == true || "$SEARCH_AUTH" == true ]]; then section "Authentication & API Keys" - run_grep -i '(api[_-]?key|auth[_-]?token|bearer|authorization|x-api-key|client[_-]?secret|access[_-]?token)' + run_grep -i '(api[_-]?key|auth[_-]?token|bearer|authorization|x-api-key|client[_-]?secret|access[_-]?token|refresh[_-]?token)' 
+
+  # Request-signing schemes: a hardcoded HMAC / RSA secret in an APK is a
+  # security finding worth surfacing prominently. These patterns catch the
+  # common shapes of homegrown / SDK-issued request signers.
+  section "Request Signing (HMAC / signature schemes)"
+  run_grep '(HmacSHA(1|256|512)|Mac\.getInstance\("Hmac|SecretKeySpec\(|Signature\.getInstance\()'
+  run_grep -i '(x-signature|x-client-authorization|x-amz-signature|x-hmac|aws4-hmac|signRequest|signatureFor|computeSignature|signaturev[0-9])'
+
+  # Hardcoded high-entropy strings adjacent to "secret"/"key" assignments
+  # are the canonical leaked-credential pattern.
+  section "Possible Hardcoded Secrets / Keys"
+  run_grep -i '(app[_-]?secret|client[_-]?secret|signing[_-]?key|hmac[_-]?secret|consumer[_-]?secret|private[_-]?key)'
+
   section "Base URLs and Constants"
   run_grep -i '(BASE_URL|API_URL|SERVER_URL|ENDPOINT|API_BASE|HOST_NAME)'
+
+  # Ktor BearerTokens / refresh DSL — common on Kotlin apps and lives on
+  # Ktor's public API, so it survives R8 unchanged.
+  section "Ktor Auth (Bearer + Refresh)"
+  run_grep '(BearerTokens|loadTokens\s*\{|refreshTokens\s*\{|\bbearer\s*\{)'
 fi
 
 echo
diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/fingerprint.sh b/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/fingerprint.sh
new file mode 100755
index 0000000..c494358
--- /dev/null
+++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/fingerprint.sh
@@ -0,0 +1,241 @@
+#!/usr/bin/env bash
+# fingerprint.sh — Triage an APK/XAPK before decompiling.
+#
+# Detects mobile framework (Flutter, React Native, Cordova/Capacitor,
+# Xamarin, KMP/native), HTTP-stack hints, obfuscation level, native libs,
+# and notable third-party SDKs.
+#
+# Decompiling Java is mostly useless for Flutter / RN / Xamarin / Cordova
+# apps — different tools are needed. Run this BEFORE Phase 2 to choose
+# the right path.
+
+set -euo pipefail
+
+usage() {
+  cat <<EOF
+
+Prints a one-screen summary:
+  * mobile framework (with rationale)
+  * HTTP / DI / serialization stack hints
+  * obfuscation indicator
+  * native libraries (consolidated across split APKs)
+  * notable third-party SDKs found in assets/
+EOF
+  exit 0
+}
+
+[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
+INPUT="$1"
+[[ ! -f "$INPUT" ]] && { echo "File not found: $INPUT" >&2; exit 1; }
+
+TMP="$(mktemp -d -t apkfp.XXXXXX)"
+trap 'rm -rf "$TMP"' EXIT
+
+# Resolve to a list of APKs (handle XAPK = ZIP of APKs)
+APKS=()
+case "${INPUT,,}" in
+  *.xapk|*.apks|*.apkm)
+    unzip -q -o "$INPUT" -d "$TMP/xapk"
+    while IFS= read -r p; do APKS+=("$p"); done < <(find "$TMP/xapk" -maxdepth 2 -type f -name '*.apk')
+    ;;
+  *.apk)
+    APKS=("$INPUT")
+    ;;
+  *)
+    echo "Unsupported input: $INPUT" >&2; exit 1 ;;
+esac
+
+# Aggregate ZIP listings from every APK in the bundle (split-aware view)
+LISTING="$TMP/listing.txt"
+: > "$LISTING"
+for apk in "${APKS[@]}"; do
+  unzip -l -- "$apk" 2>/dev/null | awk '{print $NF}' >> "$LISTING"
+done
+
+# Most class-level libs live inside classes*.dex, not as visible zip paths.
+# Extract the type-name strings out of each dex with `strings` and append them
+# to the listing so `has()` can match e.g. 'io/ktor/' or 'org/koin/'.
+DEX_STRINGS="$TMP/dex_strings.txt"
+: > "$DEX_STRINGS"
+for apk in "${APKS[@]}"; do
+  for dex in $(unzip -Z1 -- "$apk" 2>/dev/null | grep -E '^classes[0-9]*\.dex$' || true); do
+    # DEX type descriptors look like "Lcom/foo/Bar;". Extract the inner
+    # slash-separated FQN so callers can match e.g. 'io/ktor/' directly.
+    unzip -p -- "$apk" "$dex" 2>/dev/null \
+      | strings -n 8 \
+      | grep -oE 'L[a-z][a-zA-Z0-9_]*(/[a-zA-Z0-9_$]+)+;' \
+      | sed -E 's/^L//; s/;$//' \
+      >> "$DEX_STRINGS" || true
+  done
+done
+sort -u "$DEX_STRINGS" -o "$DEX_STRINGS"
+
+has() { grep -qE "$1" "$LISTING" || grep -qE "$1" "$DEX_STRINGS"; }
+
+# ----------------------------------------------------------------------
+# Framework detection (priority order — first match wins)
+# ----------------------------------------------------------------------
+FRAMEWORK="unknown"
+RATIONALE=""
+
+if has '^lib/[^/]+/libflutter\.so$'; then
+  FRAMEWORK="Flutter"
+  RATIONALE="lib//libflutter.so present"
+  has '^lib/[^/]+/libapp\.so$' && RATIONALE+="; libapp.so contains AOT-compiled Dart"
+elif has '^lib/[^/]+/libhermes\.so$' || has '^assets/index\.android\.bundle$' || has '^lib/[^/]+/libreactnativejni\.so$'; then
+  FRAMEWORK="React Native"
+  reasons=()
+  has '^lib/[^/]+/libhermes\.so$' && reasons+=("libhermes.so")
+  has '^lib/[^/]+/libreactnativejni\.so$' && reasons+=("libreactnativejni.so")
+  has '^assets/index\.android\.bundle$' && reasons+=("assets/index.android.bundle")
+  RATIONALE="${reasons[*]}"
+elif has '^assets/www/index\.html$' || has '^assets/www/cordova\.js$' || has '^assets/public/index\.html$'; then
+  FRAMEWORK="Cordova / Capacitor (WebView hybrid)"
+  RATIONALE="assets/www/ or assets/public/ shell present"
+elif has '^lib/[^/]+/libmonodroid\.so$' || has '^assemblies/'; then
+  FRAMEWORK="Xamarin / .NET MAUI"
+  RATIONALE="libmonodroid.so or assemblies/ present — code is in .NET DLLs"
+elif has '^lib/[^/]+/libmaui\.so$'; then
+  FRAMEWORK=".NET MAUI"
+  RATIONALE="libmaui.so present"
+elif has '^assets/flutter_assets/' && !
has '^lib/[^/]+/libflutter\.so$'; then
+  FRAMEWORK="Flutter (code-only split?)"
+  RATIONALE="flutter_assets/ but no libflutter.so in this APK — check splits"
+else
+  # Native: distinguish Compose vs classic Android by androidx.compose presence
+  if has 'androidx\.compose'; then
+    FRAMEWORK="Native Android (Kotlin + Jetpack Compose)"
+    RATIONALE="androidx.compose.* libraries detected"
+  elif has '^META-INF/.*\.kotlin_module$'; then
+    FRAMEWORK="Native Android (Kotlin)"
+    RATIONALE="kotlin_module metadata present, no Compose markers"
+  else
+    FRAMEWORK="Native Android (Java/Kotlin)"
+    RATIONALE="no cross-platform framework markers found"
+  fi
+fi
+
+# ----------------------------------------------------------------------
+# HTTP / DI / serialization stack hints
+# ----------------------------------------------------------------------
+http=()
+has 'retrofit2' && http+=("Retrofit")
+has 'okhttp3' && http+=("OkHttp")
+has 'io/ktor/' && http+=("Ktor")
+has 'com/apollographql/' && http+=("Apollo (GraphQL)")
+has 'com/android/volley' && http+=("Volley")
+
+di=()
+has 'dagger/hilt/' && di+=("Hilt")
+has '^META-INF/.*dagger.*' && di+=("Dagger")
+has 'org/koin/' && di+=("Koin")
+has 'javax/inject/' && [[ ${#di[@]} -eq 0 ]] && di+=("javax.inject")
+
+ser=()
+has 'kotlinx/serialization/' && ser+=("kotlinx.serialization")
+has 'com/google/gson/' && ser+=("Gson")
+has 'com/squareup/moshi/' && ser+=("Moshi")
+has 'com/fasterxml/jackson/' && ser+=("Jackson")
+
+# ----------------------------------------------------------------------
+# Obfuscation indicator (R8/ProGuard) — count single-letter dex packages
+# ----------------------------------------------------------------------
+# Note: pipefail is on, so guard greps that may legitimately return 0 matches.
+short_dirs=$( { grep -oE '^[a-z]{1,2}/' "$LISTING" || true; } | sort -u | wc -l | tr -d ' ')
+if [[ "$short_dirs" -gt 30 ]]; then
+  OBFUSCATION="HIGH ($short_dirs single/double-letter dirs at root)"
+elif [[ "$short_dirs" -gt 10 ]]; then
+  OBFUSCATION="MODERATE ($short_dirs short root dirs)"
+else
+  OBFUSCATION="LOW (no significant short-name namespace pollution)"
+fi
+
+# ----------------------------------------------------------------------
+# Native libraries (consolidated)
+# ----------------------------------------------------------------------
+NATIVE=$(grep -E '^lib/[^/]+/[^/]+\.so$' "$LISTING" | sort -u || true)
+
+# ----------------------------------------------------------------------
+# Notable third-party SDKs (assets-based markers)
+# ----------------------------------------------------------------------
+sdks=()
+has '^assets/com/appsflyer/' && sdks+=("AppsFlyer")
+has 'datadog\.buildId|com/datadog/' && sdks+=("Datadog")
+has 'io/sentry/' && sdks+=("Sentry")
+has 'com/google/firebase/' && sdks+=("Firebase")
+has 'com/google/android/gms/' && sdks+=("Google Play Services")
+has 'com/facebook/' && sdks+=("Facebook SDK")
+has 'com/payu/' && sdks+=("PayU")
+has 'com/stripe/' && sdks+=("Stripe")
+has 'com/braintreepayments/' && sdks+=("Braintree")
+has 'com/storyteller/' && sdks+=("Storyteller")
+has 'zendesk/' && sdks+=("Zendesk")
+has 'com/intercom/' && sdks+=("Intercom")
+has 'com/segment/analytics' && sdks+=("Segment")
+has 'com/amplitude/' && sdks+=("Amplitude")
+has 'com/mixpanel/' && sdks+=("Mixpanel")
+has 'com/onesignal/' && sdks+=("OneSignal")
+has 'com/microsoft/clarity' && sdks+=("Microsoft Clarity")
+has 'com/hotjar/' && sdks+=("Hotjar")
+has 'com/instabug/' && sdks+=("Instabug")
+
+# BuildConfig.java is almost never obfuscated and often holds base URLs / flavor.
+# APKs ship DEX rather than .class files, so also match the DEX type name
+# (e.g. com/foo/BuildConfig) collected in $DEX_STRINGS.
+if has '(^|/)BuildConfig(\.class)?$'; then
+  BUILDCONFIG="present (grep BuildConfig.java after decompile for base URLs / flavor)"
+else
+  BUILDCONFIG="not detected in zip listing or DEX type names (still worth grepping after decompile)"
+fi
+
+# ----------------------------------------------------------------------
+# Summary
+# ----------------------------------------------------------------------
+echo "=== APK Fingerprint: $(basename "$INPUT") ==="
+echo
+echo "Framework: $FRAMEWORK"
+echo " Rationale: $RATIONALE"
+echo "Obfuscation: $OBFUSCATION"
+echo
+echo "HTTP stack: ${http[*]:-none detected}"
+echo "DI: ${di[*]:-none detected}"
+echo "Serialization: ${ser[*]:-none detected}"
+echo "BuildConfig: $BUILDCONFIG"
+echo
+echo "Third-party SDKs: ${sdks[*]:-none detected}"
+echo
+echo "Native libraries (consolidated across splits):"
+if [[ -n "$NATIVE" ]]; then
+  echo "$NATIVE" | sed 's/^/ /'
+else
+  echo " (none)"
+fi
+echo
+
+# ----------------------------------------------------------------------
+# Recommendation
+# ----------------------------------------------------------------------
+echo "Recommended next step:"
+case "$FRAMEWORK" in
+  Flutter*)
+    echo " Java decompilation will yield ~no app code. The Dart logic lives in"
+    echo " libapp.so (AOT). Use tools designed for Flutter:"
+    echo " - reFlutter / Doldrums / blutter (extract Dart class structure)"
+    echo " - strings/rabin2 on libapp.so for endpoints & string constants"
+    ;;
+  React*)
+    echo " Java code is just the RN host. Real app logic is in JS/Hermes:"
+    echo " - if Hermes: hbctool disasm assets/index.android.bundle"
+    echo " - if JSC: js-beautify the bundle and grep for 'fetch('/'axios'"
+    ;;
+  Cordova*)
+    echo " All app code is in assets/www/ (or assets/public/). Just unzip and"
+    echo " inspect the HTML/JS — no Java decompile needed."
+    ;;
+  Xamarin*|.NET*)
+    echo " App logic is in .NET DLLs (assemblies/). Use ILSpy or dotPeek;"
+    echo " jadx will only show the Mono host."
+    ;;
+  *)
+    echo " Proceed with Phase 2: bash scripts/decompile.sh "
+    ;;
+esac
diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/lookup-name.sh b/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/lookup-name.sh
new file mode 100755
index 0000000..164d558
--- /dev/null
+++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/lookup-name.sh
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+# lookup-name.sh — Query the mapping produced by recover-kotlin-names.sh.
+#
+# Modes:
+# lookup-name.sh search by real-FQN substring
+# lookup-name.sh -o resolve obf -> real
+# lookup-name.sh -p list a real package
+# lookup-name.sh --grep
+# grep decompiled sources and annotate each hit with the real class name
+
+set -euo pipefail
+
+usage() {
+  cat <<EOF
+ lookup-name.sh -o
+ lookup-name.sh -p
+ lookup-name.sh --grep
+
+ is the directory produced by recover-kotlin-names.sh
+(must contain mapping.json).
+EOF
+  exit 0
+}
+
+[[ $# -lt 2 ]] && usage
+DIR="$1"; shift
+[[ !
-f "$DIR/mapping.json" ]] && { echo "no mapping.json in $DIR" >&2; exit 1; }
+
+python3 - "$DIR" "$@" <<'PY'
+import json, os, re, sys, subprocess
+DIR = sys.argv[1]
+args = sys.argv[2:]
+MAP = json.load(open(os.path.join(DIR, "mapping.json")))
+REV = {}
+for o, r in MAP.items():
+    REV.setdefault(r, []).append(o)
+
+def search(q):
+    ql = q.lower()
+    for r in sorted(REV):
+        if ql in r.lower():
+            print(r)
+            for o in sorted(REV[r]):
+                print(f" {o}")
+
+def by_obf(o):
+    if o not in MAP:
+        print(f"no mapping for {o}", file=sys.stderr); sys.exit(1)
+    print(f"{o} -> {MAP[o]}")
+    sibs = [s for s in REV[MAP[o]] if s != o]
+    for s in sorted(sibs):
+        print(f" sibling: {s}")
+
+def by_pkg(p):
+    pl = p.lower()
+    for r in sorted(REV):
+        if pl in r.rsplit(".", 1)[0].lower():
+            print(r)
+            for o in sorted(REV[r]):
+                print(f" {o}")
+
+def grep_annot(pattern, sources):
+    res = subprocess.run(
+        ["grep", "-rEn", "--include=*.java", pattern, sources],
+        capture_output=True, text=True)
+    for line in res.stdout.splitlines():
+        try:
+            path, lineno, content = line.split(":", 2)
+        except ValueError:
+            continue
+        rel = os.path.relpath(path, sources)
+        obf = rel.replace(os.sep, ".")[:-5]
+        suffix = f" // {MAP[obf]}" if obf in MAP else ""
+        print(f"{rel}:{lineno}:{content}{suffix}")
+
+if args[0] == "-o" and len(args) == 2:
+    by_obf(args[1])
+elif args[0] == "-p" and len(args) == 2:
+    by_pkg(args[1])
+elif args[0] == "--grep" and len(args) == 3:
+    grep_annot(args[1], args[2])
+else:
+    search(" ".join(args))
+PY
diff --git a/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh b/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh
new file mode 100755
index 0000000..824af60
--- /dev/null
+++ b/plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh
@@ -0,0 +1,140 @@
+#!/usr/bin/env bash
+# recover-kotlin-names.sh — Rebuild a (obfuscated -> real)
class-name map
+# from Kotlin metadata strings left in decompiled sources.
+#
+# R8 obfuscates JVM symbols but cannot strip the Kotlin metadata strings —
+# the Kotlin runtime (reflection, coroutines) needs them at runtime. Two
+# annotations carry the original FQN:
+#
+# * @DebugMetadata(c = "", f = "", ...)
+# emitted for almost every `suspend` function (every coroutine
+# SuspendLambda).
+#
+# * @Metadata(... d2 = {"...L;..."} ...) listing internal
+# class refs of the file.
+#
+# Typical recovery on a real-world app: 30-50 % of classes regain their real
+# names — usually 100 % of the *Repository / *ViewModel / *UseCase / *Impl
+# classes you actually want to read.
+
+set -euo pipefail
+
+usage() {
+  cat <<EOF
+ [output-dir]
+
+Walks every *.java under , mines @DebugMetadata
+and @Metadata annotations, and writes:
+
+ /mapping.tsv tab-separated obf_fqn real_fqn file
+ /mapping.json same data as JSON { obf_fqn: real_fqn, ... }
+ /by_package/ one file per real package, listing
+ real_fqn obf_fqn file
+
+If [output-dir] is omitted, files are written next to the sources dir.
+EOF
+  exit 0
+}
+
+[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
+SRC="$1"
+OUT="${2:-$(dirname "$SRC")/mapping}"
+[[ ! -d "$SRC" ]] && { echo "not a directory: $SRC" >&2; exit 1; }
+
+mkdir -p "$OUT/by_package"
+
+python3 - "$SRC" "$OUT" <<'PY'
+import os, re, sys, json
+from collections import defaultdict
+
+SRC, OUT = sys.argv[1], sys.argv[2]
+
+# @DebugMetadata(c = "com.foo.Bar$Inner$1", ...)
+RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
+# @Metadata(... d2 = { "...Lcom/foo/Bar;..." ...} )
+RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
+RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')
+# jadx sometimes emits this comment for renamed classes
+RE_RENAMED = re.compile(r'/\*\s*renamed from:\s*([\w.$]+)\s*\*/')
+
+# Skip third-party / framework trees — their names are already real.
+SKIP_PREFIXES = (
+    "kotlin.", "kotlinx.", "androidx.", "android.", "java.", "javax.",
+    "com.google.", "com.facebook.", "com.appsflyer.", "com.datadog.",
+    "io.ktor.", "io.sentry.", "io.realm.", "okhttp3.", "okio.",
+    "com.squareup.", "com.bumptech.", "com.airbnb.", "com.payu.",
+    "com.storyteller.", "zendesk.", "io.intercom.", "com.microsoft.",
+    "com.tinder.", "com.hotjar.", "com.amplitude.", "com.segment.",
+    "com.mixpanel.", "com.onesignal.", "com.stripe.", "com.braintreepayments.",
+    "retrofit2.", "dagger.", "javax.inject.", "org.jetbrains.",
+)
+
+mapping = {}
+file_real = {}
+counts = defaultdict(int)
+
+for dp, _, files in os.walk(SRC):
+    for f in files:
+        if not f.endswith(".java"):
+            continue
+        path = os.path.join(dp, f)
+        rel = os.path.relpath(path, SRC)
+        obf = rel[:-5].replace(os.sep, ".")
+        if obf.startswith(SKIP_PREFIXES):
+            continue
+        try:
+            text = open(path, "r", errors="replace").read()
+        except OSError:
+            continue
+        real = None
+
+        m = RE_DEBUG.search(text)
+        if m:
+            real = m.group(1).split("$", 1)[0]
+            counts["debug_meta"] += 1
+
+        if not real:
+            m = RE_DTWO.search(text)
+            if m:
+                for lm in RE_LCLASS.finditer(m.group(1)):
+                    cand = lm.group(1).replace("/", ".").split("$", 1)[0]
+                    if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
+                        real = cand
+                        counts["d2"] += 1
+                        break
+
+        if not real:
+            m = RE_RENAMED.search(text)
+            if m:
+                real = m.group(1)
+                counts["renamed"] += 1
+
+        if real:
+            mapping[obf] = real
+            file_real[obf] = path
+
+with open(os.path.join(OUT, "mapping.tsv"), "w") as f:
+    f.write("obf_fqn\treal_fqn\tfile\n")
+    for k in sorted(mapping):
+        f.write(f"{k}\t{mapping[k]}\t{file_real[k]}\n")
+
+with open(os.path.join(OUT, "mapping.json"), "w") as f:
+    json.dump(mapping, f, indent=2, sort_keys=True)
+
+by_pkg = defaultdict(list)
+for obf, real in mapping.items():
+    pkg = real.rsplit(".", 1)[0] if "."
in real else "(default)"
+    by_pkg[pkg].append((real, obf, file_real[obf]))
+
+for pkg, rows in by_pkg.items():
+    safe = pkg.replace(".", "_") or "default"
+    with open(os.path.join(OUT, "by_package", f"{safe}.txt"), "w") as f:
+        for real, obf, p in sorted(rows):
+            f.write(f"{real}\t{obf}\t{p}\n")
+
+print(f"Recovered {len(mapping)} class names")
+for k, v in counts.items():
+    print(f" via {k}: {v}")
+print(f"Real packages: {len(by_pkg)}")
+print(f"Wrote {OUT}/mapping.tsv, mapping.json, by_package/")
+PY
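
Review note: the core of `recover-kotlin-names.sh` is easier to reason about in isolation. The following is a minimal sketch, not part of the patch, that applies the script's `RE_DEBUG`, `RE_DTWO`, and `RE_LCLASS` regexes to two made-up jadx snippets. The helper name `recover_real_name`, the sample sources, and the `com.example.*` class names are hypothetical; only the regexes and the fallback order (`@DebugMetadata` first, then `@Metadata` `d2`) are taken from the script. The `renamed from` fallback is omitted for brevity.

```python
import re

# Regexes copied from recover-kotlin-names.sh.
RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')


def recover_real_name(text):
    """Return the original top-level class FQN found in `text`, or None.

    Hypothetical helper: mirrors the per-file logic of the script,
    trying @DebugMetadata first and @Metadata d2 second.
    """
    m = RE_DEBUG.search(text)
    if m:
        # "com.foo.Bar$Inner$1" -> "com.foo.Bar"
        return m.group(1).split("$", 1)[0]
    m = RE_DTWO.search(text)
    if m:
        for lm in RE_LCLASS.finditer(m.group(1)):
            cand = lm.group(1).replace("/", ".").split("$", 1)[0]
            # Skip runtime / platform types, as the script does.
            if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
                return cand
    return None


# Made-up decompiler output for an obfuscated coroutine lambda.
SAMPLE_DEBUG = '''
@DebugMetadata(c = "com.example.data.UserRepository$fetch$1", f = "UserRepository.kt", l = {42}, m = "invokeSuspend")
final class a extends SuspendLambda {}
'''

# Made-up decompiler output for an obfuscated class with only @Metadata.
SAMPLE_D2 = '''
@Metadata(d1 = {"..."}, d2 = {"Lcom/example/ui/LoginViewModel;", "Lkotlin/Unit;"}, k = 1)
public final class b {}
'''

print(recover_real_name(SAMPLE_DEBUG))
print(recover_real_name(SAMPLE_D2))
```

Both regexes use `[^)]*?` to stay inside the annotation's parentheses, so an annotation argument that itself contains `)` before `c =` or `d2 =` would be missed; that limitation is inherited from the script rather than introduced by this sketch.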