Mobile Visualizations for Reverse Engineering & Debugging

An introduction to coverage guided reverse engineering on iOS and Android

Luke M - @datalocaltmp

$whoami

  • @datalocaltmp on X/Twitter

  • Founded Signal 11 Research - particularly focused on Android/iOS security

  • Previously investigated privacy issues within mobile applications under theappanalyst.com

    • Involved lots of reversing; one investigation resulted in Apple issuing a legal notice to Air Canada and many other apps that were secretly recording in-app screens.
  • Claimed bounties with: Meta, Amazon, Match.com, Biden Campaign, etc.

    • DEFCON 2024 - XR Village Talk - "Pwning through the Metaverse"

Resources

Pre-requisites

Content

  • Introduction

    • What are Control Flow Graphs & Coverage
    • Current tooling and gaps for mobile
  • Classic vs Coverage Guided RE

  • iOS Coverage

    • Example - TrollInstallerX Priv Esc.
  • Android Native Coverage

    • Example - Messenger Native Library
  • Android Java Coverage

    • Example - Messenger NotificationManager

Control Flow Graphs

  • Represent the flow of logic throughout a program

  • Nodes represent basic blocks of execution

    • Generally if you execute the first instruction you'll execute the last*
  • Paths (edges) represent branches between basic blocks

  • Many decompilers support CFG generation

    • IDA, Ghidra, Binary Ninja etc.
    • And even JADX to my surprise
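
A minimal sketch of the idea above, assuming a hypothetical five-block function: the CFG is stored as a mapping from each basic block to its successors, and a depth-first walk lists the possible paths. The block names are made up for illustration and do not come from any real binary.

```python
from typing import Dict, List

# Hypothetical CFG: basic block label -> labels of its successor blocks.
CFG: Dict[str, List[str]] = {
    "entry": ["check"],
    "check": ["then", "else"],  # conditional branch: two outgoing paths
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}


def enumerate_paths(cfg: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """Return every acyclic path from start to end, sorted for stable output."""
    paths: List[List[str]] = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)
            continue
        for succ in cfg.get(node, []):
            if succ not in path:  # skip back-edges so loops terminate
                stack.append((succ, path + [succ]))
    return sorted(paths)


all_paths = enumerate_paths(CFG, "entry", "exit")
```

Coverage then answers which of these paths a given run actually took.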

Coverage

  • Metadata on which basic blocks have been executed

  • Generally captured in the drcov format and generated by tools such as fuzzers and Frida

  • Can help with answering some really important questions!

    • "What executed before this segfault?"
    • Backtraces do not include the executed blocks leading to a crash
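
To make the "metadata" concrete, here is a rough sketch of reading the basic block table from a drcov file. It assumes the version 2 layout as I understand it: a text header ending in a "BB Table: N bbs" line, followed by N packed 8-byte entries (32-bit offset from the module base, 16-bit block size, 16-bit module id). The sample file and its module path are hypothetical.

```python
import struct
from typing import List, Tuple

# Assumed drcov v2 entry layout: uint32 start offset, uint16 size, uint16 module id.
BB_ENTRY = struct.Struct("<IHH")


def parse_bb_table(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, size, module_id) tuples from raw drcov file contents."""
    marker = b"BB Table: "
    idx = data.find(marker)
    if idx < 0:
        raise ValueError("no BB table header found")
    line_end = data.index(b"\n", idx)
    count = int(data[idx + len(marker):line_end].split()[0])
    body = data[line_end + 1:]
    if len(body) < count * BB_ENTRY.size:
        raise ValueError("BB table is truncated")
    return [BB_ENTRY.unpack_from(body, i * BB_ENTRY.size) for i in range(count)]


def build_sample() -> bytes:
    """Build a tiny, hypothetical drcov file with one module and two blocks."""
    header = (
        b"DRCOV VERSION: 2\n"
        b"DRCOV FLAVOR: drcov\n"
        b"Module Table: version 2, count 1\n"
        b"Columns: id, base, end, entry, checksum, timestamp, path\n"
        b"  0, 0x1000, 0x2000, 0x0, 0x0, 0x0, /hypothetical/libdemo.so\n"
        b"BB Table: 2 bbs\n"
    )
    return header + BB_ENTRY.pack(0x10, 4, 0) + BB_ENTRY.pack(0x40, 12, 0)


blocks = parse_bb_table(build_sample())
```

Each tuple identifies one executed block, which is exactly what a coverage plugin paints onto the CFG.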

Current Tooling & Mobile Gaps

  • The most mature tooling is Lighthouse, which is available for IDA and Binja

    • But we're using Cartographer by NCC Group for Ghidra 🐉
  • For generating on-demand drcov files, Lighthouse ships a frida-drcov.py script

    • But if that worked right out of the box I wouldn't be here...
  • frida-drcov.py is very generous in its instrumentation of everything

  • Even when restricting down to the module/thread

    • iOS coverage collection is not possible due to the large performance hit
    • Certain Android native libraries are so large that the number of intercepted functions causes crashes
  • There was no tooling for visualizing coverage on the Android Java layer

Classic Reverse Engineering & Debugging

  • Often we only have the binary sample to understand a program's logic

  • The workflow generally consists of both dynamic and static analysis.

    • Static - Decompiling w/ JADX, IDA, Ghidra, Binja etc.
    • Dynamic - Debugging w/ GDB, LLDB, or even Frida.
  • Problem:

    • Reasoning about a large and complex binary is very time-consuming.
    • Static and dynamic analysis artifacts are often poorly associated with each other.

Coverage Guided Reverse Engineering

  • Combines both Static and Dynamic analysis
  • Best illustrated via a program's Control Flow Graph (CFG)
    • As illustrated in the image to the right
    • Can also be shown in the listing view as highlighted portions of code
  • Able to run set operations on the coverage
    • e.g. input A produced coverage B and input C produced coverage D
      • What is the intersection of coverage B and D?
      • What executed in coverage B but not in coverage D?
  • Coverage collected from dynamic analysis used to accelerate efforts
    • Focuses efforts on code that is actually executing
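
The coverage comparisons above map directly onto set operations. A minimal sketch, assuming each coverage run has already been reduced to a set of basic block addresses (the addresses here are hypothetical):

```python
from typing import Dict, Set


def compare_coverage(cov_b: Set[int], cov_d: Set[int]) -> Dict[str, Set[int]]:
    """Compare two coverage runs expressed as sets of basic block addresses."""
    return {
        "both": cov_b & cov_d,    # blocks executed by both inputs
        "only_b": cov_b - cov_d,  # blocks executed only in coverage B
        "only_d": cov_d - cov_b,  # blocks executed only in coverage D
    }


# Hypothetical block addresses for two inputs.
coverage_b = {0x1000, 0x1010, 0x1024}
coverage_d = {0x1000, 0x1024, 0x1040}
result = compare_coverage(coverage_b, coverage_d)
```

The "only_b" set is usually the interesting one: it is the code that input A reached and input C did not.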

Generating iOS Coverage

  • iOS bundles its system libraries into one large shared cache (the dyld shared cache)

    • Introduces some difficulties when collecting coverage, as there is significantly more code to instrument
  • Requires that we modify the standard tooling to account for this

    • i.e. narrow our instrumentation to the Objective-C and native methods we're interested in.
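
One way to picture the narrowing step is as a filter over candidate methods before any hooks are installed. The sketch below is only the selection logic, under the assumption that candidates are available as (class name, selector) pairs; the class names and the select_targets helper are hypothetical, and the actual hooking would happen in a Frida script that is not shown here.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple

Method = Tuple[str, str]  # (Objective-C class name, selector)


def select_targets(methods: Iterable[Method], class_patterns: Iterable[str]) -> List[Method]:
    """Keep only methods whose class name matches one of the glob patterns."""
    patterns = list(class_patterns)
    return [
        (cls, sel)
        for cls, sel in methods
        if any(fnmatchcase(cls, pattern) for pattern in patterns)
    ]


# Hypothetical candidates: two app classes and one system class.
candidates = [
    ("LoginViewController", "submit:"),
    ("NSString", "length"),
    ("LoginModel", "validate"),
]
targets = select_targets(candidates, ["Login*"])
```

Instrumenting only the resulting targets keeps the rest of the shared cache untouched, which is where the performance hit comes from.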

Workshop Exercise #1 - iOS App Coverage Collection

Generating Android Coverage

  • Two layers of execution - Java and Native
  • Collecting Android native layer coverage is well understood
    • ... but being well understood doesn't mean it's perfect
  • Collecting Android Java layer coverage is less so

Demo - Android Native Coverage Collection

  • Demo of collecting native layer coverage within Facebook Messenger
    • Specifically for libmsysinfra.so which handles messaging

Exercise #2 - Android Native Coverage Collection

  • Non-rooted example of collecting native coverage for fvsa
    • Uses frida-gadget.so rather than frida-server

What about Java layer coverage?

FlowFinder

  • JADX, surprisingly, supports generating CFGs in the .dot format

    • Each basic block consists of Smali code
  • FlowFinder is not yet integrated into the JADX plugin ecosystem

    • When it was created the plugin framework was difficult to integrate with, but it seems to have improved

FlowFinder - Usage

  1. Produce .dot files using JADX.
  2. Import .dot files into FlowFinder.
  3. Generate the associated Frida scripts for each .dot file.
  4. Execute the Frida scripts to create a new .dot file annotated with coverage.
  5. Load the new .dot file to see which blocks in the Java layer executed.
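
As a rough illustration of step 4, the sketch below colors the nodes of a .dot graph that were reported as executed. It is not FlowFinder's implementation: the node naming (Node_0, Node_1), the one-node-per-line layout, and the annotate_dot helper are all assumptions made for the example.

```python
import re
from typing import Set

# Matches a node statement such as:  Node_1 [shape=record,label="..."];
# Edge statements (Node_0 -> Node_1;) do not match because "[" must follow the name.
NODE_LINE = re.compile(r'^(\s*)(\w+)\s*\[(.*)\];\s*$')


def annotate_dot(dot_text: str, executed: Set[str], color: str = "lightgreen") -> str:
    """Return DOT text with a fill color added to every executed node."""
    out = []
    for line in dot_text.splitlines():
        match = NODE_LINE.match(line)
        if match and match.group(2) in executed:
            indent, name, attrs = match.groups()
            line = f"{indent}{name} [{attrs},style=filled,fillcolor={color}];"
        out.append(line)
    return "\n".join(out)


# Hypothetical two-block graph.
SAMPLE_DOT = """digraph "CFG" {
  Node_0 [shape=record,label="entry"];
  Node_1 [shape=record,label="branch"];
  Node_0 -> Node_1;
}"""

annotated = annotate_dot(SAMPLE_DOT, {"Node_1"})
```

Loading the annotated text in any Graphviz viewer shows the executed blocks filled in, while edges and unexecuted nodes are left untouched.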

Demo - Android Java Coverage Collection

Exercise #2 - Android Java Coverage Collection

  • Practical demo of collecting Java layer coverage within a Trend Micro CTF APK

FlowFinder Shortcomings

  • Duplicate internal calls: when the same call appears in several basic blocks, the hook cannot tell which node executed
  • Basic blocks that do not contain any calls cannot be instrumented
  • Frida can be finicky
    • Sometimes methods and classes just can’t be found
    • Java.deoptimizeEverything() could potentially solve the problem by forcing the VM to execute everything with the interpreter.

FlowFinder Future Improvements

  • Perhaps we need to rewrite the smali to add calls per basic block

    • Not a good solution because users would need to patch and re-sign APKs
  • Is there a solution that could be fully implemented in our Frida script?

    • Watch one Laurie Wired presentation on Packers later ....
    • Yes! We can use DexClassLoader to clone the implementations of our methods with injected instrumentation
  • Aiming to work on this immediately to improve functionality

In Summary

  • Generating visualizations to guide your reverse engineering/debugging tasks can really speed things up

  • Lots of additional benefits!

    • Diffing multiple coverage files to determine difference between inputs, execution frequency, coverage %
  • Solutions for iOS, Android Native, and Android Java layers

    • In various states of maturity
  • Good luck!

Questions?

Thanks!

Hi everyone... I'm Luke ... Datalocaltmp on twitter... Today I'm here to give you a workshop on Mobile Visualizations for Reverse Engineering and Debugging. The aim of this workshop is to give you an introduction and the tools to get started in coverage guided reverse engineering.

Quick obligatory whoami: datalocaltmp. I founded Signal 11 Research, where I am the sole employee, working primarily on Android and iOS security. I worked on consumer privacy with regard to apps on Android and iOS, causing some issues for big companies like Air Canada. I've claimed a bunch of bounties and will be presenting at DEFCON this year at the XR Village.

  • If you're looking to follow along today:
    • I'm going to be showing all my work in Ghidra - but once you have the coverage files they will load into IDA or Binja fine
    • For my work I use Cartographer in Ghidra and that's what today's workshop will be in - but if you're inclined you can go download the Lighthouse coverage tool for Binja or IDA.
    • For iOS you'll need an iPhone that is jailbroken - with TrollInstallerX - and Frida. I'm not expecting many to be able to follow this portion; but it's available to follow along from home later.
    • For Android - while you can use a non-rooted device by injecting a frida-gadget - I'm going to work under the assumption you have a rooted emulator or device - if you're lacking one I have a spare!

  • So what are we covering today?
    • I'll be making sure we're all on the same page when it comes to CFGs and coverage - just so we start off on the right foot
    • I'll go into some examples of when coverage guided reverse engineering can help.
    • Then we'll get into the meat and potatoes of producing coverage for mobile

  • Represent the flow of logic within a program
  • Basic blocks are represented as nodes
  • Paths represent conditional branches
  • Many decompilers support CFG generation
  • On the right is actually a CFG from a Meta app - and I wanted to know - which of the basic blocks were executing in the red and blue zones before a crash occurred later on?

  • Well the way I can answer that is by generating coverage for this function
  • The image on the right is a zoomed-in version of the previous CFG, with all the blocks that executed before the crash highlighted
  • Saved me a lot of time and breakpoints
  • But in general - when I say coverage I mean "Metadata on which basic blocks executed within a program"
  • As it stands, for native execution this is generally captured in the drcov file format
  • drcov is produced by a bunch of tools, but the primary ones that this crowd would likely be interested in are fuzzers and Frida