Reverse Engineering APK Files Intro - Tech Projects/Documentations

An introduction to reverse engineering in Android:

Case study: Android Package Kit (APK)

Author: Eze-Odikwa Tochukwu Jed

Introduction:

Android applications are programs that run on the Android operating system, which powers a wide variety of devices: smartphones, tablets, smartwatches, televisions and more. An Android Package Kit (APK) file bundles code compiled from a high-level language such as Java or Kotlin together with the app's other resources, packaged to run on Android.

What you will learn:

In this tutorial you will learn about:

  • Reverse engineering of APK files.
  • Dalvik bytecode.
  • Smali/Baksmali.
  • Working with smali code.
  • Tools for reverse engineering APK files.

So what is reverse engineering?

In software engineering, and in Android development circles in particular, reverse engineering an app is usually called decompiling, because packaging an app during development compiles its source code into bytecode. Decompiling reverses that step: you unzip (take apart) the Android package file and analyze its bytecode and the source code recovered from it.
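Because an APK is an ordinary ZIP archive, the "unzipping" part of this process needs nothing beyond the JDK. The following is a minimal sketch (the class name ListApkEntries is only illustrative) that lists every entry in an APK whose path is passed on the command line and flags the .dex files holding the compiled code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ListApkEntries {

    // Returns the name of every entry in the APK (an APK is a ZIP archive).
    static List<String> listEntries(String apkPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile apk = new ZipFile(apkPath)) {
            Enumeration<? extends ZipEntry> entries = apk.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the APK to inspect, e.g. InsecureBankv2.apk.
        for (String name : listEntries(args[0])) {
            // classes.dex, classes2.dex, ... hold the app's Dalvik bytecode.
            String note = name.endsWith(".dex") ? "   <- Dalvik bytecode" : "";
            System.out.println(name + note);
        }
    }
}
```

Run it as java ListApkEntries InsecureBankv2.apk. Typical entries include AndroidManifest.xml (stored in a binary XML form), one or more classes*.dex files, resources.arsc, the res/ directory and, for apps signed with the v1 scheme, signature files under META-INF/.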

A decompiler examines the package's binary contents and translates them from a low-level form into a higher-level, human-readable form. The APK stores the application's compiled code as Dalvik Executable (.dex) files inside its ZIP archive. DEX files consist of the following components:

  • File Header
  • String Table
  • Class List
  • Field Table
  • Method Table
  • Class Definition Table
  • Field List
  • Method List
  • Code Header/Local Variable List

For further information on .dex files, refer to the official documentation.
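The component list above is a conceptual outline. The official dex-format specification describes a fixed-size header (0x70 bytes, little-endian) whose fields give the size and offset of each section: string_ids, type_ids, proto_ids, field_ids, method_ids, class_defs and data. As a sketch based on that specification (the DexHeader class is hypothetical), the code below reads the header of an extracted classes.dex, reports the format version and section counts, and verifies the Adler-32 checksum, which covers everything after the magic and checksum fields:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Adler32;

public class DexHeader {

    // Reads the fixed-size header_item (0x70 bytes) described in the dex-format spec.
    static String describe(byte[] dex) {
        if (dex.length < 0x70) {
            throw new IllegalArgumentException("file too small for a DEX header");
        }
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);

        // magic: "dex\n" followed by a 3-digit version and a NUL byte, e.g. "dex\n035\0".
        String magic = new String(dex, 0, 8, StandardCharsets.US_ASCII);
        if (!magic.startsWith("dex\n")) {
            throw new IllegalArgumentException("not a DEX file");
        }
        String version = magic.substring(4, 7);

        // checksum: Adler-32 of the rest of the file, starting at offset 12.
        long storedChecksum = buf.getInt(8) & 0xFFFFFFFFL;
        Adler32 adler = new Adler32();
        adler.update(dex, 12, dex.length - 12);
        boolean checksumOk = adler.getValue() == storedChecksum;

        return String.format(
            "version=%s checksumOk=%b fileSize=%d strings=%d types=%d protos=%d fields=%d methods=%d classDefs=%d",
            version, checksumOk,
            buf.getInt(32) & 0xFFFFFFFFL, // file_size
            buf.getInt(56),               // string_ids_size
            buf.getInt(64),               // type_ids_size
            buf.getInt(72),               // proto_ids_size
            buf.getInt(80),               // field_ids_size
            buf.getInt(88),               // method_ids_size
            buf.getInt(96));              // class_defs_size
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of a .dex file extracted from an APK, e.g. classes.dex.
        byte[] dex = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(describe(dex));
    }
}
```

Running it against a classes.dex pulled out of an APK gives a quick sense of the app's size (for example, how many methods and class definitions it contains) before you open it in a decompiler.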

Java Source Code: Source code is human-readable text listing instructions that can be compiled. The text of a Java program is stored in a file with the .java extension, usually named after the class declared in it. From there, you compile it:

javac file.java

This produces a .class file, a compiled Java program that you can run on a JVM by passing the class name (without the .class extension):

java file

Tools such as the JDK's jar utility can package .class files into .jar archives, but in short, the source code lives in the .java files.

Here is an example of source code:

// HelloWorld.java
class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

Then you can compile this source code to bytecode and launch the program.

~> javac HelloWorld.java

Now you have a program (of sorts) that you can run:

~> java HelloWorld
Hello World!

Java Bytecode: Java bytecode is a platform-independent intermediate code that sits between high-level source code and low-level machine code. It is the code the Java compiler (javac) generates.

Bytecode is not executed directly by the processor; it is executed by the Java Virtual Machine (JVM), which interprets it or compiles it to native machine code at run time. Because a JVM is available for each platform, the same bytecode can run on Windows, macOS, Linux and others, which is the main reason Java is known as a platform-independent language.
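Because bytecode is a platform-independent format, every .class file starts with the same header regardless of which operating system compiled it: the magic number 0xCAFEBABE followed by the minor and major version numbers, as laid out in the Java Virtual Machine Specification. A minimal sketch that reads those fields, assuming the path to a .class file (such as HelloWorld.class) is passed as an argument; the ClassFileHeader name is illustrative:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassFileHeader {

    // Reads magic, minor_version and major_version (big-endian u4, u2, u2).
    static String describe(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        // From Java 5 (major 49) onward, the Java release is major - 44, e.g. 52 -> Java 8.
        String release = major >= 49 ? "Java " + (major - 44) : "pre-Java 5";
        return "major=" + major + " minor=" + minor + " (" + release + ")";
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(describe(in));
        }
    }
}
```

The same HelloWorld.class produced on Linux or Windows carries identical header bytes; only the JDK version used to compile it changes the major version.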

Java Virtual Machine: The JVM is a program whose purpose is to execute other programs. It has two primary functions: to allow Java programs to run on any device or operating system (known as the “Write once, run anywhere” principle), and to manage and optimize program memory. The most common interaction with a running JVM is checking memory usage in the heap and stack.
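JDK tools such as jconsole or jcmd can inspect a running JVM from the outside; from inside a program, java.lang.Runtime exposes basic heap figures. A minimal sketch (the HeapUsage class name is illustrative):

```java
public class HeapUsage {

    private static final long MB = 1024L * 1024L;

    // Heap currently in use: memory the JVM has committed minus what is still free.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("used: %d MB%n", usedHeapBytes() / MB);
        System.out.printf("committed: %d MB%n", rt.totalMemory() / MB); // heap currently reserved
        System.out.printf("max: %d MB%n", rt.maxMemory() / MB);         // upper limit, e.g. set by -Xmx
    }
}
```

Stack usage is tracked per thread and is not reported by Runtime, so this only covers the heap side of the picture.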

Dalvik bytecode: An app's Dalvik bytecode is traditionally packed into a single .dex file that contains all of the application's classes (large apps can be split across several .dex files using multidex). The following figure shows how the .dex file is generated. After the Java compiler creates the JVM bytecode (.class files), the Dalvik dx compiler converts all of the .class files into Dalvik bytecode and merges them into one .dex file; in current Android build tooling, the D8 compiler has taken over this role from dx. This process involves translating, reconstructing and interpreting the basic elements of the application: the constant pool, the class definitions and the data segment. The constant pool describes all constants, including references, method names and numeric constants. The class definitions include access flags, class names and so on. The data segment contains all the method code executed by the target VM, together with related information about classes and methods (such as the number of registers used by the DVM, the list of local variables and the size of the operand stack) and instance variables.

Dalvik Virtual Machine: The Dalvik Virtual Machine (DVM) was designed specifically for Android. Unlike the stack-based JVM, it is register-based: its instructions operate on virtual registers, which are memory locations on the host device, so operands do not have to be pushed onto and popped off a stack. This reduces the number of instructions executed and makes execution faster, and register-based designs are well suited to optimization and to devices with limited memory. The DVM uses its own bytecode and runs .dex (Dalvik Executable) files. Since Android 5.0, Dalvik has been replaced by the Android Runtime (ART), which executes the same .dex bytecode.
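To make the stack-versus-register difference concrete, here is a toy sketch, not how Dalvik or the JVM is actually implemented: two tiny interpreters compute c = a + b. The stack version mirrors JVM-style bytecode (iload, iload, iadd, istore); the register version mirrors a single Dalvik instruction such as add-int v2, v0, v1. The opcode names and the String[][] program format are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackVsRegister {

    // c = a + b on a stack machine: push both operands, add, pop the result.
    static final String[][] STACK_ADD = {
        {"load", "0"}, {"load", "1"}, {"add"}, {"store", "2"}
    };

    // The same computation on a register machine: one instruction names the
    // destination register and both source registers.
    static final String[][] REGISTER_ADD = {
        {"add", "2", "0", "1"}
    };

    // Runs a stack program over local slots; returns the number of instructions executed.
    static int runStack(String[][] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String[] insn : program) {
            switch (insn[0]) {
                case "load":
                    stack.push(locals[Integer.parseInt(insn[1])]);
                    break;
                case "add":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "store":
                    locals[Integer.parseInt(insn[1])] = stack.pop();
                    break;
                default:
                    throw new IllegalArgumentException("unknown opcode " + insn[0]);
            }
        }
        return program.length;
    }

    // Runs a register program over virtual registers; returns the number of instructions executed.
    static int runRegister(String[][] program, int[] regs) {
        for (String[] insn : program) {
            if (!insn[0].equals("add")) {
                throw new IllegalArgumentException("unknown opcode " + insn[0]);
            }
            regs[Integer.parseInt(insn[1])] =
                regs[Integer.parseInt(insn[2])] + regs[Integer.parseInt(insn[3])];
        }
        return program.length;
    }

    public static void main(String[] args) {
        int[] locals = {2, 3, 0};
        int[] regs = {2, 3, 0};
        int stackCount = runStack(STACK_ADD, locals);
        int registerCount = runRegister(REGISTER_ADD, regs);
        System.out.println("stack:    result=" + locals[2] + " instructions=" + stackCount);
        System.out.println("register: result=" + regs[2] + " instructions=" + registerCount);
    }
}
```

Both produce 5, but the stack program needs four instructions while the register program needs one. The trade-off is that register instructions are larger, because each one has to encode its operand registers.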

Smali/Baksmali: smali/baksmali is an assembler/disassembler for the dex format used by Dalvik, Android's Java VM implementation. The syntax is loosely based on Jasmin's/dedexer's syntax and supports the full functionality of the dex format.

We will cover smali code features and hex codes in a later update of this article. For now, without further ado, let's do some hands-on reverse engineering of APK files.

Practicals with Jadx-GUI:

There are far more aspects of reverse engineering than can be covered here, so this article will zero in on some of the most common issues we find in some production Android apps when we dig in with a decompiler. While there are many decompilers to choose from, I prefer jadx-gui because it’s a simple yet effective tool that has a user-friendly interface. We will use it to analyze the “InsecureBankv2.apk”, an open-source Android application that purposefully contains many vulnerabilities.

To download the APK click HERE.

Step 1 Analyzing the APK: Open the APK in jadx-gui and you will see a menu on the left-hand side of the screen with expandable sub-menus. This layout is the same for every Android application, and each sub-menu is mostly self-explanatory. The “Source code” section contains the files that make up the application's code. The “com” package usually holds the main application code, including its core functionality, while the other packages hold supporting components such as libraries and frameworks used to build the application. The “Resources” section contains app assets, versions, certificates, properties, the Android Manifest, APK signature info, etc.

jadx-GUI

Step 2 Exploring AndroidManifest.xml: Let's start with the app's manifest, AndroidManifest.xml. This file is central to the application because it describes the app's structure and metadata, its components and its requirements: the permissions it requests, its activities, intents, actions, etc. You can modify certain permissions here, but note that this usually won't work unless you also track down the corresponding functionality in the codebase. Double-clicking AndroidManifest.xml in the left-hand menu opens the following file:

AndroidManifest.xml
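Once you can read the manifest, the most security-relevant parts are the requested permissions and any components exported to other apps. Inside the APK the manifest is stored in a binary XML format, so the sketch below assumes you have already saved jadx's decoded, plain-text version to a file. It uses the JDK's standard DOM parser; the class and method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ManifestPermissions {

    static final String ANDROID_NS = "http://schemas.android.com/apk/res/android";

    static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to resolve the android: prefix
        // Refuse DOCTYPE declarations, since the file comes from an untrusted app.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    // Returns the android:name of every <uses-permission> element.
    static List<String> permissions(Document manifest) {
        List<String> result = new ArrayList<>();
        NodeList nodes = manifest.getElementsByTagName("uses-permission");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(((Element) nodes.item(i)).getAttributeNS(ANDROID_NS, "name"));
        }
        return result;
    }

    // Returns components that explicitly set android:exported="true".
    static List<String> exportedComponents(Document manifest) {
        List<String> result = new ArrayList<>();
        for (String tag : new String[] {"activity", "service", "receiver", "provider"}) {
            NodeList nodes = manifest.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                if ("true".equals(e.getAttributeNS(ANDROID_NS, "exported"))) {
                    result.add(tag + " " + e.getAttributeNS(ANDROID_NS, "name"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document manifest = parse(Files.readAllBytes(Paths.get(args[0])));
        permissions(manifest).forEach(p -> System.out.println("permission: " + p));
        exportedComponents(manifest).forEach(c -> System.out.println("exported:   " + c));
    }
}
```

Only an explicit android:exported="true" is checked here; on older target SDK versions a component with an intent filter could be exported implicitly, so treat the output as a starting point.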

Step 3 Searching for Hardcoded Values: Next, look for hardcoded credentials, that is, sensitive values embedded directly in the app, such as usernames, session tokens and secret keys. Attackers can often extract this information easily, and this app is no exception.

Hardcoded credentials can turn up almost anywhere in the app, so we recommend using jadx's search function to look for these values. I usually start by looking for hardcoded secret keys, which are fairly easy to recognize because they are usually stored in an array of bytes. In this case, searching for the word ‘secret’ gives a glimpse of what we might find in the app.
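jadx's built-in search is the quickest way to do this interactively. If you want a repeatable check over exported decompiled source, a minimal sketch like the following flags lines that contain credential-related keywords or byte-array literals. The keyword list is an assumption chosen for illustration and will produce both false positives and misses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SecretFinder {

    // Keywords that often sit next to hardcoded credentials (illustrative, not exhaustive).
    static final Pattern KEYWORDS = Pattern.compile(
        "secret|passw(or)?d|api_?key|token|\\bkey\\b|\\biv\\b", Pattern.CASE_INSENSITIVE);

    // Byte-array literals such as new byte[]{0x01, 0x02}, a common way to store raw keys.
    static final Pattern BYTE_ARRAY = Pattern.compile("new\\s+byte\\s*\\[\\s*\\]\\s*\\{");

    // Returns "lineNumber: text" for every line of the source that looks worth reviewing.
    static List<String> scan(String source) {
        List<String> hits = new ArrayList<>();
        String[] lines = source.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (KEYWORDS.matcher(line).find() || BYTE_ARRAY.matcher(line).find()) {
                hits.add((i + 1) + ": " + line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "public class Demo {",
            "    String key = \"This is a placeholder secret\";",
            "    byte[] raw = new byte[]{1, 2, 3};",
            "    int count = 0;",
            "}");
        scan(sample).forEach(System.out::println);
    }
}
```

On the sample, lines 2 and 3 are reported; every hit still needs a human to decide whether it is really a secret.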

Be sure to review every result, especially when searching for a term like ‘secret’, because the matches are likely to include sensitive values. Navigating to com.android.InsecureBankv2.CryptoClass shows some valuable information, including the crypto key and an initialization vector for one of the cryptography methods the app uses.
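To see why a hardcoded key and IV count as a finding, consider that anyone who extracts them can reproduce the app's encryption and decryption. The sketch below assumes AES in CBC mode with PKCS#5 padding and Base64-encoded ciphertext; the key and IV are placeholders, not the values found in InsecureBankv2, so check the transformation the decompiled CryptoClass actually uses before adapting it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class HardcodedKeyDemo {

    // Placeholder values standing in for a key and IV recovered from decompiled code.
    // A 16-byte key selects AES-128; a 16-byte IV matches the AES block size.
    static final byte[] KEY = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
    static final byte[] IV = new byte[16];

    static String encrypt(String plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
        byte[] encrypted = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(encrypted);
    }

    // Anyone holding the same hardcoded key and IV can run this, not just the app.
    static String decrypt(String base64Ciphertext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
        byte[] plain = cipher.doFinal(Base64.getDecoder().decode(base64Ciphertext));
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String stored = encrypt("user-password"); // what the app might store or send
        System.out.println("ciphertext: " + stored);
        System.out.println("recovered:  " + decrypt(stored));
    }
}
```

Because the IV is fixed, encrypting the same plaintext twice also yields the same ciphertext, which reveals when two stored values are equal.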

Cool! We have a finding: bad cryptographic practice. Exploring the app further will certainly turn up many related findings. However, we won't go into much more detail, because this is an intentionally vulnerable app and we could spend a lot of time just trying search terms and analyzing different classes for vulnerabilities. Besides hardcoded data, you can also search for vulnerable functionality in use, such as insecure network code, JavaScript interfaces, poor management of data in device memory, missing input validation on certain functions, and weak cryptography, to name a few.
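Searching for vulnerable functions can also be scripted. The sketch below walks a directory of decompiled .java files (assuming you have exported jadx's output to disk) and reports lines matching a few well-known Android and Java APIs that deserve review, such as WebView.addJavascriptInterface, WebSettings.setJavaScriptEnabled, custom X509TrustManager or HostnameVerifier implementations, and world-readable file modes. The rule list is illustrative, and a match is a lead to investigate, not proof of a vulnerability:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RiskyApiScanner {

    // Illustrative patterns for APIs that deserve a closer look.
    static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("addJavascriptInterface"), "JavaScript interface exposed to a WebView");
        RULES.put(Pattern.compile("setJavaScriptEnabled\\(\\s*true"), "JavaScript enabled in a WebView");
        RULES.put(Pattern.compile("MODE_WORLD_(READABLE|WRITEABLE)"), "world-accessible file or preferences");
        RULES.put(Pattern.compile("X509TrustManager|HostnameVerifier"), "custom TLS validation (check for trust-all logic)");
        RULES.put(Pattern.compile("\"http://"), "cleartext HTTP URL");
        RULES.put(Pattern.compile("Log\\.[dvi]\\("), "debug logging that may leak data");
    }

    // Checks one source file's text and returns "file:line  description" findings.
    static List<String> scanText(String fileName, String text) {
        List<String> findings = new ArrayList<>();
        String[] lines = text.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
                if (rule.getKey().matcher(lines[i]).find()) {
                    findings.add(fileName + ":" + (i + 1) + "  " + rule.getValue());
                }
            }
        }
        return findings;
    }

    // Walks a directory of decompiled .java files and scans each one.
    static List<String> scanTree(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(p -> p.toString().endsWith(".java")).collect(Collectors.toList());
        }
        List<String> findings = new ArrayList<>();
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            findings.addAll(scanText(root.relativize(file).toString(), text));
        }
        return findings;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding the exported decompiled sources.
        scanTree(Paths.get(args[0])).forEach(System.out::println);
    }
}
```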

There are a handful of professional tools for analyzing APK files; this tutorial gives a first look at what decompiling and working with APKs at this level makes possible. More articles on APK reverse engineering are coming, but for now, have fun with this tutorial as practice before delving deeper.

References:

source.android.com/devices/tech/dalvik/dex-format

github.com/JesusFreke/smali
