Opening Pandora’s Box through ATFuzzer: Dynamic Analysis of AT Interface for Android Smartphones Imtiaz Karim Purdue University karim7@purdue.ede Fabrizio Cicala Syed Rafiul Hussain Purdue University ficiala@purdue.edu Omar Chowdhury Elisa Bertino University of Iowa omar-chowdhury@uiowa.edu ABSTRACT This paper focuses on checking the correctness and robustness of the AT command interface exposed by the cellular baseband processor through Bluetooth and USB. A device’s application processor uses this interface for issuing high-level commands (or, AT commands) to the baseband processor for performing cellular network operations (e.g., placing a phone call). Vulnerabilities in this interface can be leveraged by malicious Bluetooth peripherals to launch pernicious attacks including DoS and privacy attacks. To identify such vulnerabilities, we propose ATFuzzer that uses a grammarguided evolutionary fuzzing approach which mutates production rules of the AT command grammar instead of concrete AT commands. Empirical evaluation with ATFuzzer on 10 Android smartphones from 6 vendors revealed 4 invalid AT command grammars over Bluetooth and 13 over USB with implications ranging from DoS, downgrade of cellular protocol version (e.g., from 4G to 3G/2G) to severe privacy leaks. The vulnerabilities along with the invalid AT command grammars were responsibly disclosed to affected vendors. Among the vulnerabilities uncovered, 2 CVEs (CVE-201916400 and CVE-2019-16401) have already been assigned for the DoS and privacy leaks attacks, respectively. CCS CONCEPTS • Security and privacy → Mobile and wireless security; Distributed systems security; Denial-of-service attacks. KEYWORDS Android Smartphone Security and Privacy, Vulnerabilities, Attack ACM Reference Format: Imtiaz Karim, Fabrizio Cicala, Syed Rafiul Hussain, Omar Chowdhury, and Elisa Bertino. 2019. Opening Pandora’s Box through ATFuzzer: Dynamic Analysis of AT Interface for Android Smartphones. In 2019 Annual Computer Security Applications Conference (ACSAC ’19), December Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. ACSAC ’19, December 9–13, 2019, San Juan, PR, USA © 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-7628-0/19/12. . . $15.00 https://doi.org/10.1145/3359789.3359833 Purdue University hussain1@purdue.edu Purdue University bertino@purdue.edu 9–13, 2019, San Juan, PR, USA. ACM, New York, NY, USA, 15 pages. https: //doi.org/10.1145/3359789.3359833 1 INTRODUCTION Modern smartphones operate with two interconnected processing units—an application processor for general user applications and a baseband processor (also known as a cellular modem) for cellular connectivity. The application processor can issue ATtention commands (or, AT commands) [22] through the radio interface layer (RIL, also called AT interface) to interact with the baseband processor for performing different cellular network operations (e.g., placing a phone call). Most of the modern smartphones also accept AT commands once connected via Bluetooth or USB. Problem and scope. Since the AT interface is an entry point for accessing the baseband processor, any incorrect behavior in processing AT commands may cause unauthorized access to private information and inconsistent system states and crashes of the RIL daemon and the telephony stack. This paper thus focuses on the following research question: Is it possible to develop a systematic approach for analyzing the correctness and robustness of the baseband-related AT command execution process to uncover practically-realizable vulnerabilities? Incorrect execution of AT commands may manifest in one of the following forms: (1) Syntactic errors: the device accepts and processes syntactically invalid AT commands; and (2) Semantic violations: the device processes syntactically correct AT commands, but does not conform to the prescribed behavior. Successful exploitations of such invalid commands may enable malicious peripheral devices (e.g., a headset), connected to the smartphone over Bluetooth, to access phones’s sensitive information, such as, IMSI (International Mobile Subscriber Identity, unique to a subscriber) and IMEI (International Mobile Equipment Identity, unique to a device), or to downgrade the cellular protocol version or stop the cellular Internet, even when the peripheral is only allowed to access phone’s call and media audio. Prior efforts. The previous research [43, 44, 53] strives to identify the types of valid AT commands (i.e., commands with valid inputs/arguments conforming to 3GPP reference [1, 9, 11, 12, 52] or vendor-specific commands [13, 14, 18, 21] added for vendor customization) exposed through USB interfaces on modern smartphone platforms and the functionality they enable. Yet these studies have at least one of the following limitations: (A) The analyses [53] do not test the robustness of the AT interface in the face of invalid commands; (B) The analyses [53] only consider USB interface and thus leave the Bluetooth interface to the perils of both valid and invalid AT commands; and (C) The analyses [30, 43, 45, 49] are not general enough to be applicable to smartphones from different vendors. Challenges. Conceptually, one can approach our problem using one of the following two techniques: (1) static analysis; (2) dynamic analysis. As the source code of firmwares is not always available, a static analysis-based approach would have to operate on a binary level. The firmware binaries, when available, are often obfuscated and encrypted. Making such binaries amenable to static analysis requires substantial manual reverse engineering effort. To make matters worse, such manual efforts are often firmware-version specific and may not apply to other firmwares, even when they are from the same vendor. Using dynamic analysis-based approaches also often requires instrumenting the binary to obtain coverage information for guiding the search. Like static analysis, such instrumentation requires reverse engineering effort which again is not scalable. Also, during dynamic analysis, due to the separation of the two processors, it is often difficult to programmatically detect observable RIL crashes from the application processor. Finally, in many cases, undesired AT commands are blacklisted [2–4] and hence can cause rate-limiting by completely shutting down the AT interface. The only way to recover from such a situation is to reboot the test device which can substantially slow down the analysis process. Our approach. In this paper, we propose ATFuzzer which can test the correctness and robustness of the AT interface. One of the key objectives driving the design of ATFuzzer is discovering problematic input formats instead of just some misbehaving concrete AT commands. Towards this goal, ATFuzzer employs a grammar-guided evolutionary fuzzing-based approach. Unlike typical mutation-based [28, 35–37, 46, 59] and generation-based [20, 31, 54] fuzzers, ATFuzzer follows a different strategy. It mutates the production rules of the AT command grammars and uses sampled instances of the generated grammar to fuzz the test programs. Such an approach has the following two clear benefits. First, a production rule (resp., grammar) describing a valid AT command can be viewed as a symbolic representation for a set of concrete AT commands. Such a symbolic representation enables ATFuzzer to efficiently navigate the input search space by generating a diverse set of concrete AT command instances for testing. The diversity of fuzzed input instances is likely achieved because mutating a grammar can move the fuzzer to test a different syntactic class of inputs with high probability. Second, if ATFuzzer can find a problematic production rule whose sampled instances can regularly trigger an incorrect behavior, the production rule can then be used as a shred of possible abstract evidence which can contribute towards the identification of the underlying flaw causing the misbehavior. ATFuzzer takes grammars of AT commands as the seed. It then generates the initial population of grammars by mutating the seed grammars. For each generated grammar, ATFuzzer samples grammarcompliant random inputs and evaluate the fitness of each grammar based on our proposed fitness function values of the samples. Since code-coverage or subtle memory corruptions are not suitable to be used as the fitness function for such vendor-specific closed firmwares, we leverage the execution timing information of each AT command as a loose-indicator of code-coverage information. Based on the fitness score of each grammar, ATFuzzer selects the parent grammars for crossover operation. We design a grammaraware two-point crossover operation to generate a diverse set of valid and invalid grammars. After the crossover with a random probability, we incorporate three proposed mutation strategies to include randomness within the grammar itself. The intuition behind using both crossover and mutation operations is for testing the integrity of each command field as well as the command sequence. Findings. To evaluate the generality and effectiveness of our approach, we evaluated ATFuzzer on 10 Android smartphones (from 6 different vendors) with both Bluetooth and USB interfaces. ATFuzzer has been able to uncover a total of 4 erroneous AT grammars over Bluetooth and another 13 AT grammars over USB. Impacts of these errors range from complete disruption of cellular network connectivity to retrieval of sensitive personal information. We show practical attacks through Bluetooth that can downgrade or shutdown of Internet connectivity, and also enable illegitimate exposure of IMSI and IMEI when such impacts are not possible to achieve using valid AT commands. On top of that, the syntactically and semantically flawed AT commands over USB may also cause crashes, compound actions, and returns OK/ERROR while still getting processed. For instance, an invalid AT command ATDI in LG Nexus 5 induces the program to execute two valid AT commands— ATD (dial) and ATI (display IMEI), simultaneously. These anomalies add a new dimension to the attack surface when blacklisting or access control mechanisms are put in place to protect the devices from valid yet menacing AT commands. The vulnerabilities along with the invalid AT command grammars were responsibly disclosed to affected vendors. Among the vulnerabilities unearthed, 2 CVEs (CVE-2019-16400 [5] and CVE-2019-16401 [6]) have already been assigned. Contributions. The paper has the following contributions: (1) We propose ATFuzzer— an automated and systematic framework that leverages grammar-guided genetic programming for dynamic testing of the AT command interface in modern Android smartphones. We have made our framework opensource alongside the corpus of AT command grammars we tested. The tool and its detailed documentation are publicly available at https://github.com/Imtiazkarimik23/ATFuzzer (2) We show the effectiveness of our approach by uncovering 4 problematic AT grammars through Bluetooth and 13 problematic grammars through USB interface on 10 smartphones from 6 different vendors. (3) We demonstrate that all the anomalous behavior of the AT program exposed through Bluetooth are exploitable in practice by adversaries whereas the anomalous behavior of AT programs exposed through USB would be effectively exploitable even when valid but unsafe AT commands are blacklisted. The impact of these vulnerabilities ranges from private information exposure to persistent denial-of-service attacks. Host Machine/Bluetooth Peripherals Smartphone User Space Kernel  Space Bluetooth API Applications Native Daemons RILD /dev/ttyABC Modem Driver Smartphone Bluetooth Application JNI /dev/ttyACM* (Linux) COM* (Windows) Baseband Processor AT command Injector Bluetooth Hal Communication channel Interface Bluetooth Stack Rfcomm socket (a) USB RILD Modem Driver Application Level System Level Baseband Processor Vendor Configuration (b) Bluetooth Figure 1: AT Interface for Android Smartphones connected to a host machine through USB interface 2 BACKGROUND We now give a brief primer on AT commands. We then discuss how we obtain the list of a smartphone-supported AT commands and their respective grammars. 2.1 AT Commands Along with the AT commands defined by the cellular standards [11], vendors of cellular baseband processors and operating systems support vendor-specific AT commands [17, 18, 21] for testing and debugging purposes. Based on the functionality, different AT commands have different formats; differing in number and types of parameters. The following are the four primary uses of AT commands: (i) Get/read a parameter value, e.g., AT + CFUN? returns the current parameter setting of +CFUN which controls cellular functionalities; (ii) Set/write a parameter, e.g., AT + CFUN = 0 turns off (on) cellular connectivity (airplane mode); (iii) Execute an action, e.g., ATH causes the device to hang up the current phone call; (iv) Test for allowed parameters, e.g., AT + CFUN =? returns the allowed parameters for +CFUN command. Note that, +CFUN is a variable which can be instantiated with different functionality (e.g., +CFUN=1 refers to setting up the phone with full functionality). 2.2 AT Interfaces for Smartphones AT commands can be invoked by an application running on the smartphone, or from another host machine or peripheral device connected through the smartphone’s USB or Bluetooth interface (shown in Figure 1). While older generations of Android smartphones used to allow running AT commands from an installed application, recent Android devices have restricted this feature to prevent arbitrary applications from accessing device’s sensitive resources illegitimately through AT commands. Contrary to installed applications, nearly all Android phones allow executing AT commands over Bluetooth, whereas, for USB, devices require minimal configuration to set up to activate this feature. Android smartphones typically have different parsers for executing AT commands over these interfaces. 2.3 Issuing AT Commands Over Bluetooth and USB In this section, we present the details pertaining to issuing AT commands over Bluetooth and USB. 2.3.1 Bluetooth. For executing AT commands over Bluetooth, the injecting host machine/peripheral device needs to be paired with the Android smartphone. The Bluetooth on a smartphone may have multiple profiles (services), but only certain profiles e.g. handsfree profile (HFP), headset profile (HSP) supports AT commands. Figure 1 (right) shows the flow of AT command execution over Bluetooth. When a device is paired with the host machine, it establishes and authorizes a channel for data communication. After receiving an AT command, the system-level component of the Bluetooth stack recognizes the AT command with the prefix "AT" and compares it against a list of permitted commands (based on the connected Bluetooth profile). When the parsing is completed, the AT command is sent to the application-level component of the Bluetooth stack in user space where the Bluetooth API takes the action as per the AT command issued. Similar to the example through USB, if a baseband related command is invoked e.g.,ATD ;, the RILD is triggered to deliver the command to the baseband processor. Contrary to USB, only a subset of AT commands related to specific profiles is accepted/processed through Bluetooth. 2.3.2 USB. If a smartphone exposes its USB Abstract Control Model (ACM) interface [49], it creates a tty device such as /dev/ttyACM0 which enables the phone to receive AT commands over the USB interface. On the other hand, in phones for which the USB modem interface is not included in the default USB configuration, switching to alternative USB configuration [49] enables communication to the modem over USB. The modem interface appears as /dev/ttyACM* device node in Linux whereas it appears as a COM* port in Windows. Figure 1 (left) shows the execution path of an AT command over USB. When the AT command injector running on a host machine sends a command through /dev/ttyACM* or COM* to a smartphone, the ttyABC (ABC is a placeholder for actual name of the tty device) device in the smartphone receives the AT command and relays it to the native daemon in the Android userspace. The native daemon takes actions based on the type of command. If the command is related to baseband, e.g., ATD ;, the RILD (Radio Interface Layer Daemon) is triggered to deliver the command to the baseband processor which executes the command– makes a phone call to the number specified by . On the other hand, if the command is operating system-specific (e.g., Android, iOS, or Windows), such as +CKPD for tapping a key, the native daemon does not invoke RILD. 2.4 AT Commands and Their Grammars We obtain the list of valid AT commands and their grammars from the 3GPP standards [1, 8–12, 52]. Note that, not every standard AT commands are processed/recognized by all smartphones. This is because different smartphone vendors enforce different whitelisting and blacklisting policies for minimizing potential security risks. Also, vendors often implement several undocumented AT commands. Any problematic input instances that ATFuzzer finds, we check to see whether it is one of the vendor-specific, undocumented AT commands following the approach by Tian et al. [53]. We do not report as invalid the undocumented, vendor-specific AT commands that ATFuzzer discovers since they have already been documented [53]. We aim at finding malformed AT command sets that are due to the parsing errors in the AT parser itself. 3 OVERVIEW OF OUR APPROACH In this section, we first present the threat model and then formally define our problem statement. Finally, we provide a high-level overview of our proposed mechanism with a running example. 3.1 Threat Model For Bluetooth and USB AT interfaces exposed by modern smartphones, we define the following two different threat models. 3.1.1 Threat model for Bluetooth. For the threat model over Bluetooth, we assume a malicious/compromised Bluetooth peripheral device (e.g. headphones, speaker) is paired to an Android device over Bluetooth. We assume the malicious Bluetooth device is connected through its default profile. For instance, the victim smartphone which is connected to the malicious headphone has only given audio permissions to the headphone. Also, there can be the case when the adversary sets up a fake peripheral device through the man-in-the-middle (MitM) attacks exploiting known vulnerabilities of Bluetooth pairing and bonding [7, 39, 51] procedures. We do not assume the presence of malicious apps on the device. 3.1.2 Threat model for USB. For USB, we assume a malicious USB host, such as a PC or USB charging station controlled by an adversary, that tries to attack the connected Android phone via USB. We assume the attacker can get access to the exposed AT interface even if the device is inactive. We also neither require a malicious app to be installed on the device nor the device’s USB debugging to be turned on. 3.2 Problem Statement Let I be the set of finite strings over printable ASCII characters, R = {ok, error} denoting the parsing status, and A be a set of actions (e.g., phone-call ∈ A). The AT interface of a smartphone can be viewed as a function P from I to R × 2(A∪{nop,⊥}) , that is, P : I → R × 2(A∪{nop,⊥}) in which nop refers to no operation whereas ⊥ captures undefined behavior including a crash. nop is used to capture the behavior of P ignoring an AT command, possibly, due to blacklisting or parsing errors. Given the smartphone AT interface under test PTest and a reference AT interface induced by the standard PRef , we aim to identify concrete vulnerable AT command instances s ∈ I such that PTest and PRef do not agree on their response for s, that is, PRef (s) PTest (s). Given pairs ⟨r 1 , a 1 ⟩, ⟨r 2 , a 2 ⟩ ∈ R × 2(A∪{nop,⊥}) , we write ⟨r 1 , a 1 ⟩ = ⟨r 2 , a 2 ⟩ if and only if r 1 = r 2 and a 1 = a 2 . Note that, a 1 and a 2 are both sets of actions as one command can mistakenly trigger multiple actions. Note that, there can be a reason PRef and PTest can legitimately disagree on a specific input AT command s ∈ I as s can be blacklisted by PTest . Due to CVE-2016-4030 [2], CVE-2016-4031 [3], and CVE-2016-4032 [4], Samsung has locked down the exposed AT interface over USB with a command whitelist for some phones. In this case, we do not consider s to be a vulnerable input instance. Precisely, when s is a blacklisted command, we observed that PTest often returns ⟨ok, nop⟩. Finally, we instantiate the oracle PRef through manual inspection of the standard. 3.3 Running Example To explain ATFuzzer’s approach, we now provide a partial, example context-free grammar (CFG) of a small set of AT commands (see Figure 3 for the grammar and Figure 2 for the partial Abstract Syntax Tree (AST) of the grammar) which we adopted from the original 3GPP suggested grammar [11, 52]. In our presentation, we use the bold-faced font to represent non-terminals and regular-faced font to identify terminals. We use “·” to represent explicit concatenation, especially, to make the separation of terminals and non-terminals in a production rule clear. We use [. . . ] to define regular expressions in grammar production rules and [. . . ]∗ to represent the Kleene star operation on a regular expression denoted by [. . . ]. In our example, Dnum can take as an argument any alphanumeric string up to length n. Our production rules are of the form: s → α · B1 {ϕ} where α denotes a terminal, B1 represents a non-terminal, and ϕ represents a condition that imposes additional well-formedness restrictions on the production. In the above example, we show the correct AT command format for making a phone call. Examples of valid inputs generated from this grammar can be— ATD ∗ 752# + 436644101453; 3.4 Overview of ATFuzzer In this section, we first touch on the technical challenges that ATFuzzer faces and how we address them. We conclude by providing the high-level operational view of ATFuzzer. 3.4.1 Challenges. For effectively navigating the input search space and finding vulnerable AT commands, ATFuzzer has to address the following four technical challenges. C1: (Problematic input representation). The first challenge is to efficiently encode the pattern of problematic inputs. It is crucial as the problematic AT commands that have similar formats/structures but are not identical may trigger the same behavior. For instance, both ATD123 and ATD1111111111 test inputs are problematic (neither of them is a compliant AT command due to missing a trailing semicolon) and have a similar structure (i.e., ATD followed by a phone number), but are not the same concrete test inputs. While processing these problematic AT commands, one of our test devices, however, stops cellular connectivity. Mutation in the concrete input level will require the fuzzer to try a lot of inputs of the same vulnerable structure before shying away from that abstract input space. This may limit the fuzzer from testing diverse classes of inputs. C2: (Syntactic correctness). As shown in Figure 3, most of the AT commands have a specific number and type of arguments, e.g., +CFUN= has two arguments: CFUNarg1 and CFUNarg2. The second challenge is to effectively test this structural behavior and type, thoroughly by generating diverse inputs that do not comply with the command structure or the argument types. C3: (Semantic correctness). Each argument of an AT command may have associated conditions. For instance, Lenдth(Dnum) ≤ n in the fifth production rule of Figure 3. Also, arguments may correlate with each other, such as, one argument defines a type on which another argument is dependent. For instance, +CTFR= refers to a service that causes an incoming alert call to be forwarded to a specified number. It takes four arguments— the first two of them are number and type. Interestingly, the second argument defines the format of the number given as the first argument. If the dialing command cmd AT cmd ... ... ... ... ; dgrammar +CFUN? cfungrammar ... ... cmd cmd_AT D Dnum Darg +CFUN = CFUNarg1 CFUNarg2 I G [0-9]* [0-1] [a-zA-Z0-9+*#]* Figure 2: Partial Abstract Syntax Tree(AST) of the reference grammar (Grey-box denotes non-terminal symbols and white box indicates terminal symbols) command cmd cmd cmd_AT dgrammar cfungrammar cmd Dnum Darg CFUNarg1 CFUNarg2 number1 number2 number type subaddr satype → AT · cmd → dgrammar cfungrammar → ϵ cmd_AT → cmd ; cmd → D · Dnum · Darg ; → +CFUN? +CFUN =CFUNarg1, CFUNarg2 → +CTFR = number, type, subaddr, satype → [a − zA − Z 0 − 9 + ∗#]∗ {Lenдt h(Dnum) ≤ n } →I G ϵ → [0 − 9]∗ {CFUNarg1 ∈ Z and 0 ≤ CFUNarg1 ≤ 127} → [0 − 1] {CFUNarg2 ∈ Z and 0 ≤ CFUNarg2 ≤ 1} → number1 number2 → [a − zA − Z 0 − 9 + ∗#]∗ {if type = 145} → [a − zA − Z 0 − 9 ∗ #]∗ {if type = 129} → 145 129 → [a − zA − Z 0 − 9 + ∗#]∗ → [0 − 9]∗ {if satype = ϵ, satype = 128} . . . Figure 3: Partial reference context-free grammar for AT commands. string includes access code character “+”, then the type should be 145, otherwise, it should be 129. These correlations are prevalent in many AT commands. Hence, the third challenge is to systematically test conditions associated with the arguments of commands to cover both syntactical and semantic bugs. C4: (Feedback of a test input). The AT interface can be viewed as a black-box providing only limited output of the form: OK (i.e., correctly parsed) or ERROR (i.e., parsing error). The final challenge is to devise a mechanism that can provide information about the code-coverage of the AT interface for the injected test AT command and thus effectively guides us through the fuzzing process. 3.4.2 Insights on addressing challenges. For addressing C1, we use the grammar itself as the seed of our evolutionary fuzzing framework rather than using a particular instance (i.e., a concrete test input) of the grammar. This is highly effective as the mutation of a production rule can influence the fuzzer to test a diverse set of inputs. Also, when a problematic grammar is identified, it can serve as abstract evidence of the underlying flaw in the AT interface. Finally, as grammar can be viewed as a symbolic representation of concrete input AT commands, mutating a grammar can enable the fuzzer to cover large diverse classes of AT commands. The insight here is that testing diverse input classes are likely to uncover diverse types of issues. To address challenges C2 and C3, at each iteration, ATFuzzer chooses parents with the highest fitness scores and switches parts of the grammar production rules among each other. This causes changes to not only the structural and type information in the child grammars but also forms two very different grammars that try to break the correlation of the arguments. For instance, suppose that the ATFuzzer has selected following two production rules from two different parent grammars: +CFUN = CFUNarg1, CFUNarg2 and +CTFR = number, type, subaddr, satype. After applying our proposed grammar crossover mechanisms, the resultant child grammar production rules are: +CFUN = CFUNarg1, type, subaddr, satype and +CTFR = number CFUNarg2. The production rule +CFUN takes only two arguments whereas our new child grammar creates a production rule that has four arguments. The same reasoning also applies for +CTFR. Thus the new grammars with modified production rules would test this structural behavior precisely. Furthermore, +CTFR’s first argument number is correlated with its second argument type. In the modified child grammars, type, however, has been replaced with CFUNarg2. Recall from our grammar definition, type takes argument from the set {145, 129} whereas +CFUNarg2 takes argument from the set {0, 1}. Therefore, this single operation completes two tasks at once— it not only tests the correlation among two arguments of +CTFR but also tests conditions of both +CFUN and +CTFR. Crossing over grammar production rules creates a higher change in the input format and it aims to explore the diverse portions of the input space to create highly unusual inputs. To test both the structural aspects, we use three very different mutation strategies which create little change to the grammar (compared to crossover) but prove highly effective for checking the conditions associated with commands. For addressing C4, we use the precise timing information of injecting an AT command and receiving output. We keep an upper bound on this time, i.e., a timer (T ). If the output is not received within T , we suspect the AT interface has become unresponsive possibly due to the blacklisting mechanism enforced by several vendors. We use this timing information as a loose-indicator for the code-coverage information. Our intuition is to explore as much of the AT interface as possible. Higher timing loosely indicates that the test command traverses more basic-blocks than the other inputs with lower timing. We try to leverage this simple positive correlation to design a feedback edge (i.e., a fitness function) of the closed-loop. The timing information, however, cannot help to infer how many new basic-blocks a test input could explore. Since our focus is mainly on baseband related AT commands, an error in the AT interface has a higher probability of causing disruptions in the baseband which also trickles down to cellular connectivity. Figure 4: Overview of ATFuzzer framework We leverage this key insight and consider both the cellular Internet connectivity information from the target device and the device’s debug information (Logcat) as an indication of the baseband health after running an AT command. Using this information, we devise our fitness function for guiding ATFuzzer. 3.4.3 High-level description of ATFuzzer. ATFuzzer comprises of two modules, namely, evolution module and evaluation module, interacting in a closed-loop fashion (see Figure 4). The evolution module is bootstrapped with a seed AT command grammar which is mutated to generate Psize (refers to population size, a parameter to ATFuzzer), different versions of that grammar. Concretely, new grammars are generated from parent grammar(s) by ATFuzzer through the following high-level operations: (1) Population initialization; (2) Parent selection; (3) Grammar crossover; (4) Grammar mutation. Particularly relevant is the operation of parent selection in which ATFuzzer uses a fitness function to select higher-ranked (parent) grammars for which to apply the crossover/mutation operations (i.e., steps 3 and 4) to generate new grammars. Choosing the higher-ranked grammars to apply mutation is particularly relevant for generating effective grammars in the future. Evaluating fitness function requires the evaluation module. For a given grammar д, evaluation module samples several д-compliant commands to test. It uses the AT command injector (as shown in Figures 1 and 4) to send these test commands to the device-undertest. The fitness function uses the individual scores of the concrete д-compliant instances to assign the score to д. condition is met, such as, total time of testing or the number of iterations. Algorithm 1 describes the high-level steps of ATFuzzer’s evolution module. 4.1.1 Initialization. The evolution module starts with initializing the population P (Line 1 in Algorithm 1) by applying both our proposed crossover and mutation strategies with three parameters: the population size Psize ; the probability Ppop of applying crossover and mutation on the grammar; the tournament size Tsize . The key-insight of using Ppop is that it correlates with the number of syntactic and semantic bugs explored. The higher the value of Ppop is, the diverse the initial population is and vice versa. The diverse the initial population is, the higher the number of test inputs that check syntactic correctness is and vice versa. Therefore, to explore both syntactic and semantic bugs, we vary the values of Ppop aiming to strike a balance between grammar diversity. To assess the fitness, the evolution module invokes the evaluation module (Line 3-8) with the generated grammars. Algorithm 1: ATFuzzer Data: Psize , Ppop , GAT , Tsize Result: Gbest : Best Grammar 1 P ← InitializePopulation(Psize, Ppop, GAT ); 2 3 4 while stopping condition is not met do for each grammar Gi ∈ P do Generate random input I AssesFitness(Gi, I) if Fitness(Gi ) > Fitness(Gbest ) then Gbest = Gi ; 5 6 7 end 8 end 9 Q = {} 10 4 ATFuzzer: DETAILED DESIGN In this section, we discuss our proposed crossover and mutation techniques for the evolution module followed by the fitness function design used by the evaluation module. Pa ← ParentSelection(P, Tsize ) Pb ← ParentSelection(P, Tsize ) Ca, Cb ← GrammarBasedCrossover(Pa, Pb ) Q = Q ∪ {Mutate(Ca ), Mutate(Cb )} 13 14 15 end 16 P←Q 17 18 4.1 P for size 2 times do 11 12 end Evolution Module Given the AT grammar (shown in Figure 3), ATFuzzer’s evolution module randomly selects at most n cmds to generate the initial seed AT grammar denoted as GAT . The evolution module yields the grammars Gbest with the highest scores until a certain stopping 4.1.2 Parent selection for the next round. We use the tournament selection technique to get a diverse population at every round. We perform “tournaments” among P grammars (Line 12-13 in Algorithm 1). The winner of each tournament (the one with the highest fitness score) is then selected for crossover and mutation. In what follows, we discuss in detail our tournament selection technique addressing the functional and structural bloating problems of evolutionary fuzzers [54]. Restraining functional bloating. We leverage another insight in selecting grammars at each round of the tournament selection procedure to reduce functional bloating [54]— the continuous generation of grammars containing similar mutated production rules— which adversely affects diverse input generation in evolutionary fuzzing. At each round, we randomly select grammars from our population. This is due to the fact that while running an evolutionary fuzzing, the range of fitness values becomes narrow and reduces the search space it focuses on. For example, at any round, if the fuzzer finds a grammar that has a mutated production rule related to +CFUN causing an error state in the AT interface, then all the grammars containing this mutated rule will obtain high fitness values. If we then only select parents based on the highest fitness, we would inevitably fall into functional bloating and would narrow down our focused search space with grammars that are somehow associated with this mutated version of +CFUN only. To constraint this behavior, we perform the tournament selection procedure in which we randomly choose Tsize (where T denotes the set of selected grammars for the tournament and Tsize ≤ Psize ) number of grammars from the population P. The key insight of choosing randomly is to give chances to the lower fitness grammars in the next round to ensure a diverse pool of candidates with both higher and lower fitness scores. Restraining structural bloating. After running ATFuzzer for a while, i.e., after a certain number of generations, the average length of individual grammar grows rapidly. This behavior is characterized as structural bloating. Referring to the AT grammar in Figure 3, multiple cmds (production rules) can contribute to generating the final commands that are sent to the AT command injector for evaluation. These commands can grow indefinitely, but do not induce any structural changes, and thus cause structural bloating. These input commands, therefore, hardly contribute to the effectiveness of the fuzzer. To limit this behavior, we restrict the grammar to have at most three cmds at each round to generate the input AT commands for testing. c1 c2 +CFUN = CFUNarg1 , CFUNarg2 +CTFR = number, type ,subaddr, satype c1 c2 +CFUN = number , CFUNarg2 +CTFR = CFUNarg1, type ,subaddr, satype Figure 5: Examples of one-point and two-point grammar cossover mechanisms. 4.1.3 Grammar crossover. In the grammar crossover stage, ATFuzzer strives to induce changes in the grammar aiming to systematically break the correlation and structure of the grammar. For this, we take inspiration from traditional genetic programming and apply our custom two-point crossover technique among the grammars. Two-point crossover. ATFuzzer picks up two random production rules from the given parent grammars and generates two random numbers c 1 and c 2 within ℓ where ℓ is the minimum length between the two production rules. ATFuzzer then swaps the fields of the two production rules that are between points c 1 and c 2 . Figure 5 shows how ATFuzzer performs the two-point crossover operation on production rules +CTFN = and +CTFR = (a subset of the AT grammar in Figure 3) used for controlling the cellular functionalities and for urgent call forwarding, respectively. By applying two-point crossover on +CFUN = CFUNarg1, CFUNarg2 and +CTFR = number, type, subaddr, satype, ATFuzzer generates +CFUN = number, CFUNarg2 and +CTFR = CFUNarg1, type, subaaddr, satype which in turn contribute in generating versatile inputs. Algorithm 2: Two-Point Grammar Crossover Data: ParentGrammar Pa , ParentGrammar Pb Result: Pa ,Pb 1 Randomly pick production rule R a from P a and R b from Pb 2 3 4 5 6 7 R a ← R a1 , R a2 , ..., R al Rb ← Rb1 , Rb2 , ..., Rbm c 1 ← random integer chosen from 1 to min(l, m) c 2 ← random integer chosen from 1 to min(l, m) if c > d then swap c 1 and c 2 end for i from c to d − 1 do swap grammar rules of R ai , Rb i 11 end 8 9 10 +CTFR = number, subaddr,satype Trimming a production rule +CTFR= number, type, subaddr, satype +CTFR = number type, subaddr,satype Negating a production rule +CTFR = number,type, satype, subaddr,satype Adding a production rule Figure 6: Example of three grammar mutation strategies. 4.1.4 Grammar mutation. During crossover operation, ATFuzzer constructs grammars that may have diverse structures which are, however, not enough to test the constraints and correlations associated with a command and its arguments. This is due to the fact that AT commands have constraints not only with the fields but also with the commands itself. Therefore, generating versatile grammars that can generate such test inputs is an important aspect of ATFuzzer design. To deal with this pattern, we propose three mutation strategies— addition, trimming, and negation. We use AT + CTFR = number, type, subaddr, satype (one of the example grammars presented in Figure 3) to illustrate these mutation strategies with examples shown in Figure 6. Addition. With our first strategy we randomly insert/add a field chosen from the production rule of the given grammar at a random location. For instance, applying this mutation strategy (shown in Figure 6) to one of the grammars +CTFR = number, type, subaddr, satype yields +CTFR = number, type, satype, subaddr, satype containing an additional argument added after the second argument of the actual grammar. The mutation also has changed the type (string) of third argument (subaddr) of the actual grammar to integer (satype) in the new grammar. ATFuzzer, thereby, tests the type correctness along with the structure of the grammar. Trimming. Our second mutation strategy is to randomly trim an argument from a production rule for the given grammar. Referring to Figure 6, applying this to our example grammar for +CTFR, we obtain a production rule AT + CTFR = number, subaddr, satype which also deviates from the original grammar with respect to both the structure and type. Negation. Our last mutation strategy focuses on the constraints associated with the arguments of a command. Referring to the AT grammar in Figure 3, we encode the constraints with additional conditions (denoted with {. . . }) in the grammar production rules. With the negation strategy, we randomly pick a production rule of the grammar and choose a random argument that has a condition associated with it. We negate the condition which we use to replace the original one at its original place in the production rule. Figure 6 demonstrates how we negate the production rule associated with number used to represent a phone number. The number is a string type with a constraint on its length. We negate this condition with the following three heuristics: (i) Generating strings that are longer than the specified length; (ii) Generating strings that contain not only alphanumeric characters but also special characters; and (iii) Generating an empty string. Algorithm 3: Grammar Mutation Data: Grammar Ga , Tunable parameters : P α , P β , Pγ Result: Mutated Ga 1 Randomly pick production rule R a from G a 2 3 4 5 6 7 8 9 10 11 12 13 14 R a ← R a1 , R a2 , ..., R al c ← random integer chosen from 1 to l P ← Generate random probability from (0, 1) if P α ≥ P then trim argument R ac from production rule R a end if P β ≥ P then replace Pc {ϕ } with Pc {¬ϕ }in production rule R a end if Pγ ≥ P then d ← random integer chosen from 1 to l add argument P ac at position l in production rule R a end 4.2 Evaluation Module The primary task of the evaluation module is to generate a number of test inputs (i.e., concrete AT command instances) for the grammar received from the evolution module. It then evaluates the test inputs with the AT command injector, and finally evaluate the grammar itself based on the scores of the generated test inputs. To what follows we explain how the evaluation module calculates the fitness score of a grammar. 4.2.1 Fitness evaluation. At the core of ATFuzzer is the fitness function that guides the fuzzing and acts as a liaison for the coverage information. We devise our fitness function based on the timing information and baseband related information of the smartphone. Our fitness function comprises of two parts: (1) Fitness score of the test inputs generated from a grammar; (2) Fitness score of the grammar in the population. Fitness score of the test inputs of a grammar. The fitness evaluator of ATFuzzer generates N inputs from each grammar and calculates the score for each input. We define this fitness function for an input AT command instance x as: fitness(x) = α × timingscore + (1 − α) × disruptionscore where α is a tunable-parameter that controls the impact of timingscore and disruptionscore . Let t x be the time required for executing an AT command x (0 ≤ x < N ) on the smartphone under test. Execution time of an AT command is defined as the time between when the AT command is sent and when the output is received by the AT command injector. Note that, we normalize the execution time with input length. Let t 1 , t 2 , ..., t N be the time for executing N AT commands, we define the timing score for instance x in a population of size N as ti follows: timingscore = . t1 + t2 + .... + tN Note that while running AT commands over Bluetooth, the commands and their responses are transmitted in over-the-air (OTA) Bluetooth packets. To compute the precise execution time of the AT command on the smartphone, we take off the transmission and reception times from the total running time. Also, to make sure Bluetooth signal strength change does not interfere with the timing information, our system keeps track of the RSSI (Received Signal Strength Indication) value and carries out the fuzzing at a constant RSSI value. We define disruptionscore based on the following four types of disruption events: (i) Complete shutdown of SIM card connectivity; (ii) Complete shutdown of cellular Internet connectivity; (iii) Partial disruption in cellular Internet connectivity; (iv) Partial disruption of SIM card connectivity with the phone. For cases (i) and (ii), complete shutdown causes denial of cellular/SIM functionality, recovery from which requires rebooting the device. ATFuzzer uses adb reboot command which takes ∼ 15 − 20 seconds to restart the device without entailing any manual intervention. On the contrary, partial shutdown for the cases (iii) and (iv) induce denial of cellular/SIM functionality for ∼ 3 − 5 seconds and thus does not call for rebooting the device to recuperate back to its normal state. These events are detected and monitored using the open-source tools available to us from Android, e.g., logcat, dumpsys, and tombstone. While injecting the AT commands we use these tools to detect the events on run time. We take into account if there is a crash in the baseband or the RIL daemon. We assign a score between 0 − 1 to a disruption event in which 0 denotes no disruption at all (i.e., the device is completely functional) with no adverse effects and 1 denotes complete disruption of cellular or SIM card connectivities. Fitness score of a grammar. After computing the fitness scores for all the concrete input instances, we calculate the grammar’s score by taking the average of all instance scores. 5 EVALUATION Our primary goal in this section is to evaluate the effectiveness of ATFuzzer by following the best possible practices [26, 55] and guidelines [34]. We, therefore, first discuss the experiment setup and evaluation criteria, and then evaluate the efficacy of our prototype against the widely used AFL [59] fuzzer— customized for our context. 5.1 Experiment Setup ATFuzzer setup. We implemented ATFuzzer with ∼4000 lines of Python code. We encoded the grammars (with JSON) for a corpus of 90 baseband-related AT commands following the specification in the 3GPP [11] documentation and extracting some of the vendorspecific AT commands following the work of Tian et al. [53]. During its initialization, ATFuzzer receives as input the name of the AT command, retrieves the corresponding grammar that will be used as the seed (GAT in algorithm 1) from the file, generates the initial grammar population and realizes the proposed crossover and mutation strategies. Hence, our approach is general and easily adaptable to other structured inputs, since it is not bound to any specific grammar structure. Since testing a concrete AT command instance requires 15-20 seconds on average (because of checking the cellular and SIM card connectivity after executing a command and for rebooting the device in case of AT interface’s unresponsiveness for blacklisting), we set Psize to 10 which we found through empirical study to be the most suitable in terms of ATFuzzer’s stopping condition. Following the same procedure, we test 10 concrete AT commands in each round for a given grammar. We set the probability Ppop to 0.5 to ensure uniform distribution in the grammar varying ratio. Conceptually, one can argue about testing at a “batch” mode to chop the average time for fuzzing a AT grammar. For instance, injecting 10 AT commands together and then checking the cellular and SIM connectivity at once. Though This design philosophy is intuitive, but fails to serve our purpose due to the following fact. Though it may be able to detect permanent disruptions, it is unable to detect temporary disruptions to cellular or SIM connectivity. For instance, even if the second AT command in the batch induces a temporary disruption, there will be no trace of disruptions at all by the time when the tenth (i.e., the last) AT command will be executed. Target devices. We tested 10 different devices (listed in Table 1) from 6 different vendors and with 6 different android versions to diversify our analysis. For Bluetooth, we do not require any configuration on the phone. For running AT commands over USB some phones require additional configuration. For additional details, see Appendix A.1. Device Android Version Build Number Baseband Vendor Baseband USB Config OS Interface Samsung Note2 4.3 None Linux 4.3 Samsung Exynos 4412 Samsung Exynos 4412 Qualcomm Snapdragon 801 N7100DD UFND1 Samsung Galaxy S3 LG G3 JSS15J. I9300XU GND5 JSS15J. I9300XX UGND5 MRA58K I9300XX UGNA8 None Linux None Linux HTC Desire 10 lifestyle 6.0.1 Qualcomm Snapdragon 400 5.1.1 sys.usb.config mtp,adb,diag, modem, modem_mdm, diag_mdm sys.usb.config diag,adb Windows Bluetooth and USB LG Nexus 5 1.00.600.1 8.0_g CL800193 releasekeys LMY48I MPSS.DI.2.0.1. c1.13-00114 -M8974AA AAANPZM1.43646.2 3.0.U205591 @60906G_01. 00.U0000. 00_F Bluetooth and USB Bluetooth and USB Bluetooth and USB Motorola Nexus 6 6.0.1 MDM9625_ 104662.22. 05.34R fastboot oem bptools-on Windows Bluetooth and USB Huawei Nexus 6P 6.0 .2.6.1.c400004-M899 4FAAAAN AZM-1 G955US QU5CRG3 fastboot oem enable-bp-tools Windows Bluetooth and USB None Linux Bluetooth and USB 22.126.12.00.00 None None Bluetooth g8998-001221708231715 None None Bluetooth 6.0 Samsung Galaxy S8+ 8.0.0 Huawei P8 Lite ALE-L21 Pixel 2 5.0.1 8.0.0 Qualcomm Snapdragon 800 MOB30M Qualcomm Snapdragon 805 MDA89D Qualcomm Snapdragon 810 R16NW.G95 Qualcomm 5USQU5CRG3 Snapdragon 835 ALEHiSilicon L21CO2B140 Kirin 620 (28 nm) OPD3.1708 Qualcomm 16.012 MSM8998 Snapdragon 835 M8974A2.0.50.2.26 Linux Bluetooth and USB Table 1: List of the devices we tested, with software information, USB configuration required and the operating system we used to fuzz each device. 5.2 Evaluation Criteria ATFuzzer has three major components— grammar crossover, mutation and feedback loop— to effectively test a target device. We, therefore, aim to answer the following research questions to evaluate ATFuzzer: • RQ1: How is the bug-finding capability of ATFuzzer over Bluetooth? • RQ2: How is the bug-finding capability of ATFuzzer over USB? • RQ3: How effective is our grammar-aware crossover? • RQ4: How effective is our grammar-aware mutation? • RQ5: When using grammars, how much does the use of timing feedback increase fuzzing performance? • RQ6: Is ATFuzzer more efficient than other state-of-the-art fuzzers for testing AT interface? To tackle RQ1-RQ2, we let our ATFuzzer run over USB and Bluetooth each for one month to test 10 different smartphones listed in Table 1. ATFuzzer has been able to uncover a total of 4 erroneous AT grammars inducing a crash, downgrade and information leakage over Bluetooth and 13 erroneous AT grammars over USB. Based on the type of actions and responses to the problematic AT command instances, we initially categorize our results as syntactic and semantic problematic AT grammars, and further categorize the syntactically problematic grammars into three separate classes— (i) responds ok with composite actions; (ii) responds ok with an action; and (iii) responds error with an action. Here, an action can be either crash (i.e., any disruption event defined in Section 4), leakage of any sensitive information, or executing a command, e.g., hanging up a phone call. We summarize ATFuzzer’s findings for Bluetooth in Table 2 and for USB in Table 3. To answer the research questions RQ3-RQ5, we evaluate ATFuzzer by disabling one of its components at a time. We create three new instances of ATFuzzer— ATFuzzer without crossover, ATFuzzer without mutation, and ATFuzzer without fitness evaluation. To what follows we evaluate these three variants with the AT grammar (in Figure 3) and compare their efficacy of discovering bugs against original ATFuzzer. Moreover, to answer the research question RQ5, we create our variation of AFL (American Fuzzy Lop). To perform a fair comparison, we run all our experiments on Nexus5 for each variations of ATFuzzer and our version of AFL each for 3 days. 5.3 Findings Over Bluetooth (RQ1) Unlike USB, Bluetooth does not require any pre-processing or configuration to the phone to execute AT commands. Besides this, over-the-air Bluetooth communications are inherently vulnerable to MitM attacks [7, 39, 51]. All these enable the adversary to readily exploit the vulnerabilities over Bluetooth with sophisticated attacks. 5.3.1 Results. We first discuss the results that relate to invalid AT commands and then we discuss the attacks and impacts of both invalid and valid AT commands. (1) Syntactic errors – responds ok with actions. ATFuzzer uncovered four problematic grammars in these categories in seven different Android smartphones. We observer that the target device Class of Bugs Syntatctic – returns OK with action Correctly formatted command Grammar and Command Instance action/implication cmd →D.Dnum.Darg1.Darg2 Dnum →[A − Z 0 − 9 + #]∗ Darg1 →I G ϵ Darg2 →;. Darg3 Darg3 →[A, B, C]+ E.g. ATD + 4642048034I; AB; C cmd →D.Dnum.Darg1.Darg2 Dnum →[A − Z 0 − 9 + #]∗ Darg1 →I G ϵ Darg2 →;. Darg3 Darg3 →[A, B, C]+ E.g. ATD + 4642048034I; AB; C cmd →+CIMI.Arg1 Arg1 →[a − zA − Z 0 − 9 + #]∗ E.g. AT + CIMI; ; ; ; abc cmd →+CGSN.Arg1 Arg1 →[a − zA − Z 0 − 9 + #]∗ E.g. AT + CGSN; ; ; ; abc## cmd →+CIND? crash/internet connectivity disruption Nexus5 LG G3 cmd →Arg.D.Dnum.Darg; Arg →[a − zA − Z ] Dnum →[a − zA − Z 0 − 9 + ∗#]∗ Darg →I G ϵ E.g. ATD ∗ ∗61 ∗ +1812555673 ∗ 11 ∗ 25#; Nexus6P HTC S8plus S3 Note2 Huawei P8lite ✓ Pixel 2 ✓ crash/downgrade cmd →+CHUP Nexus6 ✓ ✓ read/IMSI leak ✓ ✓ ✓ read/IMEI leak ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ read/leaks call status, call setup stage, internet service status, signal strength, current roaming status, battery level, call held status execution (cutting phone calls)/ DoS execution/ call forwarding, activating do not disturb mode, activating selective call blocking ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ Table 2: Summary of ATFuzzer’s Bluetooth parser findings. responds to the invalid AT command and also performs an action. For instance, ATFuzzer found a specific variant of ATD grammar ATDA; A; B in Nexus5 which is syntactically incorrect, but returns OK and make the cellular Internet connectivity temporarily unavailable. Beside this, the concrete instances of the same grammar also downgrade the cellular connectivity from 4G to the 3G/2G in Nexus6 and Nexus6P smartphones thus entails severe security and privacy impacts. 5.3.2 Attacks with invalid AT commands. We now present three practical attacks that can be carried out using the invalid grammars uncovered through ATFuzzer. Denial of service. The adversary using a malicious Bluetooth peripheral device (e.g., Bluetooth headphone with only call audio and media permissions) or a MitM instance may exploit the invalid AT command, e.g., ATDB; A; B and temporarily disrupt the Internet connectivity of the Pixel 2 and Nexus 5 phones. To cause long term disruptions in Internet connectivity, the adversary may inject this command intermittently and thus prevent the user from accessing the Internet. Note that there is no other valid AT command that controls the Internet connectivity over Bluetooth and thus it is not possible to achieve this impact using a valid AT command. Downgrade. The same invalid grammar (shown in table 2) exploited in the previous DoS attack in Nexus 5 phone can also be exploited to downgrade the cellular connectivity on Nexus6 and Nexus6P phones. Similar to previous DoS attack, such downgrade of cellular connectivity is not possible with any valid AT commands running over Bluetooth. Downgrade (also known as bidding-down) attacks have catastrophic implications as they open the avenue to perform over-the-air man-in-the-middle attacks in cellular networks [25, 40]. IMSI & IMEI catching. ATFuzzer uncovered the invalid variations (AT + CIMI; ; ; ; ; abc and AT + CGSN123df) of two valid AT commands (+CIMI and +CGSN) which enable the adversary to illegitimately capture the IMSI and IMEI of a victim device over Bluetooth. Exploiting this, any Bluetooth peripheral connected to the smartphone can surreptitiously steal such important personal information of the device and the network. We have successfully validated this attack in Samsung Galaxy S3, Samsung Note 2 and Samsung Galaxy S8+. One thing to be noted here, after manual testing we found out the valid versions of these two commands also leak IMSI and IMEI. We argue that even if there is a blacklist/firewall policy put into place to stop the leakage through valid AT commands, yet it will not be sufficient because it will leave the scope to use the invalid versions of the command (that ATFuzzer uncovered) to expose this sensitive information. The impact of this attack is particularly more fatal than that of the previous two attacks. This is because the illegitimate exposure of IMSI and IMEI through Bluetooth provides an edge to the adversary to further track the location of the user or intercept phone calls and SMS using fake base stations [32, 33] or MitM relays [50]. Samsung has already acknowledged the vulnerabilities and is working on issuing patches to the affected devices. We also summarize the findings of ATFuzzer in Table 2. CVE-2019-16401 [6] has been assigned to this vulnerability along with other sensitive information leakage for the affected Samsung devices. 5.3.3 Attacks with valid AT commands. We summarize ATFuzzer’s other findings in which we demonstrate that the exposed AT interface over Bluetooth allows the adversary to run valid AT commands to attain malicious goals that may negatively affect a device’s expected operations. The results are particularly interesting as Bluetooth interface has yet not been systematically examined. Information leak. The adversary can use a valid AT command to learn the whole set of private information about the phone. The malicious Bluetooth peripheral device can get the call status, call setup state, Internet service status, the signal strength of the cellular network, current roaming status, battery level, and call hold status for the phone using this valid AT commands. DoS attacks. A malicious peripheral can exploit the AT + CHUP command to prevent the victim device to receive any incoming phone call. From the previous information leakage (e.g., call status) attack, an attacker can probe periodically to detect whether there is a phone call or not. Whenever he detects there is a phone call, the attacker injects AT + CHUP to cut the phone call. To make the matters worse, the attack is transparent to the victim, i.e., there is no indication on the mobile screen that an attack is going on. The victim device user perceives either there is no incoming call or abrupt call drops due to poor signal quality or network congestions. CVE-2019-16400 [5] has been assigned for this along with other reported denial of service attacks in Samsung phones. Call forwarding. If the victim device is subscribed to call forwarding service, the adversary may exploit the ATD command to forward victim device’s incoming calls to an attacker-controlled device. Exploiting this, the adversary first prevents the victim device from receiving the incoming calls and then learns sensitive information, such as password or pin for two-factor authentication possibly sent by an automated teller. Note that such call forwarding is also transparent to the user since the user is unaware of any incoming calls. Activating do not disturb mode. The adversary using a malicious Bluetooth peripheral or MitM instance can turn on the do not disturb mode of the carrier through ATD command. Similar to call forwarding attack, it is also completely transparent to the user as no visible indication of do not disturb mode is displayed on the device. While the user observes all the network status bars and the Internet connectivity, the device, however, does not receive any call from the network. Selective call blocking. A variation of the previous attack is also possible in which the adversary may allow the victim phone to receive selective calls by intermittently turning on/off the do not disturb mode. This may force the user to receive calls only from selective users not affecting others. 5.4 Findings over USB (RQ2) We now discuss findings over USB. (1) Syntactic errors – responds ok with composite actions. It is one of the interesting classes of problematic grammars for which the AT interface of the affected devices respond to invalid AT commands with ok, but performs multiple actions together. These invalid commands are compositions of invalid characters and two valid AT commands with no semicolon as their separator. For instance, ATFuzzer generated an invalid command ATIHD + 4632048034; using two valid grammars for ATD and ATI (as shown in Figure 3) and invalid characters for which the target device returns ok but places a phone call to 4632048034 and shows the manufacturer, model revision, and IMEI information simultaneously. (2) Syntactic errors – responds ok with an action. In this type of syntactically problematic grammars, the target device responds to an invalid command instance with ok but performs an action. For instance, the grammar cmd →Arg1. I .Arg2 in Table 3 can be instantiated with an invalid command instance ATHIX which returns sensitive device information. (3) Syntactic errors – responds error with an action. In this class of syntactic errors, the AT interface recognizes the inputs as faulty by acknowledging with error, but it still executes the action associated with the command and even does worse by crashing the RIL daemon and inducing complete disruptions in the cellular Internet connectivity. It basically reveals a fundamental flaw in the AT interface— if a command is considered as erroneous, it should not be executed. For example, the grammar cmd →D . Dnum in Table 3 can be instantiated with ATD+4632048034 (a variation of the ATD production rule in Figure 3) which is supposed to start a cellular voice call. Instead, the grammar returns error in the form of NO CARRIER and induces the cellular Internet connectivity to go down completely for a certain amount of time (15-20 seconds). We have also found grammars for which the device returns other error statuses, e.g., ERROR, NO CARRIER, CME ERROR, ABORTED, and NOT SUPPORTED, but still executes those invalid commands. (4) Semantic errors. This class of grammars conforms with the input pattern defined by the standards [11], but induces disruptions in the cellular connectivity for which the recovery requires rebooting the device. The grammars of this class are shown in Table 3. Possible exploitation. It may appear that the implications of invalid AT commands over USB are negligible as compared to the valid AT commands which may wreak havoc by taking full control of the device. We, however, argue that if AT interface exposure is restricted through blacklisting the critical and unsafe valid AT commands by the parser in the first place, the adversary will still be able to induce the device to perform same semantic functionalities using invalid AT commands. This is due to the uncovered vulnerabilities for which the parser will fail to identify the invalid AT commands as the blacklisted commands and thus allows the adversary to achieve same functionalities as the valid ones. 5.4.1 Efficacy of grammar-aware crossover (RQ3). ATFuzzer without crossover (by disabling the crossover in ATFuzzer) uncovered only 3 problematic grammars as compared to ATFuzzer with all proposed crossover and mutations (Table 4). This is due to the fact that ATFuzzer without crossover cannot induce enough changes in the structure and type of the arguments of parent grammars, as a result of which it reduces the search space. 5.4.2 Efficacy of grammar-aware mutation (RQ4). Since ATFuzzer without mutation cannot induce changes in the arguments and the respective conditions, it uncovered only 2 problematic grammars. ATFuzzer without crossover, however, performs slightly better than that of the ATFuzzer without crossover. This also justifies our intuition that mutation strategies play a vital role in any fuzzer as compared to crossover techniques. Without mutation, a fuzzer unlikely generates interesting inputs for the system under test. 5.4.3 Efficacy of timing feedback (RQ5). We observed that ATFuzzer without feedback performs better than the other two (RQ2 and RQ3) variations. ATFuzzer without feedback uncovered 5 problematic grammars and thus is less effective than ATFuzzer with feedback. AT interface being a complete black box with little to no feedback we had to resort to various creative ways including timing information to generate feedback score. However, This resorts to an upper bound for the coverage information and loosely dictates ATFuzzer. 5.4.4 Comparison with other state-of-the-art fuzzer (RQ6). We compare the effectiveness of ATFuzzer against AFL (American Fuzzy Class of Bugs Syntatctic –returns OK with composite actions Syntatctic – returns OK with an action Syntatctic – returns ERROR with an action Semantic – returns OK with an action Grammar and Command Instance action/implication Nexus5 LG G3 Nexus6 Nexus6P HTC cmd →I.Arg.D.Dnum.Darg; Arg →[a − zA − Z ] Dnum →[a − zA − Z 0 − 9 + ∗#]∗ Darg →I G ϵ E.g. ATIHD + 4642048034I; cmd →+COPN; Arg Arg →[i I ]∗ E.g. AT + COPN; III cmd →Arg1.I.Arg1 →[X H ] E.g. ATHIX read, execution/ leaks manufacturer, model revision and IMEI ✓ ✓ ✓ ✓ ✓ read/ leaks list of operators, manufacturer, model revision and IMEI read/ leaks manufacturer, model revision and IMEI ✓ ✓ ✓ ✓ cmd →Arg1.I.Arg1.Arg1 Arg1 →X E.g. ATXIX cmd →Arg1.Arg2.Arg3 Arg1 →+C I M I I +C EER Arg2 →∗ ; Arg3 →Q Z E.g. AT + CIMI ∗ Q cmd →+CLCC;Arg1 Arg1 →[a − zA − Z 0 − 9]∗ E.g. AT + CLCC; ABC123 cmd →Arg1; Arg2 Arg1 →+CO P N + CG M I +CG M M + CG M R Arg2 →[X E] E.g. AT + COPN; X cmd →Arg1.Arg2.Arg3 Arg1 →+C I M I I +C EER Arg2 →; ∗ Arg3 →ˆ[Q Z ] E.g. ATI; L cmd →Arg1; Arg2 Arg1 →+CO P N + CG M I +CG M M + CG M R Arg2 →ˆ[X E] E.g. AT + CGMM; O cmd →Arg.D.Dnum.Darg; Arg →[a − zA − Z ] Dnum →[a − zA − Z 0 − 9 + #]∗ Darg →I G ϵ E.g. ATMD + 4632048034 cmd →+CUSD=,String String →[a − zA − Z 0 − 9 + ∗#]∗ E.g. AT + CUSD =, ABC read/ leaks manufacturer, model revision and IMEI ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ read/leaks current call list ✓ ✓ read/leaks list of operators, IMEI, model and revision information of the device ✓ ✓ ✓ read/ leaks IMSI, manufacturer, model revision and IMEI ✓ ✓ ✓ read /leaks list of operators, IMEI, model and revision information of the device ✓ ✓ ✓ ✓ crash/ internet connectivity disruption ✓ ✓ ✓ ✓ ✓ cmd →+CCFC=Arg1,Arg2,Arg3, crash/ internet connectiv✓ ity disruption 145,32,Arg4,13,27 Arg1 →[1 − 5] Arg2 →[1 − 2] Arg3 →[0 − 9]∗ Arg4 →[a − zA − z0 − 9]∗ E.g. AT + CCFC = 3, 2, 732235, 145, 32, cA4{NYv, 13, 27 cmd →+COPS = 0,Arg1,Arg2,2 crash/internet connectiv✓ ity disruption Arg1 →[0 1] ∗ Arg2 →[a − zA − z] E.g. AT + COPS = 0, 1, c19v6fC, 2 Table 3: Summary of ATFuzzer’s findings over USB. Lop) [59]. Since current versions of AFL require instrumenting the test programs, we implemented a modest string fuzzer that adopts five mutation strategies (walking bit flips, walking byte flips, known integers, block deletion and block swapping) employed by AFL and incorporated our proposed timing-based feedback loop to it. We evaluate this AFL variant with 80 different seeds (consisting of valid ✓ ✓ read/ leaks IMSI, manufacturer, model revision and IMEI crash/ internet connection disruption S8plus ✓ and invalid command instances of randomly chosen 40 different AT reference grammars). Table 4 shows that the AFL variant uncovered 2 different problematic grammars whereas ATFuzzer uncovers 9 unique grammars after running for 3 days. Though we decided to compare our tool with AFL, which is the best choice we had as AFL is considered the state-of-the-art tool for fuzzing, we do not claim the comparison to Fuzzing Approach ATFuzzer ATFuzzer w/o feedback ATFuzzer w/o crossover ATFuzzer w/o mutation Modified AFL Problematic Grammars 9 5 3 2 2 Table 4: Result obtained with different fuzzing approaches on Nexus5 over a period of 3days for each approach. be ideal. Because AFL relies heavily on code average information and for our case, we replaced the coder coverage with the best available substitute, i.e., coarse-grained timing information as a loose indicator to code coverage. We acknowledge that this is a best-effort approach and the evaluation may be sub-deal. 6 RELATED WORK In this section, we mainly discuss the relevant work on the following four topics: AT commands, mutation-based fuzzing, grammar-based mutation, grammar-based generation. AT commands. Most of the previous work related to AT commands follow investigate how an adversary can misuse valid AT commands to attack various systems. The work from Tian et al. [53] can be considered the most relevant to our work, however, it is significantly different in the following three aspects: (i) Firstly, they only show the impact of AT commands over USB as they consider the functionality and scope of AT commands over Bluetooth too limited to study. We, however, demonstrate the dire consequences of AT commands over Bluetooth interface with the uncovered invalid and valid AT commands. (ii) Secondly, they only show the impact of valid AT commands whereas we demonstrate the impact of invalid AT commands exploring different attack surfaces. (iii) Finally, one of the primary objectives of our work is to test the robustness of the AT interface, which is a different and complimentary end objective than theirs. BlueBug [24] exploits a Bluetooth security loophole on few Bluetooth-enabled cell phones and issues AT commands via a covert channel. It, however, relies on the Bluetooth security loophole to attack and does not apply to all phones. In contrast, we have demonstrated a variety of attacks using valid and invalid AT commands running over Bluetooth which do not rely on any specific Bluetooth assumptions and also applicable to all the modern smartphones we had in our corpus. Injecting AT commands on android baseband was previously discussed on the XDA forum [23]. Pereira et al. [43, 45] used AT commands to flash malicious images on Samsung phones. Hay [29] discovered that AT interface can be exploited from Android bootloader and discovered new commands and attacks using the AT interface. AT commands have been used to exploit modems other than smartphones as well. Most prominently, USBswitcher [44, 49] and [43] demonstrate how these commands expose actions potentially causing security vulnerabilities in smartphones. Some other work use AT commands as a part of their tool, for instance, Mulliner et al. [42] use the AT commands as feedback while fuzzing SMS of phones. Xenakis et al. [57, 58] devise a tool using AT commands to steal sensitive information from baseband. None of them, however, actually analyzes or discovers bugs in the AT parser itself. Mutation based fuzzers. Initial mutation-based fuzzers [41] used to mutate the test inputs randomly. To make this type of fuzzers more effective, a huge amount of work has been carried out to develop sophisticated techniques to improve mutation strategies— coverage information through instrumenting the binary [28, 36, 37, 59]; resource usage information [35, 46]; control and data flow features [48]; static vulnerability prediction models [38]; data-driven seed generation [55]; high-level structural representation of seed file [47]. There are also a few mutation-based fuzzers that incorporate the idea of grammars rather than inputs. Wang et al. [56] use grammars to guide mutation whereas Aschermann et al. [26] rely on code coverage feedback. Simulated annealing power schedule with genetic fuzzing has also been incorporated in [27]. However, due to the black-box nature of our system and structural pattern of AT command inputs, none of the existing concepts suffice fuzzing AT parser. Generation-based fuzzers. Generation based fuzzers generate inputs based on a model [19, 20, 31, 54], specification or defined grammars. However, to the best of our knowledge, no fuzzer discovers a class of bugs at the grammar-level, rather generates concrete input instances. There are also some generation-based, more precisely, defined grammar-based fuzzers [16] [15] which use manually specified grammars as inputs. For instance, Mangeleme is an automated broken HTML generator and fuzzer, and Jsfunfuzz [15] uses specific knowledge about past and present vulnerabilities and uses grammar rules to produce inputs that may cause problems. Both of them are, however, random fuzzers. 7 DISCUSSION Defenses. Our findings show that current implementations of baseband processors and AT command interfaces fail to correctly parse and filter out some of the possible anomalous inputs. In this paper, we do not explicitly explore defenses for preventing malicious users from exploiting these flaws. However, we show that restricting the AT interface through access control policies, black-listing should not work due to the parsing bugs and invalid AT commands, that the parser executes. Completely removing the exposure of AT modem interface over Bluetooth and USB can resolve the problem. Other than that at a conceptual level, having a formal grammar specification of the supported AT command grammar may provide a better way to test implementations of the AT interface. Another aspect that requires particular attention is the deployment of stricter policies that filter out anomalous AT commands. Responsible disclosure. Given the sensitive nature of our findings, we have reported these to the relevant stakeholders (e.g., respective modems and devices vendors and manufacturers). Moreover, following the responsible disclosure policy we have waited 90 days before making our findings public. Currently to our knowledge Samsung has been working to issue a patch to mitigate the vulnerabilities. 8 CONCLUSION AND FUTURE WORK The paper proposes ATFuzzer for testing the correctness of the AT interface exposed by the baseband processor in a smartphone. Towards this goal, ATFuzzer leverages a grammar-guided evolutionary fuzzing-based approach. Unlike generational fuzzers which use the input grammar to generate syntactically correct inputs, ATFuzzer mutates the production rules in the grammar itself. Such an approach enables ATFuzzer to not only efficiently navigate the input search space but also allows it to exercise a diverse set of input AT commands. Future work. In the future, we want to apply hybrid fuzzing in our problem domain. In the hybrid fuzzing paradigm, a black-box fuzzer’s capabilities are enhanced through the use of lightweight static analysis (e.g., dynamic symbolic execution, taint analysis). Such an approach would, however, require us to address the issues concerning firmware binaries’ practice of employing obfuscation and encryption. ACKNOWLEDGEMENT We thank the anonymous reviewers for their suggestions. This work is supported by NSF grants CNS-1657124 and CNS-1719369, Intel, and a grant by Purdue Research Foundation. REFERENCES [1] [n.d.]. AT Commands For CDMA Wireless Modems. http://www.canarysystems. com/nsupport/CDMA_AT_Commands.pdf. [2] [n.d.]. CVE-2016-4030. https://nvd.nist.gov/vuln/detail/CVE-2016-4030. [3] [n.d.]. CVE-2016-4031. https://nvd.nist.gov/vuln/detail/CVE-2016-4031. [4] [n.d.]. CVE-2016-4032. https://nvd.nist.gov/vuln/detail/CVE-2016-4032. [5] [n.d.]. CVE-2019-16400. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-16400. [6] [n.d.]. CVE-2019-16401. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-16401. [7] [n.d.]. CWE-325: Missing Required Cryptographic Step - CVE-2018-5383. In Cernegie Mellon University ,CERT Coordination Center. https://www.kb.cert.org/ vuls/id/304725/. [8] [n.d.]. Digital cellular telecommunications system (Phase 2+); AT Command set for GSM Mobile Equipment (ME) (3GPP TS 07.07 version 7.8.0 Release 1998). https://www.etsi.org/deliver/etsi_ts/100900_100999/100916/07.08.00_ 60/ts_100916v070800p.pdf. [9] [n.d.]. Digital cellular telecommunications system (Phase 2+) (GSM); Universal Mobile Telecommunications System (UMTS); LTE; AT command set for User Equipment (UE) (3GPP TS 27.007 version 13.6.0 Release 13). https://www.etsi. org/deliver/etsi_ts/127000_127099/127007/13.06.00_60/ts_127007v130600p.pdf. [10] [n.d.]. Digital cellular telecommunications system (Phase 2+); Specification of the Subscriber Identity Module -Mobile Equipment (SIM-ME) interface (3GPP TS 51.011 version 4.15.0 Release 4). https://www.etsi.org/deliver/etsi_TS/151000_ 151099/151011/04.15.00_60/ts_151011v041500p.pdf. [11] [n.d.]. Digital cellular telecommunications system (Phase 2+), Universal Mobile Telecommunications System UMTS, LTE, AT command set for User Equipment UE. https://www.etsi.org/deliver/etsi_ts/127000_127099/127007/10.03.00_60/ts_ 127007v100300p.pdf. [12] [n.d.]. Digital cellular telecommunications system (Phase 2+); Use of Data Terminal Equipment - Data Circuit terminating; Equipment (DTE - DCE) interface for Short Message Service (SMS) and Cell Broadcast Service (CBS) (GSM 07.05 version 5.3.0). https://www.etsi.org/deliver/etsi_gts/07/0705/05.03.00_60/gsmts_ 0705v050300p.pdf. [13] [n.d.]. EVDO and CDMA AT Commands Reference Guide. https://www. multitech.com/documents/publications/manuals/s000546.pdf. [14] [n.d.]. HUAWEI MU609 HSPA LGA Module Application Guide. https://www.paoli.cz/out/media/HUAWEI_MU609_HSPA_LGA_Module_ Application_Guide_V100R002_02(1).pdf. [15] [n.d.]. jsfunfuzz [online]. https://github.com/MozillaSecurity/funfuzz/tree/master/src/funfuzz/js/ jsfunfuzz. [16] [n.d.]. Mangleme [Online]. https://github.com/WebKit/webkit/tree/master/Tools/mangleme. [17] [n.d.]. Motorola AT Command Set. https://ipfs.io/ipfs/ QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Motorola_ phone_AT_commands.html. [18] [n.d.]. Neo 1973 and Neo FreeRunner GSM modem, AT Command set. http: //wiki.openmoko.org/wiki/Neo_1973_and_Neo_FreeRunner_gsm_modem. [19] [n.d.]. Peach Fuzzer Platform [online]. https://www.peach.tech/. [20] [n.d.]. Radamsa [online]. https://gitlab.com/akihe/radamsa. [21] [n.d.]. Sony Erricsson AT Command set. https://www.activexperts.com/smscomponent/at/sonyericsson/. [22] [n.d.]. Wikipedia. https://en.wikipedia.org/wiki/Hayes_command_set. [23] [n.d.]. XDA Forum [online]. https://forum.xda-developers.com/galaxy-s2/help/how-to-talk-to-modemcommands-t1471241. [24] M.Herfurt A. Laurie, M. Holtmann. [n.d.]. The bluebug. AL Digital Ltd. https: //trifinite.org/trifinite_stuff_bluebug.html#introduction. [25] Iosif Androulidakis. 2011. Intercepting mobile phone calls and short messages using a gsm tester. In International Conference on Computer Networks. Springer, 281–288. [26] Cornelius Aschermann, Tommaso Frassetto, Thorsten Holz, Patrick Jauernig, Ahmad-Reza Sadeghi, and Daniel Teuchert. 2019. NAUTILUS: Fishing for Deep Bugs with Grammars. In Proceedings of the Network and Distributed System Security Symposium (NDSS). [27] Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. 2017. Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2329–2344. [28] S. Gan, C. Zhang, X. Qin, X. Tu, K. Li, Z. Pei, and Z. Chen. [n.d.]. CollAFL: Path Sensitive Fuzzing. In 2018 IEEE Symposium on Security and Privacy (SP), Vol. 00. 660–677. https://doi.org/10.1109/SP.2018.00040 [29] Roee Hay. 2017. fastboot oem vuln: android bootloader vulnerabilities in vendor customizations. In 11th {USENIX } Workshop on Offensive Technologies ( {WOOT } 17). [30] Roee Hay and Michael Goberman. 2017. Attacking Nexus 6 & 6P Custom Bootmodes. (2017). https://www.docdroid.net/dxKUj5c/attacking-nexus-66p-custom-bootmodes.pdf. [31] Christian Holler, Kim Herzig, and Andreas Zeller. [n.d.]. Fuzzing with Code Fragments. [32] Syed Rafiul Hussain, Omar Chowdhury, Shagufta Mehnaz, and Elisa Bertino. 2018. LTEInspector: A Systematic Approach for Adversarial Testing of 4G LTE. In 25th Annual Network and Distributed System Security Symposium, NDSS, San Diego, CA, USA, February 18-21. [33] Syed Rafiul Hussain, Mitziu Echeverria, Omar Chowdhury, Ninghui Li, and Elisa Bertino. 2019. Privacy Attacks to the 4G and 5G Cellular Paging Protocols Using Side Channel Information. (2019). [34] George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating Fuzz Testing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18). ACM, New York, NY, USA, 2123–2138. https://doi.org/10.1145/3243734.3243804 [35] Caroline Lemieux, Rohan Padhye, Koushik Sen, and Dawn Song. 2018. PerfFuzz: Automatically Generating Pathological Inputs. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018). ACM, New York, NY, USA, 254–265. https://doi.org/10.1145/3213846.3213874 [36] Caroline Lemieux and Koushik Sen. 2018. FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 2018). ACM, New York, NY, USA, 475–485. https://doi.org/10.1145/3238147.3238176 [37] Yuekang Li, Bihuan Chen, Mahinthan Chandramohan, Shang-Wei Lin, Yang Liu, and Alwen Tiu. 2017. Steelix: Program-state Based Binary Fuzzing. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM, New York, NY, USA, 627–637. https://doi.org/10.1145/3106237. 3106295 [38] Yuwei Li, Shouling Ji, Chenyang Lv, Yuan Chen, Jianhai Chen, Qinchen Gu, and Chunming Wu. 2019. V-Fuzz: Vulnerability-Oriented Evolutionary Fuzzing. CoRR abs/1901.01142 (2019). arXiv:1901.01142 http://arxiv.org/abs/1901.01142 [39] Angela Lonzetta, Peter Cope, Joseph Campbell, Bassam Mohd, and Thaier Hayajneh. 2018. Security vulnerabilities in Bluetooth technology as used in IoT. Journal of Sensor and Actuator Networks 7, 3 (2018), 28. [40] Ulrike Meyer and Susanne Wetzel. 2004. A man-in-the-middle attack on UMTS. In Proceedings of the 3rd ACM workshop on Wireless security. ACM, 90–97. [41] Barton P. Miller, Louis Fredriksen, and Bryan So. 1990. An Empirical Study of the Reliability of UNIX Utilities. Commun. ACM 33, 12 (Dec. 1990), 32–44. https://doi.org/10.1145/96267.96279 [42] Collin Mulliner and Charlie Miller. 2009. Fuzzing the Phone in your Phone (Black Hat USA 2009). [43] André Pereira, Manuel Correia, and Pedro Brandão. 2014. Charge your device with the latest malware.. In BlackHat Europe. [44] André Pereira, Manuel Correia, and Pedro Brandão. 2014. USB Connection Vulnerabilities on Android Smartphones: Default and Vendors’ Customizations. In Communications and Multimedia Security, Bart De Decker and André Zúquete (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 19–32. [45] André Pereira, Manuel Correia, and Pedro Brandão. 2014. USB connection vulnerabilities on android smartphones: Default and vendorsâĂŹ customizations. [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] In IFIP International Conference on Communications and Multimedia Security. Springer, 19–32. Theofilos Petsios, Jason Zhao, Angelos D. Keromytis, and Suman Jana. 2017. SlowFuzz: Automated Domain-Independent Detection of Algorithmic Complexity Vulnerabilities. CoRR abs/1708.08437 (2017). arXiv:1708.08437 http://arxiv.org/ abs/1708.08437 Van-Thuan Pham, Marcel Böhme, Andrew E. Santosa, Alexandru Razvan Caciulescu, and Abhik Roychoudhury. 2018. Smart Greybox Fuzzing. CoRR abs/1811.09447 (2018). arXiv:1811.09447 http://arxiv.org/abs/1811.09447 Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. 2017. Vuzzer: Application-aware evolutionary fuzzing. In Proceedings of the Network and Distributed System Security Symposium (NDSS). P. Roberto and F. Aristide. 2014. Modem interface exposed via USB.. In BlackHat Europe. https://github.com/ud2/advisories/tree/master/android/samsung/nocve2016-0004. David Rupprecht, Katharina Kohls, Thorsten Holz, and Christina Pöpper. [n.d.]. Breaking LTE on layer two. Mike Ryan. 2013. Bluetooth: With low energy comes low security. In Presented as part of the 7th {USENIX } Workshop on Offensive Technologies. Wireless Solutions Telit. [n.d.]. AT Commands Reference Guide. https://www.telit.com/wp-content/uploads/2017/09/Telit_AT_Commands_ Reference_Guide_r24_B.pdf. Dave (Jing) Tian, Grant Hernandez, Joseph I. Choi, Vanessa Frost, Christie Raules, Patrick Traynor, Hayawardh Vijayakumar, Lee Harrison, Amir Rahmati, Michael Grace, and Kevin R. B. Butler. 2018. ATtention Spanned: Comprehensive Vulnerability Analysis of AT Commands Within the Android Ecosystem. In 27th USENIX Security Symposium (USENIX Security 18). Baltimore, MD, 273–290. https://www.usenix.org/conference/usenixsecurity18/presentation/tian. Spandan Veggalam, Sanjay Rawat, Istvan Haller, and Herbert Bos. 2016. IFuzzer: An Evolutionary Interpreter Fuzzer Using Genetic Programming. In Computer Security – ESORICS 2016, Ioannis Askoxylakis, Sotiris Ioannidis, Sokratis Katsikas, and Catherine Meadows (Eds.). Springer International Publishing, Cham, 581– 601. J. Wang, B. Chen, L. Wei, and Y. Liu. 2017. Skyfire: Data-Driven Seed Generation for Fuzzing. In 2017 IEEE Symposium on Security and Privacy (SP). 579–594. https: //doi.org/10.1109/SP.2017.23 Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2018. Superion: GrammarAware Greybox Fuzzing. CoRR abs/1812.01197 (2018). arXiv:1812.01197 http: //arxiv.org/abs/1812.01197 Christos Xenakis and Christoforos Ntantogian. 2015. Attacking the baseband modem of mobile phones to breach the users’ privacy and network security. In Cyber Conflict: Architectures in Cyberspace (CyCon), 2015 7th International Conference on. IEEE, 231–244. Christos Xenakis, Christoforos Ntantogian, and Orestis Panos. 2016. (U) SimMonitor: A mobile application for security evaluation of cellular networks. Computers & Security 60 (2016), 62–78. M. Zalewski. [n.d.]. American fuzzy lop. [online]. http://lcamtuf.coredump.cx/afl/. A APPENDIX A.1 Target Devices Configuration. In this section, we provide additional detailed information about the required set up for the devices we tested. Some of the devices we tested expose their modem functionality by default and therefore required no additional configuration (also listed in Table 1). On the other hand, for the devices that do not expose any modem, it was necessary to root them and set a specific type of USB configuration. The USB configuration can be changed by setting sys.usb.config property. All the devices can be accessed through ADB (Android Debug Bridge) and Fastboot tools. With ADB it is possible to access the device’s file system, reboot it in different modes, such as bootloader mode, rooting it, and finally change the device’s properties directly with the command setprop . With fastboot, it is possible to operate the device in bootloader mode, install new partitions and change pre-boot settings required for rooting. For LG Nexus 5, we had to set sys.usb.config from the default “mnt,adb” to “diag,adb” through adb shell. This setting allows to access the phone in diagnostic mode and therefore to communicate with the AT command interface. For Motorola Nexus 6 and Huawei Nexus 6P, the USB configuration can be changed by first rebooting the phone in bootloader mode and then issuing the command “fastboot oem bp-tools-on” and “fastboot oem enable-bp-tools” to Nexus 6 and Nexus 6P, respectively as reported in [30]. After establishing serial communication with the device, it is possible to communicate with the smartphone through the AT interface.