Intelligent Deception Detection through Machine Based Interviewing

James O'Shea1, Keeley Crockett1, Wasiq Khan1, Philippos Kindynis2, Athos Antoniades2, Georgios Boultadakis3
1 School of Computing, Mathematics and Digital Technology, Manchester Metropolitan University, Manchester, M1 5GD, UK, K.Crockett@mmu.ac.uk
2 Stremble Ventures LTD, 59 Christaki Kranou, 4042 Germasogeia, Limassol, Cyprus
3 European Dynamics, Brussels

Conference Paper, July 2018. DOI: 10.1109/IJCNN.2018.8489392

Abstract— In this paper an automatic deception detection system, which analyses participant deception risk scores from non-verbal behaviour captured during an interview conducted by an avatar, is demonstrated. The system is built on a configuration of artificial neural networks, which are used to detect facial objects and extract non-verbal behaviour in the form of micro gestures over short periods of time. A set of empirical experiments was conducted based on a typical airport security scenario of packing a suitcase. Data was collected from 30 participants taking part in either a truthful or a deceptive scenario while being interviewed by a machine based border guard avatar. Promising results were achieved using raw, unprocessed data on un-optimized classifier neural networks. These indicate that a machine based interviewing technique can elicit non-verbal interviewee behavior which allows an automatic system to detect deception.

Keywords— neural networks, avatar, deception detection

I. INTRODUCTION

Border control officers' tasks rely on bilateral human interaction, such as interviewing an individual traveller using verbal and non-verbal communication both to provoke responses and to interpret them. Automated pre-arrival screening could greatly reduce the amount of time a traveller spends at the border crossing point and may improve security control. Such a system would complement existing border control technology, such as Advanced Passenger Information systems, and future systems such as the new Entry/Exit System centralized border management system, which will facilitate the automation of the border control process (due for implementation in 2020) [1]. This paper presents initial work on an Automated Deception Detection System, known as ADDS, which is powered by a conversational agent avatar and is capable of quantifying the degree of deception on the part of the interviewee.
ADDS forms part of iBorderCtrl (Intelligent Portable Control System) [2], whose aim is to enable faster and more thorough border control for third country nationals crossing the land borders of EU Member States (MS) [2,3]. The final version of ADDS will utilize an advanced border control agent avatar which conducts an interview with a traveller. The avatar's attitude will be personalized to the traveler, including the use of subtle non-verbal communication cues to stimulate richer responses. A strong focus will be on identifying the impact of the non-verbal communication expressed by the avatar on the performance of ADDS.

Nonverbal behaviour is used by humans to communicate messages, which are transmitted through visual and auditory features such as facial expressions, gaze, posture, gestures, touch and non-linguistic vocal sounds [4]. A human being continually transmits nonverbal behavior, which, in contrast to spoken language, can be produced subconsciously. The majority of work on the use of non-verbal behaviour (NVB) to determine a specific cognitive state has been undertaken by human observers, who are prone to fatigue and produce differing subjective opinions. Hence, an automated solution is preferable. Related, but limited, work has been done on the automated extraction of NVB in a learning system [5] to detect comprehension levels, and on the detection of guilt and deception [6, 7]. Both of these examples used artificial neural networks to first detect micro gesture patterns and then perform classification successfully.

Time is also a factor, as interviewers need to interact longer with travelers to reach a conclusion about their deceptive intent. Such time comes at a premium in border control, resulting in short interviews and potentially false positive results in the field. An automated system which uses a few minutes of the traveler's time at the pre-crossing stage, without increasing the amount of time they spend with a border control agent, could thus potentially increase efficacy while reducing cost.

In this work, deception detection in ADDS is performed by an implementation of the patented Silent Talker artificial-intelligence-based deception detector [6, 7]. The aim of the research presented in this paper was firstly to produce a prototype trained artificial neural network (ANN) classifier to be used within the automatic deception detection system, and secondly to investigate whether an avatar-based, machine interviewing technique could be developed for a border security application which requires large volumes of interviews. Thus, the research question addressed in this paper can be stated as: can a machine based interviewing technique elicit non-verbal behavior which allows an automatic system to detect deception?

This paper is organized as follows. Section II describes prior work in the field of deception detection systems, with emphasis on automation; the use of conversational agents as avatar interviewers is also examined in the border control context. Section III describes the ADDS system. Section IV presents the overall methodology of the data collection process and describes a series of border control scenarios used to simulate truthful and deceptive behaviour of participants. The results and findings of a series of experiments are highlighted in Section V. Section VI presents the conclusions and future directions.
II. PRIOR WORK

A) Deception Detection Systems

Human interest in detecting deception has a long history. The earliest records date back to the Hindu Dharmasastra of Gautama (900-600 BC) and the Greek philosopher Diogenes (412-323 BC), according to Trovillo (1939). Today, the best-known method is the Polygraph [8], invented by John Augustus Larson in 1921 to detect lies by measuring physiological changes related to stress. The Polygraph is a recording instrument which displays physiological changes such as pulse rate, blood pressure and respiration, in a form where they can be interpreted by a trained examiner as indicating truthful or deceptive behaviour. A polygraph test takes a minimum of 1.5 hours but can take up to four hours depending on the issue being tested for [8]. Individual scientific studies can be found which support [9] or deny [10] the validity of the Polygraph. A meta-study [11] conducted in 1985 found that 10 studies from a pool of 250 were sufficiently rigorous to be included. From these, it concluded that under very narrow conditions the Control Question Test (CQT - the standard Polygraph test that could be used at border crossings) could perform significantly better than chance, but the results would still contain substantial numbers of false positive, false negative and inconclusive classifications. It also stated that many of the conditions needed to achieve this might be beyond the control of the examiner. Constructing a good set of control questions for this test requires substantial information about the interviewee's background, occupation, work record and criminal record to be collected before the exam. The polygraph also requires physiological sensors on the traveler, making both the set-up time and the cost of an interview prohibitively expensive to apply to all travelers; thus, where it is used at all, it is typically at a secondary stage for high-risk travelers.

Functional Magnetic Resonance Imaging (fMRI) is a technique that measures changes in the activity of areas of the brain indirectly, by measuring blood flow (which changes to supply more oxygen to active areas of the brain). It has been proposed that there are reliable relationships between patterns of brain activation and deception that can be measured by fMRI. It has also been reported that although fMRI is seen as overcoming some weaknesses of the Polygraph, for example by having an explanatory model based on cognitive load [12], it is highly vulnerable to countermeasures (in common with EEG-based approaches).

Voice Stress Analysis (VSA) is a technique that analyses the physical properties of a speech signal as opposed to its semantic content. The technique is fundamentally based on the idea that a deceiver is under stress when telling a lie and that the pitch of the voice is affected by stress. More specifically, it claims that micro tremors, small frequency modulations in the human voice, are produced by the autonomic (involuntary) nervous system when an interviewee is lying. There have also been claims that the increased cognitive load of deception creates micro tremors [13]. The weight of scientific analysis is that, whatever the assumed underlying model, VSA performs no better than chance, and it has been described as "charlatanry" [14]. The most recent work in this area is contained in the INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language.
Inspection of a sample of responses to the 2016 challenge shows them to be paralinguistic, phonemic, or a combination of the two, e.g. low-level descriptors such as psychoacoustic spectral sharpness, or phonetic features such as phonemes [15]. These techniques achieved approximately 67% unweighted average recall, a measure intended to take account of the fact that the Deceptive Speech Database (DSD) dataset from the University of Arizona was unbalanced (the test set contained 24% deceptive / 76% truthful classes). We have not found evidence of a significant degree of paralinguistic research outside English.

Facial microexpressions are short-lived, unexpected expressions. There is said to be a small "universal" set of expressions of extreme emotion: disgust, anger, fear, sadness, happiness, surprise and contempt, meaning they are common across cultures. A formalized method of encoding microexpressions was defined by Paul Ekman, who developed commercial tools for training interviewers to recognize them [16]. One of the resources is a manual on the Facial Action Coding System (FACS) for training in expression recognition. This has generated a large body of research in automating FACS for applications such as lie detection. Virtually all of the findings from microexpression studies are closer to a Concealed Knowledge Test (CKT) than genuine lie detection, so they do not constitute persuasive evidence for using the technique at border crossings.

B) Automated Deception Detection

Silent Talker (ST) was designed to answer the criticism from the psychology community that there are no meaningful single non-verbal indicators of deception (such as averted gaze), by combining information from many (typically 40) fine-grained nonverbal channels simultaneously and learning (through Artificial Neural Networks) to generalize about deceptive NVB from examples [6, 7]. In this respect, it does not depend on an underlying explanatory model in the same way as other lie detectors. However, it does have a conceptual model of NVB. This model assumes that certain mental states associated with deceptive behaviour will drive an interviewee's NVB when deceiving. These include stress or anxiety (factors in psychological arousal), cognitive load, behavioral control and duping delight. Stress and anxiety are highly related, if not identical, states. The key feature of ST, as a machine learning system, is that it takes a set of candidate features as input and determines for itself which interactions between them, over time, indicate lying. Thus it is not susceptible to errors caused by whether particular psychologists are correct about particular NVB gestures. The evidence to date is that no individual feature can be identified as a good indicator; only ensembles of features over a time interval provide effective classification. Early experiments with ST showed classification rates of between 74% and 87% (p<0.001), depending on the experimental condition [6].

There are no single, simple indicators of deception; ST uses complex interactions between multiple channels of microgestures over time to determine whether behaviour is truthful or deceptive. A microgesture is a very fine-grained non-verbal gesture, such as the movement of one eye from fully open to half open. This gesture could be combined with the same eye moving from half open to closed, indicating a wink or blink. Over a time interval, e.g. 3 seconds, complex combinations of microgestures can be mined from the interviewee's behaviour. Microgestures are significantly different from microexpressions (proposed in other systems), because they are much more fine-grained and require no functional psychological model of why the behaviour has taken place [6].
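To illustrate the idea of mining combinations of fine-grained states, consider a toy Python example (invented data and channel names, not the ADDS code): two binary pattern-detector channels for one eye, sampled per frame, can be combined across consecutive frames to detect a composite blink-like microgesture.

import numpy as np

# Hypothetical per-frame pattern-detector outputs for one eye (1 = state active).
half_open = np.array([0, 1, 0, 0, 1, 0, 0])
closed    = np.array([0, 0, 1, 0, 0, 1, 0])

# A blink-like microgesture: the eye is half open in one frame and
# closed in the next. Count such transitions over the window.
blink_like = (half_open[:-1] == 1) & (closed[1:] == 1)
print(int(blink_like.sum()))  # -> 2 transitions in this window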
C) Conversational Agents in the Border Control Context

A Conversational Agent (CA) is an AI system that engages a human user in conversation to achieve some practical goal, usually a task perceived as challenging by the user. Embodied CAs offer the opportunity of more sophisticated communication through gesture, supplementing the dialogue with non-verbal communication [17]. The persona of an embodied CA is referred to as an Avatar, and there is (limited) evidence supporting the use of an Avatar interviewer for automated border crossing control. Nunamaker [18] reported a group of experiments, culminating in an attempt to smuggle a concealed bomb past an avatar interviewer. These, collectively, suggest that an avatar can simulate affective signals during dialogue, can have a definable persona (gender, appearance) and can elicit cues to deception. In practice, such systems tend to rely on vocal features [18] or electrodermal activity, and measure arousal as a proxy for deception. Hooi & Cho [19] reported that perceived similarity of appearance between the avatar and interviewee reduces deceptive behaviour. Furthermore, Ströfer et al. [20] observed that when interviewees believe that the avatar they are interacting with is controlled by a human, they produce more of the physiological (electrodermal) responses believed to indicate deception. In a cognitive neuroscience review, de Borst and de Gelder [21] reported that human-like avatars that move realistically are more likeable and perceived as similar to real humans. This prior work suggests a strong potential for the use of avatars in border control interviews, and the need for substantial research into the influential factors. The state of the art of this combination of technologies suggests that avatars will be suitable for detecting deception in border crossing interviews. Firstly, they are effective extractors of information from humans [22] and can therefore be applied to deception detection tasks. Secondly, they can provide dynamic responses to user inputs and can simulate affective signals [23].

III. AUTOMATIC DECEPTION DETECTION SYSTEM

Figure 1 presents an abstract view of the ADDS architecture as seen from within the final iBorderCtrl system. Each traveler who engages with the pre-traveller registration function will be required (subject to providing informed consent) to undertake an interview with an avatar. In the final system, the avatar will adapt its attitude based upon the level of deception detected by ADDS on a question-by-question basis. For the purpose of training the neural network classifiers in this paper, a still image was used for the avatar.

Fig.1. Automated Deception Detection System Architecture

Also, for the purpose of the research conducted in this paper, the traveler information for the simulated border crossing interview was captured through a series of scenarios (for deceptive participants) and a post-experiment questionnaire. This information was then used to populate a local database. In practice, the ADDS API will receive encrypted information about a specific traveller from the iBorderCtrl control system database and populate an instance of a trip/traveller in the ADDS back-end database server.
Classification was performed by the Silent Talker component of ADDS using an empirically determined risk level. The Silent Talker component outputs a score and associated classification for each question, a score and classification for the whole interview, and the confirmation radio-button responses; these update the ADDS back-end database server. In the final system, the ADDS Control module will use the risk scores to change the avatar's attitude when the next question is asked to the traveller. In this work, the risk scores and classifications were simply stored in the ADDS local database for training, testing and validating the neural network classifiers.

A) Silent Talker

This work is specifically focused on the application-specific development of the Silent Talker (ST) system (Fig. 2). ADDS utilizes 38 input channels to the deception network. They fall into 4 categories: eye data, face data, face angle data and "other".

Fig.2. Silent Talker component in ADDS

ST uses features extracted from the non-verbal behaviour (NVB) of interviewees to determine whether they are lying or telling the truth. In this study, the application-specific ST first receives a video stream (from the mobile app or web client) for classification. The video arrives as a sequence of frames; each frame is processed in sequence, and information from the frames is compiled for the purpose of classification. The deception classifiers used in this paper are multi-layer perceptrons producing a continuous output in the range -1 to +1. Empirically determined thresholding of this output was used for the truthful and deceptive classifications. The consequence of this is that some frame sequences will be labelled as "unclassifiable": if a single decision boundary were used, these would have outputs too close to the decision boundary to justify confidence in them. A simplified description of the components in Figure 2 now follows.

Object Locators: Each object locator finds the position of a particular object (e.g. the head, the left eye, the right eye) in the current video frame. A typical object locator consists of a back-propagation Artificial Neural Network (ANN) trained with samples extracted from video collected during a training experiment.

Pattern Detectors: Pattern detectors detect particular states of the objects found by the object locators. For example, for the left eye: left eye closed is true (1) when the left eye is closed, otherwise false (0); left eye half closed is true (1) when the left eye is half closed, otherwise false (0); the left eye may be considered open if neither of these pattern detectors is true.

Channel Coder: The variations in the state of an object determined by a specific pattern detector are referred to as a "channel". Channel coding is the process of collecting these variations over a specific time period (i.e. over a specific number of video frames).

Group Channel Coders: Group channel coding is the process of amalgamating and statistically summarizing the information from the individual channel coders to form a summary vector, which can be input to the final deception classifier.

Deception Classifiers: Typically, the deception classifier is a single ANN trained to classify input vectors from the group channel coders as either truthful or deceptive. It is also possible to add other classifiers (for example, to detect feelings of guilt) and combine these to obtain higher deception detection accuracy.
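To make the flow through Figure 2 concrete, the following minimal Python sketch (illustrative only, with invented names and a toy three-channel subset; the real system uses 38 channels and trained ANN components) shows how per-frame pattern-detector states could be coded into channels, summarized over a slot, and scored by a deception classifier:

import numpy as np

FRAMES_PER_SLOT = 30  # one slot = 1 second of video at 30 fps

def pattern_states(frame):
    # Stand-in for the object-locator and pattern-detector ANNs: returns
    # binary states of located objects (channel names are illustrative).
    return np.array([frame["left_eye_closed"],
                     frame["left_eye_half_closed"],
                     frame["head_turned_left"]], dtype=float)

def code_slot(frames):
    # Channel coder + group channel coder: collect each channel's state
    # over one slot and statistically summarize it into a single vector
    # (here, the mean activation of each channel across the slot).
    states = np.stack([pattern_states(f) for f in frames])  # (30, 3)
    return states.mean(axis=0)

def slot_score(vector, w_hidden, w_out):
    # Deception classifier: an MLP with bipolar sigmoid (tanh) units
    # mapping the summary vector to a score in [-1, +1]
    # (+1 = deceptive, -1 = truthful).
    return float(np.tanh(np.tanh(vector @ w_hidden) @ w_out))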
B) Avatar

The final ADDS system will use animation to pose each question, personalized for each border guard avatar in accordance with the traveler's non-verbal state. A sample of a border guard avatar posing a question can be found at http://stremble.com/iBorderCtrl/1/1/1/1.mp4. However, as the development of ADDS as a system was happening in parallel with the training and validation of the deception detection element, a still image of the male avatar developed by Stremble [24] was used within this work (Figure 3). The avatar is presented in a uniform to convey an air of authority. In this experiment, the avatar is shown as a still image and the speech is synthesized. One reason for this was to see if any emotional states were conferred on the (actually neutral) avatar by participants.

Fig.3. Male Avatar

IV. METHODOLOGY

This section describes the methodology for a quantitative empirical study of the non-verbal behaviour of samples of volunteer participants in truthful and deceptive conditions. The hypotheses tested were:

H0: A machine based interviewing technique cannot be used to detect deception from non-verbal behaviour.
H1: A machine based interviewing technique can be used to detect deception from non-verbal behaviour at levels statistically greater than chance.

All participants, whether engaged in truthful or deceptive scenarios, were required to pack a suitcase. The standard items relevant to both the deceptive and truthful scenarios are: a pair of unisex trousers, a T-shirt, shower gel, a tube of toothpaste, a hand towel and a bottle of perfume in a box. If a participant took part in a deceptive scenario, an intervention by a confederate took place, in which the confederate modified the contents of the perfume box with a prohibited item depending on the scenario (S2-S5). The participant was also shown typical posters of prohibited items from airport baggage handling areas.

The experimental methodology comprised a pre-interview task which set up the scenario for truth-telling or deception, the interview itself, and a debriefing stage which included certain ethics aspects (confirmation of consent, permissions to use materials, etc.) and some subjective ratings of feelings during the interview (e.g. subjective guilt). Each participant was taken into the debriefing room and asked to read the participant information sheet, invited to ask any questions, and then asked to sign the first part of the participant informed consent document. In the debriefing session, participants completed the second part of the informed consent form to confirm that they still consented to their data being used in the study.

A) Overview of data collection process

Following an approved ethical process, data was collected through participants first performing a simulated deception task, with different levels of arousal created by varying the task. This was followed by a recorded interview by the avatar border guard to collect video data for analysis of non-verbal behaviour. On completion of the interview, participants completed a questionnaire containing simple subjective self-ratings of internal mental states. The experiment utilised a number of scenarios in order to test the hypothesis. The two major categories are Scenario Truthful and Scenario Deceptive.

S1: Scenario Truthful
• All participants will use their true identities as recorded in their identification documents.
• All participants will answer questions about a real relative or friend who is an EU / UK citizen (the equivalent of a sponsor in the border questions asked by EU border guards).
• All participants will pack a suitcase with harmless items typical of going on a holiday.
• Participants will answer questions about identity, sponsor and suitcase contents.
• All questions can be answered truthfully.

Scenarios Deceptive
• All participants are given fake identities (male/female) and a short life history.
• All participants are given a short description of a fake relative from the EU.
• Participants are stratified into 1 of 4 blocks. These blocks are intended to have different degrees of emotional intensity / deceptive stakes (in decreasing order of intensity):
o S2: Simulated biohazard (infectious disease in a test tube), with an informational video about weaponization
o S3: Simulated biohazard (infectious disease in a test tube), without the informational video
o S4: Simulated drug package (soap powder in a clear packet)
o S5: Simulated forbidden agriculture/food product, i.e. seeds
• Participants will answer questions about identity, sponsor and suitcase contents.

B) Questions for scenario

Table I shows the questions that all participants answered during the experiment. Some of the questions come from the set actually asked by border guards at the border crossing point. However, many such questions are not practical to ask in the experimental scenario. Therefore, a methodology was devised to substitute a minimum-sized set of proxy questions which cover the same psychological / cognitive properties. This set was derived by analyzing questions provided by experts from the Hungarian National Police, the Polish Border Guards, the State Border Guard of the Republic of Latvia and TRAINOSE (Greece).

Table I: Experiment Interview Questions
1. What is your family name?
2. What is in your case?
3. Have you seen any posters of prohibited items?
4. Are there any items from the lists of prohibited items in your case?
5. How many items are in the case?
6. If you open the case and show me what is inside, will it confirm that your answers were true?
7. What is your first name?
8. When were you born?
9. Where were you born?
10. What is your current citizenship?
11. Please tell me the name of a friend or family member who can confirm your identity?
12. What is the relationship of this person to you?
13. Where does this person live?

C) Interview conducted using Wizard of Oz Methodology

Collection of data to train the deception detection component of ADDS used the established "Wizard of Oz" methodology. In this method (Figure 4), a human called the "Wizard" manually controls a simulated avatar to create an experiment which is experienced (as closely as possible) by the participants as if they were being interviewed by a real avatar. In this experiment, the Wizard operated a web app via Wi-Fi, which controlled the display on the participant's screen. The Wizard had access to a GUI allowing the selection of questions, which are played in a window on the participant's screen. During the experiment:
• The participant aligned their face with the camera using on-screen instructions.
• The simulated avatar maintained a neutral expression.
• The questions were delivered verbally to the participant by the static avatar through text-to-speech recordings.
• Video of the participant was captured on a question-by-question basis and stored for the purposes of training and testing.

The start time of a question was recorded as the moment the avatar started speaking. Once the participant had finished answering, the Wizard clicked to progress to the next question; the time of this click was recorded as the end of the question. The Wizard also had the option of repeating a question if necessary.
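The question-by-question cropping of the recordings can be sketched as follows (a minimal illustration assuming the video is available as a list of frames and the logged timestamps are in seconds; the function and variable names are ours, not from the ADDS code):

def crop_questions(frames, fps, question_times):
    # question_times: (start_s, end_s) pairs, where start is the moment
    # the avatar begins speaking and end is the Wizard's click.
    clips = []
    for start_s, end_s in question_times:
        clips.append(frames[int(start_s * fps):int(end_s * fps)])
    return clips

# Example: a 30 fps interview with two questions.
# clips = crop_questions(frames, 30, [(0.0, 12.4), (13.1, 25.0)])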
Fig. 4. The Wizard-of-Oz (WoZ) experiment

D) Group design and stratification

The video data of participants was recorded using the questions presented in Table I and automatically cropped/segmented into question-by-question video files. Table II shows the dataset for the truthful and deceptive participants. The data was captured using the built-in web-cam at the default video resolution of 640x480 and 30 frames per second (fps). The channel data is extracted from each question using a fixed sliding window (slot) of 1 second (i.e. 30 frames) to hold sufficient information about the channel states. Each slot is coded as a single vector encoding the states of the 38 channels. Furthermore, a vector is used only if it is extracted from a valid slot; a valid slot always contains the channel information for the face and both eyes. A detailed explanation of slot validity is given in previous work [6].

Table II: Experimental Dataset
No. of questions per interview: 14
Total participants: 32 (17 deceptive, 15 truthful)
Total number of video files: 448
Deceptive participants: Male (10), Female (7); Asian/Arabic (4), EU White (13)
Truthful participants: Male (12), Female (3); Asian/Arabic (6), EU White (9)
No. of channels analyzed: 38
Total number of truthful vectors in dataset: 43,051
Total number of deceptive vectors in dataset: 43,535

Two strategies were employed for training, testing and validation: bootstrapping and leave-one-out, which are described in Section V. These were used to develop suitable Artificial Neural Networks for the channels and stages shown in Figure 2. The final classifying ANN (i.e. the deception classifier) produces an activation, which is thresholded to determine truthfulness or deceptiveness. The threshold was not dynamic (i.e. continuously adaptive) but was optimized manually, based on the suitcase experiment data. Further manual optimization will be carried out after initial runs of the pilot. Deception risk scores obtained from the deception classifiers are bipolar in the range [-1, +1], where deceptive is +1 and truthful is -1. The deception score for a question is calculated as:

D_q = (1/n) * Σ_{s=1}^{n} d_s    (1)

where D_q is the deception risk score of the current question, d_s is the deception score of slot s, and n is the total number of valid slots for the current question. Thresholding is then applied as follows:

IF Question_risk (D_q) <= x THEN
    Indicates truthful
ELSE IF Question_risk (D_q) >= y THEN
    Indicates deception
ELSE
    Indicates unknown
END IF

where x and y are determined empirically; the initial values were x = -0.05 and y = +0.05. The final truthful/deceptive decision is made however few valid slots are available; a question is classified as unknown only when no valid slot exists (i.e. n = 0) or the D_q value lies between x and y.
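Equation (1) and the thresholding rule can be transcribed directly into Python (a sketch assuming the per-slot scores d_s for valid slots are already available; we take the initial x as -0.05, the negative counterpart of y):

def question_risk(slot_scores, x=-0.05, y=0.05):
    # slot_scores: deception scores d_s of the valid slots of one question.
    n = len(slot_scores)
    if n == 0:
        return None, "unknown"        # no valid slot exists
    d_q = sum(slot_scores) / n        # Eq. (1): D_q = (1/n) * sum of d_s
    if d_q <= x:
        return d_q, "truthful"
    if d_q >= y:
        return d_q, "deceptive"
    return d_q, "unknown"             # D_q lies between x and y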
V. RESULTS AND FINDINGS

A) Initial Results

The dataset obtained from the group channel coders was fed into the deception classifier, initially using a 10-fold cross-validation strategy to train, validate and test the networks. The percentage split of the entire input data for training, validation and testing was 70:20:10 respectively. There are 38 inputs to the network, with one hidden layer and a single output. Networks were trained with a varying number of neurons in the hidden layer (11 to 20 in these experiments) to observe the impact on performance. A bipolar sigmoid transfer function was used while training the networks, and the maximum number of epochs was set to 10,000. The aim of this initial work was to establish whether a machine based interviewing technique can be used to detect deception from non-verbal behaviour, so no tuning of the classifiers was attempted. With the exception of removing redundant duplicated vectors, the initial results presented in Table III and Table IV are derived from raw, unprocessed data.

Table III shows the percentage accuracy of the deception classifiers obtained using a varying number of neurons in the hidden layer and the aforementioned parameter settings. Network performance gradually increases with the number of neurons. The highest average training/validation/test accuracy was 96.55% for truthful and 96.78% for deceptive classification, obtained with 20 neurons in the hidden layer. The trained networks with the optimum classification accuracy were then used for testing on unseen data.

Table III: Results using 10-Fold Cross-Validation (accuracy %; T = truthful, D = deceptive)

Neurons  Train T  Train D  Val T   Val D   Test T  Test D
11       94.13    95.04    93.68   94.41   94.30   93.69
12       94.45    95.63    93.75   94.74   93.62   94.96
13       94.92    95.77    94.41   95.14   94.31   95.15
14       94.85    96.26    94.29   95.67   94.23   95.46
15       96.19    96.19    95.40   95.50   95.50   95.41
16       96.16    96.40    95.58   95.80   95.45   95.91
17       96.56    96.98    95.90   96.32   95.75   96.22
18       96.81    97.17    96.14   96.52   95.91   96.28
19       97.23    97.11    96.48   96.48   96.67   96.45
20       97.28    97.50    96.53   96.81   96.55   96.78

B) Testing classifiers

The strategy used for testing the classifiers was based on leaving one pair out (i.e. one truthful and one deceptive participant) for testing, while training and validating the networks on the remaining participants' data (30 participants). The trained networks' performance was then tested using the unseen data of the two held-out participants. To examine the effect of totally unseen participants, 9 experimental runs were conducted, each involving the random selection of a pair of test participants (one truthful, one deceptive). As Table IV shows, the average test accuracy was 73.66% for the deceptive tests and 75.55% for the truthful tests. These outcomes indicate a substantial decrease in classification accuracy compared with the outcomes presented in Table III. When using 10-fold cross-validation, the classifiers have seen some of the material (i.e. image vectors) from a test participant's interview in the training set (although the training and test sets of vectors were mutually exclusive). Consequently, the cross-validation approach builds a model containing some of the psychological properties of the people it classifies. In the second case (the leave-one-pair-out strategy), the classifier sees no material from the test participants and relies on the commonality between their behavior and the behaviors of the participants used for training. We postulate that a larger number of participants will build a more general model, which will improve classification accuracy on previously unseen cases.
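The leave-one-pair-out protocol, with a toy stand-in for the paper's classifier, can be sketched as follows (illustrative Python; the data structures, the simple gradient-descent trainer and all names are our assumptions, not the ADDS implementation):

import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, y, hidden=20, epochs=1000, lr=0.01):
    # Toy version of the paper's network: 38 inputs, one hidden layer of
    # bipolar sigmoid (tanh) units, one tanh output in [-1, +1];
    # y holds -1 for truthful vectors and +1 for deceptive ones.
    w1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    w2 = rng.normal(0.0, 0.1, (hidden, 1))
    for _ in range(epochs):
        h = np.tanh(X @ w1)
        out = np.tanh(h @ w2)
        g_out = (out - y) * (1.0 - out ** 2)   # output-layer gradient
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)  # hidden-layer gradient
        w2 -= lr * h.T @ g_out / len(X)
        w1 -= lr * X.T @ g_h / len(X)
    return w1, w2

def leave_one_pair_out(vectors, labels, truthful_ids, deceptive_ids):
    # Hold out one truthful and one deceptive participant; train on the
    # rest; return the held-out participants' slot scores.
    t_out = rng.choice(truthful_ids)
    d_out = rng.choice(deceptive_ids)
    train_ids = [p for p in vectors if p not in (t_out, d_out)]
    X = np.vstack([vectors[p] for p in train_ids])
    y = np.vstack([labels[p] for p in train_ids])
    w1, w2 = train_mlp(X, y)
    score = lambda p: np.tanh(np.tanh(vectors[p] @ w1) @ w2)
    return score(t_out), score(d_out)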
Table IV: Classification Outcomes using Unseen Participants (A/A = Asian/Arabic)

Test No  Truthful (Gender, Ethnicity)  Deceptive (Gender, Ethnicity)  Truthful Acc. (%)  Deceptive Acc. (%)
1        M, EU                         M, A/A                         100                57
2        M, A/A                        F, EU                          50                 36
3        M, A/A                        F, EU                          50                 100
4        M, EU                         F, EU                          90                 100
5        M, A/A                        M, EU                          100                10
6        M, EU                         M, EU                          72                 100
7        M, A/A                        F, EU                          100                100
8        F, EU                         F, A/A                         38                 100
9        M, EU                         M, EU                          80                 60
Overall accuracy (%)                                                  75.55              73.66

It is also noted that for these initial experiments there was an insufficient amount of training data. Given the diversity of the participants (e.g. ethnicity, age, gender), a larger dataset would help to further generalize the classification networks. Despite the fair distribution of the overall truthful and deceptive datasets (i.e. approximately 43,000 vectors each), the imbalance of the dataset in terms of ethnicity and gender might influence the performance of the deception classification network. For instance, the deceptive dataset contains 4 Asian/Arabic participants compared to 13 white EU participants; likewise, in the truthful scenario there are 12 male but only 3 female participants. In addition, the data used in this study was raw (apart from the removal of redundant duplicated vectors): it had not been preprocessed, and no tuning of the ANN deception classifiers had taken place.

VI. CONCLUSIONS AND FURTHER WORK

This paper has described the first stage in the development of an automated deception detection system (ADDS), which will be developed further for use within iBorderCtrl (Intelligent Portable Control System). An experiment was designed and conducted using a number of truthful and deceptive scenarios to test the hypothesis that a machine based interviewing technique could be used to detect deception from non-verbal behaviour during an interview conducted by a static avatar. The dataset collected for this experiment contained image vectors from 30 participants and was diverse in terms of gender and ethnicity. Raw experimental participant data was used to train artificial neural network deception classifiers using two train-test strategies. The un-optimized networks gave (as expected) high results when using a cross-validation train-test strategy, while obtaining an average classification accuracy of approximately 75% on both truthful and deceptive interviews when using the leave-one-pair-out strategy. It was noted that, given the diversity of the dataset, it might not have been large enough to train a classifier more effectively. Future work will involve capturing more data for diverse population representation, and optimization of the neural network classifiers.

ACKNOWLEDGEMENTS

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 700626. The authors would like to thank the iBorderCtrl consortium members for their feedback in developing ADDS in this project.

REFERENCES
[1] European Parliament (2016). Smart Borders: EU Entry/Exit System. Brussels: European Parliament.
[2] iBorderCtrl: Intelligent Portable Control System [online]. Available at: http://www.iborderctrl.eu/ [Accessed 12/1/2018].
[3] Crockett, K.A., O'Shea, J., Szekely, Z., Malamou, A., Boultadakis, G. and Zoltan, S. (2017). Do Europe's borders need multi-faceted biometric protection? Biometric Technology Today, 2017(7), pp. 5-8. ISSN 0969-4765.
[4] Hall, J.A. (2007). 'Nonverbal cues and communication.' In Baumeister, R.F. and Vohs, K.D. (eds.) Encyclopedia of Social Psychology. California: SAGE Publications Inc., pp. 626-627.
[5] Holmes, M., Latham, A., Crockett, K. and O'Shea, J. (2017). Near real-time comprehension classification with artificial neural networks: decoding e-Learner non-verbal behaviour. IEEE Transactions on Learning Technologies. DOI: 10.1109/TLT.2017.2754497.
[6] Rothwell, J., Bandar, Z., O'Shea, J. and McLean, D. (2006). Silent Talker: a new computer-based system for the analysis of facial cues to deception. Applied Cognitive Psychology, 20(6), pp. 757-777.
[7] Silent Talker Ltd [online]. Available at: https://www.silenttalker.com/ [Accessed 5 Jan. 2018].
[8] International League of Polygraph Examiners (2016). Polygraph/Lie Detector FAQs [online]. Available at: http://www.theilpe.com/faq_eng.html [Accessed 16/01/2018].
[9] Mangan, D.J., Armitage, T.E. and Adams, G.C. (2008). A field study on the validity of the Quadri-Track Zone Comparison Technique. Physiology & Behavior, 95(1), pp. 17-23.
[10] Honts, C.R. and Reavy, R. (2015). The comparison question polygraph test: A contrast of methods and scoring. Physiology & Behavior, 143, pp. 15-26.
[11] Saxe, L., Dougherty, D. and Cross, T. (1985). The validity of polygraph testing: Scientific analysis and public controversy. American Psychologist, 40(3), p. 355.
[12] Meijer, E.H., Verschuere, B., Gamer, M., Merckelbach, H. and Ben-Shakhar, G. (2016). Deception detection with behavioral, autonomic, and neural measures: Conceptual and methodological considerations that warrant modesty. Psychophysiology.
[13] Walczyk, J.J., Igou, F.P., Dixon, A.P. and Tcholakian, T. (2013). Advancing lie detection by inducing cognitive load on liars: a review of relevant theories and techniques guided by lessons from polygraph-based approaches. Frontiers in Psychology, 4. Available at: http://dx.doi.org/10.3389/fpsyg.2013.00014 [Accessed 16 Jan. 2018].
[14] Eriksson, A. and Lacerda, F. (2007). Charlatanry in forensic speech science: A problem to be taken seriously. International Journal of Speech, Language and the Law, 14(2), pp. 169-193.
[15] Herms, R. (2016). Prediction of deception and sincerity from speech using automatic phone recognition-based features. Interspeech 2016, pp. 2036-2040.
[16] Ekman, P. (2016). Paul Ekman International Plc [online]. Available at: http://www.ekmaninternational.com/ [Accessed 18 December 2016].
[17] Cassell, J. (2001). Embodied conversational agents: representation and intelligence in user interfaces. AI Magazine, 22(4), p. 67.
[18] Nunamaker, J.F., Derrick, D.C., Elkins, A.C., Burgoon, J.K. and Patton, M.W. (2011). Embodied conversational agent-based kiosk for automated interviewing. Journal of Management Information Systems, 28(1), pp. 17-48.
[19] Hooi, R. and Cho, H. (2013). Deception in avatar-mediated virtual environment. Computers in Human Behavior, 29(1), pp. 276-284.
[20] Ströfer, S., Ufkes, E.G., Bruijnes, M., Giebels, E. and Noordzij, M.L. (2016). Interviewing suspects with avatars: Avatars are more effective when perceived as human. Frontiers in Psychology, 7.
[21] de Borst, A.W. and de Gelder, B. (2015). Is it the real deal? Perception of virtual characters versus humans: an affective cognitive neuroscience perspective. Frontiers in Psychology, 6, p. 576.
[22] Derrick, D.C., Read, A., Nguyen, C., Callens, A. and De Vreede, G.J. (2013). Automated group facilitation for gathering wide audience end-user requirements. In: 2013 46th Hawaii International Conference on System Sciences (HICSS), pp. 195-204. IEEE.
[23] Pollina, D.A., Horvath, F., Denver, J.W., Dollins, A.B. and Brown, T.E. (2009). Development of technologies and test formats for credibility assessment. Polygraph, p. 99.
[24] Stremble Ventures LTD [online]. Available at: http://stremble.com/ [Accessed 5 Jan. 2018].