Tech Strings in Documents (aka Tech Extractor)
April 2010

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL

- Overview and History of Tech Strings in Documents
- Why is it important?
- Limitations of the capability; advancing to fingerprints
- Examples and live demo

Content-based Selection
- How do you find DNI data if you don't have a strong selector like an IP or E-mail address?
- What if you only know keywords, part names, phrases etc. expected to be used by your target?

The "Tech Extractor" is a way of finding valuable intelligence based on keywords in the content of DNI sessions, but it is a departure from traditional "soft selection", which tends to bring back a lot of junk.

What is soft selection?
- Soft selection, aka content-based selection, is an approach to targeting traffic by looking for keywords or phrases rather than specific E-mail accounts.
- Content-based selection has suffered because of the poor design of content-based selection engines.

Communication vs. DNI Content
- Selection engines in use today were based on designs built to handle TELEX traffic.
- TELEX is a highly formatted, content-rich type of traffic that does not resemble the raw DNI seen with Internet traffic.
- Raw Internet traffic contains HTML, web pages, raw base-64 encoded documents etc.
- References to DNI "content" usually mean "communication content" rather than raw DNI content.
- Current DNI selection does not allow you to restrict hits to the "type" of traffic you want, e.g. E-mails (including Webmail) or Documents.
Communication vs. DNI Content
- If an analyst tasks a Boolean equation "bomb" and "chemical", they likely want to see all communication that mentions "bomb" and "chemical", and not all web pages, news stories, blog posts etc. where those two words appear.
- What we need is a context-aware scanning engine that knows where it is inside of the raw DNI in order to properly apply analyst tasking.

Soft Selection vs. Surgical Selection
- Existing selection techniques are blunt instruments.
- XKEYSCORE contextual dictionaries provide an extremely sharp knife to make accurate selection decisions.
- "That's not a knife!"

What is the Tech Extractor?
- The Tech Extractor was a first stab at context-aware scanning and it only focuses on three contexts:
  - E-mail Bodies
  - Chat Bodies
  - Document Bodies: Microsoft Word, Excel, PowerPoint, Project, Visio; Adobe PDF; Rich Text Format (RTF)

How does the Tech Extractor work?
- The Tech Extractor works by scanning a list of keywords (or regular expressions) against those three contexts and then tagging the results.
- This is not "Filtering and selection" and we're not forwarding any data home. XKS is simply tagging sessions with meta-data, much like we do with AppIDs+Fingerprints.

How does the Tech Extractor work?
- After the meta-data tag is applied, that tag can then be used as part of a compliant query for traffic.
- It's important to note that, just like AppIDs+Fingerprints, Tech Extractor tags aren't necessarily compliant by themselves.
- You may need to add a valid foreign IP address, MAC address or country code before you query!
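As a rough illustration of the tagging step described above, the following Python sketch scans only the three body contexts and attaches a meta-data tag to a session when a tasked term is found. Everything in it (the `Session` class, the context names, the dictionary layout and the `tag_session` function) is hypothetical and chosen for illustration; it is not the XKS implementation or its data model.

```python
import re
from dataclasses import dataclass, field

# Hypothetical context names. "email_body" and "document_body" appear in the
# results table later in the deck; "chat_body" is assumed by analogy.
SCANNED_CONTEXTS = {"email_body", "chat_body", "document_body"}


@dataclass
class Session:
    """Toy stand-in for a session: a mapping of context name to text."""
    contexts: dict
    tags: list = field(default_factory=list)


def tag_session(session, dictionary_name, tech_names):
    """Tag the session in place and return the tags that were added.

    tech_names maps a category ("tech name") to a list of regex patterns.
    Only the three body contexts are scanned; everything else is ignored.
    """
    added = []
    for context, text in session.contexts.items():
        if context not in SCANNED_CONTEXTS:
            continue
        for tech_name, patterns in tech_names.items():
            for pattern in patterns:
                if re.search(pattern, text, flags=re.IGNORECASE):
                    added.append((context, dictionary_name, tech_name))
                    break  # one tag per tech name per context is enough
    session.tags.extend(added)
    return added


# Made-up data: the term appears in a web page and in an e-mail body,
# but only the e-mail body produces a tag.
session = Session(contexts={
    "web_page": "Vendor catalogue: widget-9000 now in stock",
    "email_body": "Please confirm the fault on the Widget-9000 unit",
})
tags = tag_session(session, "example_dict", {"widget": [r"widget-9000"]})
print(tags)
```

Note that the sketch only records a tag; it does not forward or filter anything, which mirrors the distinction the slides draw between tagging and "Filtering and selection".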
[Figure: screenshot of an example e-mail with Model, Symptom and Comments fields describing a phone fault]

How does the Tech Extractor work?
- Also, this is not retrospective.
- After a list is tasked, XKS will scan data collected from that point on, looking for hits. Any data previously collected and stored by XKS will not be scanned.

Where does XKS get its list of terms?
- The XKS team is provided with lists of terms, called "Tech Dictionaries", which can contain multiple category names (aka "Tech Names").
- Only after the XKS team is supplied with those terms can the system begin scanning and tagging.
- A GUI to allow entry of tech terms is almost complete.

Tech Extractor Tasking Rules
- Currently, all terms need to be classified REL FVEY.
- Terms are case insensitive by default, but can be forced to be case sensitive.
- Terms can hit as substrings by default, e.g. "ricin" will hit in "pricing".
- However, terms can be forced to hit as a unique word (either by tasking them with a space at the beginning and end or by using a regular expression).

Language Support
- Supports full foreign language tagging and querying.
- Example: look for common Arabic expressions in E-mails coming from the Pakistan tr...

[Figure: screenshot of an example webmail message]

Tech Extractor Limitations
- While terms tasked for the Tech Extractor are applied only to Document, E-mail and Chat bodies, that is still a lot of traffic!
  If the term is too generic (or short), you're still likely to run into a lot of false hits.
- Also, while you can limit your results by adding more search criteria (country code, IP address etc.), the term will still be scanned against all data looking for hits.

Tech Extractor vs. Fingerprints
- Tech Extractor treats E-mail, Chat and Document bodies as a single "context".
- The XKS Fingerprint language gives you more than 65 contexts that can be used together to form powerful and specific signatures.
- When terms are generic and are returning too many poor results through Tech Extractor, it's time to make the switch to the full fingerprint language.

Why use the Tech Extractor at all?
- One of the most powerful features of the Tech Extractor is that it shows you exactly which term hit in the meta-data results:

Type            Tech Dictionary   Tech Name   Tech Value   Tech Filename
document_body   classic gsm       HLR                      ...Documents and Settings...Daily Break...
document_body   classic gsm       ICCID                    ...Documents an...Status Te...
email_body      classic gsm       IMEI
document_body   classic gsm       IMSI
email_body      classic gsm       MSISDN

Why not Fingerprints?
- With fingerprints, you only see that the full equation (which can be very complex) was satisfied, and you won't see which specific terms from the equation hit.
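The difference between the two kinds of result can be sketched in a few lines of Python. The example below applies the tasking rules described earlier (case-insensitive substring match by default, optional whole-word and case-sensitive matching) and returns the individual terms that hit, then contrasts this with a Boolean equation that only reports whether it was satisfied. The `Term` class and both functions are hypothetical names invented for this sketch, and whole-word matching is done here with a regex word boundary, which is only one of the two options the rules mention.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical tasked term with the two options the rules mention."""
    text: str
    whole_word: bool = False      # default: substring hits are allowed
    case_sensitive: bool = False  # default: case-insensitive


def term_hits(term, body):
    """Return True if the term hits in the body under its tasking options."""
    pattern = re.escape(term.text)
    if term.whole_word:
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if term.case_sensitive else re.IGNORECASE
    return re.search(pattern, body, flags) is not None


def extractor_style(terms, body):
    """Per-term result: the list of terms that actually hit."""
    return [t.text for t in terms if term_hits(t, body)]


def fingerprint_style(terms, body):
    """Equation-style result: only whether ALL terms hit (a simple AND)."""
    return all(term_hits(t, body) for t in terms)


body = "Updated pricing for the chemical order is attached."

loose = [Term("ricin"), Term("chemical")]
strict = [Term("ricin", whole_word=True), Term("chemical")]

print(extractor_style(loose, body))    # ['ricin', 'chemical']
print(extractor_style(strict, body))   # ['chemical']
print(fingerprint_style(loose, body))  # True
```

With the default substring rule, "ricin" produces a false hit inside "pricing"; the per-term output makes that visible, whereas the equation-style result only says that the whole expression was satisfied.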
Live Demo

More information:
- On GCHQ wiki:

To submit tasking
- Please use the Excel Spreadsheet template developed by GCHQ CP
- And then E-mail
- In the near future, it will be possible to enter the terms directly through a web-based GUI