TOP SECRET STRAP1
Dickie reported: the next phase will see User Confidence Testing (Data Quality
Testing) performed as part of the next Stepping Stone. This is due to start on 9
October. The Content PUT has now been formed and will carry out the User
Confidence Testing.
Ann Bell suggested at Dickie's LISTEN 08 briefing that linguists record occurrences
of Pick Your Own Number (PYON) in B3M using the 'techref' method in the
proc_cmnt_text field. For example, if two speakers both have UK-code telephone
numbers but one reveals that he is based in another country, this may be a case of
PYON, and you could enter 'techref PYON' in B3M. Dickie and Amy Stidston agreed to
the suggestion.
4. B14 / MCS update (courtesy of Jerry Newsome)

a) Voice

Diariser: over the summer we successfully deployed version 5.0 of the Diariser.
The main difference between this and version 4.9 is the complete rewrite of the
modem arbiter, which now identifies the various V-series modems more effectively
to assist OPC~CAP in demodulating them.
Speaker ID: since the last VFUG in July we have deployed:
- Two systems for OPI~MENA: one against an Iraqi political target and one against a
Saudi leadership target.
- One system for OPI~RCIT targeting a Gazprom official.
- A 5-way multi-speaker system for OPI~LANG/OPI~SC against Afghan
counter-narcotics targets.
We have now cleared our order book for Speaker ID systems and so look forward to a
well-deserved, quiet, peaceful and reflective autumn. Seriously though, if you believe
we can help you identify your target of interest amongst the deluge of traffic
that you have to wade through, feel free to approach us. We will happily discuss
your requirements and hopefully offer a swift and accurate solution.
Language ID:
No new deployments. I have, however, been concentrating on improving currently
deployed models using non-sigint language data accrued from our Speech to Text
(STT) corpora. Experimentation has shown that it is possible to improve accuracy to
a certain extent by "brute-forcing" the language id algorithm with vast volumes of
data, although this is by no means a substitute for actual "linguist-truthed" sigint
language data accrued from the target set that the system will eventually run against.
It is also quite problematic to service a requirement where we are seeking to identify
one target language from a number of others, some of which are unidentified.
Inevitably, this will lead to false positive identifications of the target language, as the
algorithm will assume that because a language is not included in the background
This information is exempt from disclosure under the Freedom of Information Act 2000 and may be subject to exemption under
other UK information legislation. Refer disclosure requests to GCHQ on 01242 221491 x30306 (non-sec) or email infoleg@gchq
