IBM SVS 4.0 Research and Development Status Update 6 for NYPD
Oct 16, 2012
IBM Confidential © 2012 IBM Corporation. All Rights Reserved.

IBM SVS 4.0 Analytics Evaluation for NYPD/LMSI
 NYPD/LMSI/MTA Camera and Profile Mapping
 Evaluation Process
 Ground Truth Application and Framework
 Accuracy Evaluation

Topic | Meeting Date
Plan: Abandoned Object Detection | 8/23
Plan: Near-Field People Search | 8/28
Plan: Forensic Object Search (Detection/Tracking/Color/Real-World Metrics/Speed/Size/Object Classes/Duration/Histogram) | 9/18
Review the presentation modification for the Forensic Object Search plan | 10/11
Start reporting results to NYPD (after internal QA) | Mid Nov (tentative)

NYPD Camera Taxonomy
(Taxonomy chart; legible figures: 2,633 LMSI cameras and 265 indoor turnstile cameras.)

Analytics Accuracy Evaluation Process
(Process diagram: the Ground Truth Application annotates video files/cameras to produce ground truth; SVS Analytics with the Meta Data Writer produces analytics results; the evaluation compares the two and generates a report.)
NYPD input:
1. Selection of video files and cameras
2. Ground truth
3. Performance metrics
4. Report formats

IBM SVS Ground Truth Application and Framework
• Provide a framework for defining a ground-truth schema for each evaluation profile/task
• Provide an application for annotating evaluation videos with ground truth
• Provide a programming library for reading/writing ground-truth data and analytics results for evaluation automation

SVS Abandoned Object Evaluation

Data Set – Staged and Investigated Drops
NYPD data: 557 drops, 63.24 hours (to be revised)
 NYPD 1: 9/1/09 -> 8/3/10; Staged Drops: 291; Video Count: 24; Duration: 38.23 hrs
 NYPD 2: 8/26/10 -> 10/19/10; Staged Drops: 142; Video Count: 12; Duration: 11.47 hrs
 NYPD 3: 1/10/12 -> 4/3/12; Staged Drops: 124; Video Count: 29; Duration: 13.54 hrs
 NYPD 4: ongoing; Staged Drops: TBD; Video Count: TBD; Duration: TBD
 NYPD 5: TBD; Investigated Drops: TBD (Slide 22); Video Count: TBD; Duration: TBD

Data Set – Challenging Scenes
Lighting changes, camera movement, "sun spots", challenging weather patterns, stationary/seated pedestrians, …

Annotation Details
(Two annotated example frames, offsets 02:31 and 04:41.)
1st box (offset 02:31): bag dropped; wait until the actor has achieved complete separation from the bag.
2nd box (offset 04:41): end of the abandonment period; wait until just before the actor picks up or moves the bag.
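A minimal sketch of how one annotated drop could be represented when read through the ground-truth programming library, assuming offsets in seconds and (x, y, w, h) boxes; the class and field names are illustrative, not the actual schema:

```python
from dataclasses import dataclass

@dataclass
class AbandonmentGroundTruth:
    """One staged drop, annotated as two boxes (illustrative schema)."""
    video_id: str
    drop_offset_sec: float   # 1st box: actor has fully separated from the bag
    end_offset_sec: float    # 2nd box: just before the bag is picked up/moved
    bbox: tuple              # (x, y, w, h) of the abandoned object

    @property
    def abandonment_duration_sec(self) -> float:
        return self.end_offset_sec - self.drop_offset_sec

# The offsets from the example frames above (02:31 -> 04:41):
gt = AbandonmentGroundTruth("NYPD1_cam03", 151.0, 281.0, (410, 220, 36, 28))
print(gt.abandonment_duration_sec)  # 130.0 seconds of abandonment
```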
Performance Test Summary
 4.0 Detection Rate/False-Positive Analysis: guide 4.0 tuning efforts by analyzing its ability to detect abandoned objects, while also considering false positives, in a variety of configurations. Data Set: Staged and Investigated Drops
 4.0 Adversarial Condition Analysis on Challenging Scenes: guide 4.0 tuning efforts by analyzing its ability to cope with particular categories of false positives in a variety of configurations. Data Set: Challenging Scenes
 Detection Rate Comparison: compare 3.6.7 and 4.0 in terms of detection rate in their deployment configurations. Data Set: Staged and Investigated Drops
 Adversarial Condition Comparison: compare 3.6.7 and 4.0 in terms of false-positive count in their deployment configurations. Data Set: Challenging Scenes
 False-Positive Rate Comparison: compare 3.6.7 and 4.0 in terms of false-positive rate in their deployment configurations. Data Set: Live cameras

4.0 Detection Rate/False-Positive Analysis
(Pipeline diagram: each data-set video Video1…VideoN is run through SSEs under release 4.0 configurations Config1…ConfigN; the resulting alerts Alerts1…AlertsN are evaluated against ground truth GT1…GTN and aggregated.)

4.0 Detection Rate/False-Positive Analysis
(Synthetic result charts: "Detection Rate" and "Signal to Noise Ratio". Note: synthetic results.)

4.0 Adversarial Condition Analysis
(Pipeline diagram: each challenging-scene video Video1…VideoN is run through SSEs under release 4.0 configurations Config1…ConfigN; the resulting outputs Results1…ResultsN are aggregated.)

4.0 Adversarial Condition Analysis
(Synthetic result chart. Note: synthetic results.)

Detection Rate Comparison (3.6.7 vs. 4.0)
(Pipeline diagram: each data-set video is run through SSEs under both the 4.0 and the 3.6.7 deployment configurations ConfigV1…ConfigVN; the alerts from each are evaluated against ground truth GT1…GTN and aggregated.)

Detection Rate Comparison (3.6.7 vs. 4.0)
(Synthetic result charts: "Detection Rate" and "Signal to Noise Ratio". Note: synthetic results.)

Adversarial Condition Comparison (3.6.7 vs. 4.0)
(Pipeline diagram: each challenging-scene video is run through SSEs under both the 4.0 and the 3.6.7 deployment configurations; the resulting alerts are aggregated.)

Adversarial Condition Comparison (3.6.7 vs. 4.0)
(Synthetic result chart. Note: synthetic results.)

False-Positive Rate Comparison (3.6.7 vs. 4.0)
(Diagram: a composite camera set is built from the Current Production Cameras (CPC, 29 cameras), the Tuning Set Cameras (TSC, 14 cameras), and the Rejected Cameras (RC, 4 cameras). Randomly sampled subsets of these cameras are used, or all cameras if there are enough computing resources. Alerts from the 3.6.7 and 4.0 SSEs (SSE1…SSEN) feed a false-positive categorization step whose category counts satisfy $C_G + C_Y + C_R = C_N$.)

Discussion and Revision Logs
Date | Description
8/23/2012 | First review with NYPD: Dr. Evan Levine and Sgt. Nelson Pimentel
8/24/2012 | Revised the document to reflect the meeting input from NYPD and submitted it for NYPD review and further feedback
8/28/2012 | Reviewed the changes with the NYPD team
8/29/2012 | Received feedback from NYPD and responded to their questions. No changes to the slides were requested.

Appendix

Investigated Drops
 Drops from operational cameras that Counter Terrorism would consider suspicious.
 Make sure SVS 4.0 can continue to detect these cases.
(Table of individual investigated drops: all entries TBD.)
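The detection-rate and signal-to-noise aggregation above can be sketched as follows, assuming alerts and annotated drops are reduced to (start, end) time intervals per video; the real evaluation also involves spatial alignment and per-configuration aggregation, which are omitted here:

```python
# A minimal sketch of the detection-rate / false-positive aggregation, under
# the simplifying assumption that temporal overlap alone decides a match.

def overlaps(alert, drop):
    """True if the alert interval intersects the annotated abandonment period."""
    return alert[0] <= drop[1] and drop[0] <= alert[1]

def evaluate_config(alerts, drops):
    detected = [d for d in drops if any(overlaps(a, d) for a in alerts)]
    false_pos = [a for a in alerts if not any(overlaps(a, d) for d in drops)]
    detection_rate = len(detected) / len(drops) if drops else 0.0
    # "Signal to noise": true alerts relative to all alerts raised.
    snr = (len(alerts) - len(false_pos)) / len(alerts) if alerts else 0.0
    return detection_rate, snr

alerts = [(150.0, 160.0), (900.0, 905.0)]   # system alerts (start, end) in sec
drops  = [(151.0, 281.0)]                   # annotated abandonment periods
print(evaluate_config(alerts, drops))       # (1.0, 0.5)
```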
General Data Collection Criteria
 Video data captured from the camera must be extracted to configure, tune, and evaluate the use case.
 The video frame rate should be at least 10 frames per second.
 Video must be in a format that can be decoded by a DirectShow filter.
 Staged events should not occur during the first 20 seconds of the video.
 Video quality should be as good as possible. The camera, network, and/or video encoding may need to be adjusted to improve quality.

Best Practices of Staging for Abandoned Object Detection
 Take steps to avoid introducing bias in the selection of camera and time of day (e.g., staging only during morning hours when pedestrian traffic is light, or selecting one camera much more frequently than others).
 When staging multiple events in succession, do not place an object in the same or a similar location as the previous object.
 Staging should be as realistic as possible. Actors should avoid behavior that would not be present under normal circumstances.
 With the exception of the actor, personnel involved in staging should remain outside the camera field of view for the duration of the test, if possible.
 After dropping an object, the actor should exit the camera field of view immediately, if possible.
 Avoid staging "impossible" scenarios (e.g., placing a bag behind a trash can so that it is invisible to the camera).

Near-Field People Search Evaluation

Near-Field People Search Profile
 Provide single-attribute and combined-attribute search capability on people features: baldness, eyeglasses, sunglasses, head color, skin tone, and texture and tri-color combo search on the torso area with a 13-color palette
 Provide an additional user interface to rank-order search results based on attribute confidence
 Provide a user interface to display the confidence values of the attributes of a search result
 Provide complementary capability integration with the Facial Recognition Engine for real-time identity alerting
 Provide a People Search Framework for adding new person attributes
 Provide a Color Calibration Tool to manually calibrate color using artificial object colors and existing scene objects, or to correct colors automatically

Suitable Camera Characteristics
• No more than a 15-degree downward angle
• Even and uniform lighting from above or slightly in front of the subject that does not produce dark shadows
• At least 30 pixels ear-to-ear when no Facial Recognition is deployed
• At least 32 pixels pupil-to-pupil when Facial Recognition (by Cognitec) is deployed
• Face is large as the subject passes through the turnstile
(Example images: a suitable view with ~40 pixels ear-to-ear; unsuitable views where the downward angle is too high, the scene is too dark, or the camera is too far away, so faces are small as subjects pass through the turnstile.)

4.0 People Search Requirements
Near frontal (required for People Search): at least one frame with both eyes and the mouth clearly visible.
Not frontal: no frames with both eyes and the mouth clearly visible.
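A sketch of a pre-screening check against these requirements; the thresholds come from the bullets above, while the function name and inputs are illustrative:

```python
# Hedged sketch: screen a camera against the stated NFPS requirements
# (<= 15 degrees downward angle, >= 30 px ear-to-ear without Facial
# Recognition, >= 32 px pupil-to-pupil with it).

def camera_suitable_for_nfps(downward_angle_deg: float,
                             ear_to_ear_px: float,
                             pupil_to_pupil_px: float = 0.0,
                             facial_recognition: bool = False) -> bool:
    if downward_angle_deg > 15:
        return False
    if facial_recognition:
        return pupil_to_pupil_px >= 32
    return ear_to_ear_px >= 30

print(camera_suitable_for_nfps(12, 40))            # True
print(camera_suitable_for_nfps(25, 40))            # False: angle too high
print(camera_suitable_for_nfps(10, 40, 28, True))  # False: pupils too close
```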
Data Set Collection Process
(Funnel diagram: MTA cameras (807) -> turnstile cameras (265) -> cameras suitable for NFPS (66) -> randomly sampled subset of cameras C1…CN. Cameras are rejected as "not a turnstile camera" or "unsuitable turnstile camera". For each sampled camera, 2-minute videos V1…VN are taken from randomly sampled time periods.)

Data Set 1 (Whole Camera Network)
Data Set 1a – Standard:
 All 66 cameras classified as "suitable"
 1 video from each camera
 Total 2 hr 12 min of footage
 Video uniformly sampled across the 24-hour time period, Mon-Sun
 Sample period 7/11/12 -> 8/7/12
Data Set 1b – Challenging:
 All 66 cameras classified as "suitable"
 1 video from each camera
 Total 2 hr 12 min of footage
 Video uniformly sampled across rush-hour periods, Mon-Fri (7am-8am, 11:30am-1pm, 4pm-6pm)
 Sample period 7/30/12 -> 8/24/12

Camera Distribution of the Suitable Camera Dataset
(Distribution chart.)

Data Set 2 (Single Cameras)
Data Set 2a: a good "suitable" camera. Data Set 2b: a marginal "unsuitable" camera.
Both data sets:
• 66 videos, 2 minutes each
• Total 2 hr 12 min of footage
• Sample period 7/11/12 -> 8/7/12

Annotation Details
 Each person whose face is visible for at least one frame is annotated.
 The following event attributes are included (defaults are in bold); a record-level sketch of this schema follows the Special Note slide below:
1. Eye Region: Sunglasses = {Yes, No, Unknown}, Eyeglasses = {Yes, No, Unknown}
2. Head Region: Bald = {Yes, No, Unknown}, Head Color = {red, green, blue, yellow, cyan, magenta, brown, beige, orange, black, white, light gray, dark gray, unknown}, Hat = {Yes, No, Unknown}
3. Mouth Region: Beard = {Yes, No, Unknown}, Moustache = {Yes, No, Unknown}
4. Skin Tone = {Dark, Medium, Light, Unknown}
5. Torso Pattern = {Plaid, Stripes, Solid, Patterned Other, Unknown}
6. Torso Color = {red, green, blue, yellow, cyan, magenta, brown, beige, orange, black, white, light gray, dark gray, unknown}
7. Large Amount of Skin in Torso = {Yes, No}
8. Gender = {Male, Female, Unknown}
9. Age = {Child (under 13), Adolescent (13-18), Young Adult (18-25), Adult (26-35), Adult (36-45), Adult (46-60), Senior (61+), Unknown}
10. Path: Passes through turnstile while approaching camera = {Yes, No}
 The following frame attributes are included:
1. Location of face, eyes, mouth, and torso (as bounding boxes)
2. Torso Visibility = {Visible, Off-Camera, Occluded}

Annotation Details (in Ground Truth Application and Schema)
(Screenshot of the Ground Truth Application.)

Handling Pose Variations
(Example images: face, both eyes, and mouth (frontal); face and both eyes, no mouth (frontal, looking down); face, single eye, and mouth (profile); face and single eye (profile, looking down); face only (turned away or features not distinguishable).)

Special Note
Annotate all people whose faces are visible for at least one frame (it is not necessary that they move toward the camera or pass through the turnstile).
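The record-level sketch of the per-person attributes promised above. Since bolding did not survive into this text, the defaults below ("Unknown" for most attributes, "Yes" for the turnstile path, "Visible" for torso visibility) are inferred from the surrounding slides; all field names are illustrative:

```python
# Illustrative sketch of the per-person ground-truth record, not the
# actual schema of the Ground Truth Application.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PersonGroundTruth:
    sunglasses: str = "Unknown"
    eyeglasses: str = "Unknown"
    bald: str = "Unknown"
    head_color: str = "unknown"
    hat: str = "Unknown"
    beard: str = "Unknown"
    moustache: str = "Unknown"
    skin_tone: str = "Unknown"
    torso_pattern: str = "Unknown"
    torso_colors: List[str] = field(default_factory=lambda: ["unknown"])
    large_skin_in_torso: str = "No"
    gender: str = "Unknown"
    age: str = "Unknown"
    passes_turnstile_approaching: str = "Yes"   # per the Path slide below
    # Per-frame attributes: (frame_index, face_bbox, torso_visibility)
    frames: List[Tuple[int, Tuple[int, int, int, int], str]] = field(
        default_factory=list)
```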
Torso Annotation
• Torso visible: draw the box; align the top of the box with the top-most shoulder point and the bottom of the box with the waistline. Torso Visibility = "Visible" is the default; no need to set it.
• Torso not in camera view: do NOT draw a box. Instead, set the local attribute "Torso Visibility" to "Off-Camera".
• Torso occluded: do NOT draw a box. Instead, set the local attribute "Torso Visibility" to "Occluded".

Torso Color
• Specify colors in the torso region.
• Specify up to 3 colors (if there are more than 3 colors, choose the 3 most prominent).
• If there is only a very small amount of a given color, do not include it.
• Include only clothing, apparel, and items being carried. Do NOT include any color from skin tone.
(Example images, each labeled with up to three torso colors, e.g., white/light gray; blue/magenta/white; red/black/blue; black/yellow/white; white/yellow/red; white/blue.)

Torso Color – Accounting for Skin
 If a large amount of the person's skin is visible in the torso region, set the attribute "Large Amount of Skin in Torso" to "Yes".
(Example images labeled YES and NO.)

Torso Pattern
(Example images: Solid, Stripes, Plaid, and Patterned Other, annotated as Torso=Solid and Torso=Patterned.)

Path: Passes Through Turnstile While Approaching Camera
If the person is approaching the camera AND passes through the turnstile, leave the attribute value set to "Yes". If the person does not pass through the turnstile OR is not approaching the camera, set the value to "No".

Performance Test Summary
 4.0 Face Capture Analysis: guide 4.0 tuning efforts by analyzing its ability to detect faces, while also considering false positives, in a variety of configurations. Data Sets: 1a, 1b, 2a, and 2b
 Search Efficiency Comparison: compare search time between chronological ordering and rank ordering. Data Sets: 1a, 1b, 2a, and 2b

4.0 Face Capture Analysis for People Search – Track Alignment
(Timeline diagram: labeled objects in the ground truth are aligned over time with system-generated face tracks, each of which has a key frame. A labeled object aligned with a key frame counts as TP; a labeled object with no aligned track counts as FN; a track with no labeled object counts as FP. An example face track is shown, with the key frame indicated by a yellow box.)

4.0 Face Capture Analysis for People Search
$\mathrm{Recall} = \frac{|TP|}{|TP| + |FN|}$, $\mathrm{Precision} = \frac{|TP|}{|TP| + |FP|}$
(Synthetic precision/recall curve; different points are produced by varying the Face Capture sensitivity parameter. Note: synthetic results.)

Average Chronological Thumbnail Search Time
$C_w = \frac{\tau \, w \, |X|}{2}$
where:
• $w$ = window-size multiplier
• $\tau$ = time required to scan a single thumbnail (e.g., 0.5 seconds)
• $X$ = the set of people (TP) that were successfully detected and tracked by the system
(Diagram: a person of interest within chronologically ordered TP sets of size $|X|$, $2|X|$, and $3|X|$.)

Average Ranked Search Time
$R_w = \frac{\tau \, w}{|X|} \sum_{i \in X} \frac{1}{2^{n}-1} \sum_{j=1}^{2^{n}-1} \mathrm{rank}(X_i, A_j)$
where $n$ is the number of possible search filters for person $X_i$, $A_1 \ldots A_{2^n-1}$ are the non-empty filter combinations, and $\mathrm{rank}(X_i, A_j)$ is the rank position of $X_i$ in the results for combination $A_j$.
Example ranks for one person:
Search Filters | Rank Position
ST=MED | 10
EG=YES | 2
… | …
ST=MED, EG=YES, HC=YEL, TP=SOLID, TC=WHT | 13
ST=MED, EG=YES, HC=YEL, TP=SOLID, TC=WHT, HAIR=YES | 3
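Both search-time models can be sketched directly from the formulas above. The reconstruction assumes 1-based rank positions and that the inner average runs over the 2^n - 1 non-empty filter combinations; tau, w, and the example ranks are illustrative:

```python
# Sketch of the chronological (C_w) and ranked (R_w) search-time models.

def chronological_search_time(tau: float, w: float, num_tp: int) -> float:
    # C_w: on average the user scans half of a window of w * |X| thumbnails.
    return tau * w * num_tp / 2.0

def ranked_search_time(tau: float, w: float, ranks_per_person: list) -> float:
    # R_w: for each TP person X_i, average the rank position over the
    # filter combinations, then average over people and scale by tau * w.
    per_person = [sum(ranks) / len(ranks) for ranks in ranks_per_person]
    return tau * w * sum(per_person) / len(per_person)

# Example: one person with the ranks from the table above, one with made-up
# ranks; 0.5 s per thumbnail, window multiplier 1.
ranks = [[10, 2, 13, 3], [5, 1, 7, 2]]
print(chronological_search_time(0.5, 1.0, 100))  # 25.0 seconds
print(ranked_search_time(0.5, 1.0, ranks))       # 2.6875 seconds
```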
Search Efficiency Comparison
(Synthetic chart comparing chronological and ranked search times. Note: synthetic results.)

Evaluation Process Illustration
(Flow diagram: system results and ground truth are aligned, yielding recall and precision; the aligned true-positive set X feeds the comparative search step, which produces the search-time comparison results.)

Discussion and Revision Logs
Date | Description
8/28/2012 | First review with NYPD: Dr. Evan Levine, Dir. Rich Schroeder, and Sgt. Nelson Pimentel
8/29/2012 | Revised the deck based on the input from NYPD; waiting for NYPD written feedback
9/18/2012 | 1. Revised the document based on NYPD feedback (Slides 27 and 31). 2. Reviewed and finalized the changes with NYPD.

Forensic Object Search Evaluation
Evaluation profiles:
• Mid-Field People Search
• Mid-Field Vehicle Search
• Detection-Enhanced Tracking
• Outdoor Tracking (real-world metrics only)
(Example image annotated with: Class = Car; Speed = 100 px/sec (42 MPH); Size = 971 px² (79"x50"x153"); Duration = 3.1 sec; Color = Yellow.)

Evaluation Data Sets

Camera Grouping (Whole NYPD Network)
(Grouping tree over all cameras:)
• Outdoor cameras suitable for object tracking evaluation: vehicles only (suitable for Midfield Vehicle Search); vehicles and people (suitable for Outdoor Detection Enhanced Tracking); people only (suitable for Midfield People Search and for torso and legs color classification), split into "A" quality (good) and "B" quality (challenging)
• Indoor cameras suitable for object tracking evaluation: people only (suitable for Midfield People Search), split into "A" quality (good) and "B" quality (challenging)

Camera Group Breakdown
(Breakdown chart.)

Data Sampling Process
(Diagram: from all suitable camera groups G1…G10, subsets of cameras C1…CN are randomly sampled from each group; videos V1…VN are randomly sampled per camera; annotation markers M1…MN are randomly inserted into each video.)

Data Sets from Suitable-for-Evaluation Camera Groups
Each of the 10 suitable camera groups is used to generate two types of data sets.
Data Set 1: 24x7 Samples
 Randomly sample 20% of each group (166 cameras in total)
 Collect 1 video sample per camera; the video is uniformly sampled across the 24-hour time period, Mon-Sun
 Generate annotation markers for each sample video
Data Set 2: Rush Hour Samples
 Use the same 166 cameras sampled for Data Set 1
 Collect 1 video sample per camera; the video is uniformly sampled across rush-hour time periods only, Mon-Fri (7am-8am, 11:30am-1pm, 4pm-6pm)
 Generate annotation markers for each sample video
Please note: different metrics use different camera groups with both data-set types above.
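A sketch of the camera-sampling step under the scheme above (20% of each suitable group, one video per sampled camera); group names and camera IDs are illustrative:

```python
# Hedged sketch of the per-group sampling described above. The rush-hour
# windows come from the slide; the helper names are illustrative.

import random

RUSH_HOURS = [("07:00", "08:00"), ("11:30", "13:00"), ("16:00", "18:00")]  # Mon-Fri

def sample_cameras(groups: dict, fraction: float = 0.2, seed: int = 0) -> dict:
    """Randomly sample a fraction of the cameras in each suitable group."""
    rng = random.Random(seed)
    return {name: rng.sample(cams, max(1, round(len(cams) * fraction)))
            for name, cams in groups.items()}

groups = {"midfield_vehicle": [f"MV{i}" for i in range(40)],
          "midfield_people_outdoor_A": [f"MPA{i}" for i in range(25)]}
sampled = sample_cameras(groups)
print({g: len(c) for g, c in sampled.items()})
# {'midfield_vehicle': 8, 'midfield_people_outdoor_A': 5}
```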
Annotation Details

Annotation for All Objects is Time Consuming
High-activity estimation: ~1.5 hours of annotation time per minute of video
≅ 45 hours for 30 minutes of video
≅ 22.5 person-days for 30 minutes of video (assuming a person spends 2 hours per day doing nothing but annotation)
Following this approach means that only a few videos from a few cameras can be annotated in a reasonable time period, yet the general metadata indexing profiles will be deployed on a wide variety of cameras.

Single Complete Object Track Annotation
For a single video, "triangle markers" are automatically generated on randomly selected video frames. At each triangle marker, annotate the complete path of the closest object. If there are no objects in the scene, set the Object Class attribute to "None".

Single Complete Object Track Annotation Detail
Annotate the object closest to the triangle from the point that it enters the visible image until it leaves the visible image. Intermediate frames are interpolated at evaluation time (see the sketch after the Single Frame Object Annotation slides below).
Object classes:
• Sub-Compact, Compact, Sedan, Station Wagon, Limousine
• Small Jeep, Small SUV, Large SUV
• Small Pickup Truck, Large Pickup Truck
• Minivan, Normal Van, RV, Delivery Van
• Small School Bus, Large School Bus, Transit Bus, Double Decker Bus
• Motorcycle, Moped/Scooter, Bicycle
• Person, Person on Horse, Horse Carriage, Person on Skateboard, Person on Scooter, Person on Roller Blades
• Dog, Cat, Other Animal
• Other-Small, Other-Medium, Other-Large
If there are no objects in the scene, set to "None".

Single Frame Object Annotation
For a single video, "diamond markers" are automatically generated on randomly selected video frames. At each diamond marker, draw a bounding box around all objects in the image. If there are no objects in the scene, set the Objects Present attribute to "No".

Single Frame Object Annotation Detail
Draw a bounding box around all moving objects in the visible image. If there are no objects in the scene, set to "No".
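The interpolation step referenced above can be sketched as plain linear interpolation between annotated keyframe boxes; the (x, y, w, h) box format and helper name are assumptions:

```python
# Sketch: the annotator keyframes an object's bounding box only at some
# frames; intermediate frames are linearly interpolated at evaluation time.

def interpolate_box(keyframes: dict, frame: int) -> tuple:
    """Linearly interpolate a box between the two nearest annotated frames."""
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    lo = max(f for f in frames if f <= frame)   # nearest keyframe at or before
    hi = min(f for f in frames if f >= frame)   # nearest keyframe at or after
    if lo == hi:
        return keyframes[lo]
    t = (frame - lo) / (hi - lo)
    return tuple(a + t * (b - a) for a, b in zip(keyframes[lo], keyframes[hi]))

track = {0: (100, 50, 40, 80), 30: (220, 50, 40, 80)}   # annotated keyframes
print(interpolate_box(track, 15))   # (160.0, 50.0, 40.0, 80.0)
```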
Performance Test Summary (1/3)
 Time Window Retrieval Accuracy (Unfiltered Search) Comparison
Purpose: Compare 3.6.7 City Surveillance and a variety of 4.0 profiles in terms of their ability to detect and track moving objects.
Camera Groups: Outdoor Detection Enhanced Tracking; Midfield Vehicle Search; Midfield People Search-Outdoor A, B; Midfield People Search-Indoor A, B
Metrics: Recall, Duplication Rate
4.0 Profiles Tested: Midfield Vehicle Search, Outdoor Detection Enhanced Tracking, Midfield People Search
 Retrieval Precision (Signal to Noise Ratio) Comparison
Purpose: Compare 3.6.7 City Surveillance and a variety of 4.0 profiles in terms of how frequently indexed events describe actual moving objects.
Camera Groups: Outdoor Detection Enhanced Tracking; Midfield Vehicle Search; Midfield People Search-Outdoor A, B; Midfield People Search-Indoor A, B
Metrics: Precision
4.0 Profiles Tested: Midfield Vehicle Search, Outdoor Detection Enhanced Tracking, Midfield People Search
 Object Count Accuracy (Histogram) Comparison
Purpose: Compare 3.6.7 City Surveillance and a variety of 4.0 profiles in terms of how well each estimates the number of moving objects in the scene.
Camera Groups: Outdoor Detection Enhanced Tracking; Midfield Vehicle Search; Midfield People Search-Outdoor A, B; Midfield People Search-Indoor A, B
Metrics: Object Count Accuracy
4.0 Profiles Tested: Midfield Vehicle Search, Outdoor Detection Enhanced Tracking, Midfield People Search

Performance Test Summary (2/3)
 Size, Speed (Pixels) and Duration Estimation Comparison
Purpose: Compare 3.6.7 City Surveillance and a variety of 4.0 profiles in terms of how well the size, speed, and duration of moving objects are estimated.
Camera Groups: Outdoor Detection Enhanced Tracking; Midfield Vehicle Search; Midfield People Search-Outdoor A, B; Midfield People Search-Indoor A, B
Metrics: Size, Speed, and Duration Ratios
4.0 Profiles Tested: Midfield Vehicle Search, Outdoor Detection Enhanced Tracking, Midfield People Search
 Width, Height, Length, Speed (World Metric) Estimation Evaluation
Purpose: Estimate how well the size and speed of moving objects are estimated in terms of real-world coordinates (feet, MPH) in 4.0.
Camera Groups: All
Metrics: Width, Height, Length, and Speed Ratios
4.0 Profiles Tested: Midfield Vehicle Search, Outdoor Detection Enhanced Tracking, Outdoor Tracking

Performance Test Summary (3/3)
 General Object Color and Type Classification Comparison
Purpose: Compare 3.6.7 City Surveillance and a variety of 4.0 profiles in terms of how accurately the color and type of moving objects are assigned.
Camera Groups: Outdoor Detection Enhanced Tracking, Midfield Vehicle Search
Metrics: Color and Object Type Accuracy
4.0 Profiles Tested: Outdoor Detection Enhanced Tracking, Midfield Vehicle Search (object color only)
 Midfield People: Color Retrieval Comparison
Purpose: Compare 4.0 rank ordering on different data sets ("A" and "B" quality) and against chronological ordering.
Camera Groups: Midfield People Search-Outdoor A, B; Midfield People Search-Indoor A, B
Metrics: Average Time to Find Person
4.0 Profiles Tested: Midfield People Search

Time Window Retrieval Accuracy (Unfiltered Search)
$\mathrm{Recall} = \frac{|TP|}{|TP| + |FN|}$
$\mathrm{Duplication\ Rate} = \frac{|KF|}{|TP|}$
where:
• $TP$ = set of all labeled objects aligned to at least one key frame
• $FN$ = set of all labeled objects aligned with zero key frames
• $KF$ = set of all key frames aligned to a labeled object
(Diagram: labeled ground-truth objects aligned with system-generated tracks and their key frames.)
Example alignment sets:
GT object | TP | FN | KF
1 | 1 | 0 | 1
2 | 1 | 0 | 1
3 | 1 | 0 | 3
4 | 0 | 1 | 0
… | … | … | …
|GT| | 0 | 1 | 0
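A minimal sketch of the bookkeeping behind these two metrics, with the spatio-temporal alignment test abstracted into the input (each entry is the number of key frames aligned to one labeled object):

```python
# Sketch: each labeled object counts as TP if at least one key frame aligns
# with it, FN otherwise; duplication rate is aligned key frames per detection.

def time_window_retrieval_metrics(keyframes_per_object: list):
    """keyframes_per_object[i] = number of key frames aligned to GT object i."""
    tp = sum(1 for k in keyframes_per_object if k >= 1)
    fn = sum(1 for k in keyframes_per_object if k == 0)
    kf = sum(keyframes_per_object)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    duplication_rate = kf / tp if tp else 0.0
    return recall, duplication_rate

# Objects aligned with 1, 1, 3, 0, 0 key frames, as in the table above:
print(time_window_retrieval_metrics([1, 1, 3, 0, 0]))   # (0.6, 1.666...)
```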
Detail: Time Window Retrieval Accuracy (Unfiltered Search) Evaluation
Goal: Find out how often objects annotated in triangle frames were successfully detected by the system as events. A successful detection is determined by whether or not the object was captured at least once by a track "key frame." The capture rate is expressed as the proportion of successful detections to total objects annotated (recall). This evaluation measures the effectiveness of an "unfiltered search" in the SVS UI: in an unfiltered search, all system-generated events in a specified time window are in the result set.
Note that how well the object is tracked is not considered here. On the previous slide, an object is tracked well in the first example but poorly in the second; both are treated equally, because a key frame was successfully aligned with an annotated object in each case. Poor tracking will tend to result in poor color and size estimation, etc., which are measured separately (see subsequent slides).
The second metric, duplication rate, measures how often multiple events are generated for a single moving object. For example, a duplication rate of 3 indicates that, on average, 3 events are indexed for each detected moving object. While a duplication rate of 1 is in some sense ideal, a value greater than 1 could be better from a practical standpoint, in that multiple records may help ensure that the user doesn't miss an object of interest while scanning through the results. At some point, however, the duplication rate becomes so large that this benefit is outweighed by the need to sift through many uninteresting, redundant results.

Time Window Retrieval Accuracy (Unfiltered Search) Comparison
(Synthetic charts for Midfield People Search, Outdoor Detection Enhanced Tracking, and Midfield Vehicle Search. Note: synthetic results.)

Retrieval Precision (Signal to Noise Ratio)
$\mathrm{Precision} = \frac{\sum_{i=1}^{M} |KF_i \cap GT_i|}{\sum_{i=1}^{M} |KF_i|}$
where $M$ is the number of diamond frames, $KF_i$ is the set of track key-frame rectangles on diamond frame $i$, and $KF_i \cap GT_i$ denotes the key frames on frame $i$ that intersect a ground-truth rectangle.
(Example image of system results showing ground-truth (GT) boxes, track frames (TF), and track key frames (KF).)
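The precision sum can be sketched as below, assuming (x, y, w, h) rectangles and the usual axis-aligned intersection test:

```python
# Sketch: per diamond frame, count key frames that intersect a ground-truth
# box (numerator) versus all key frames (denominator).

def intersects(a, b) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def retrieval_precision(diamond_frames: list) -> float:
    """diamond_frames: list of (key_frame_boxes, ground_truth_boxes) pairs."""
    hits, total = 0, 0
    for kf_boxes, gt_boxes in diamond_frames:
        hits += sum(1 for kf in kf_boxes
                    if any(intersects(kf, gt) for gt in gt_boxes))
        total += len(kf_boxes)
    return hits / total if total else 0.0

frames = [([(0, 0, 10, 10), (50, 50, 5, 5)], [(2, 2, 10, 10)]),  # 1 of 2 hit
          ([(5, 5, 4, 4)], [(6, 6, 10, 10)])]                    # 1 of 1 hit
print(retrieval_precision(frames))   # 0.666...
```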
Detail: Retrieval Precision (Signal to Noise Ratio) Evaluation
Goal: Quantify how often indexed events describe real moving objects. On each diamond frame, the set of all key-frame rectangles that intersect ground-truth rectangles is formed, and the sizes of these intersection sets are summed over all diamond frames to form the numerator. The sizes of all key-frame sets, whether they intersect ground-truth rectangles or not, are summed over all diamond frames to form the denominator. The result is the proportion of indexed events that are moving objects, as opposed to "garbage" results (e.g., key-frame images that contain empty boxes).
We compute recall and retrieval precision separately, using different annotation markers. Technically, we could compute both on diamond markers, but two considerations led us to compute TP on triangle frames:
1. It may take a lot of diamond frames to capture a reasonable number of system key frames. Every track has just one key frame, so the likelihood that a key frame will be present for a particular track on a diamond frame is small. As a result, precision may have a relatively large margin of error, depending on how many diamond frames we can annotate. Due to this issue, we want to limit the metrics that depend on key frames appearing in diamond frames.
2. In triangle frames, the annotator marks both color and object class and annotates the full object trajectory across space and time. We operate on the set of true positives (system results aligned with annotated objects in triangle frames) in the evaluation of size, speed, duration, world metrics, color, and object class. This obviously wouldn't be possible if our pool of TP came from diamond frames instead, which are just single-frame annotations without color designations.

Retrieval Precision Comparison
(Synthetic signal-to-noise-ratio chart for Midfield People Search, Outdoor Detection Enhanced Tracking, and Midfield Vehicle Search. Note: synthetic results.)

Object Count Accuracy Evaluation
$\mathrm{Count}_{ratio} = \frac{\sum_{i=1}^{M} |TF_i|}{\sum_{i=1}^{M} |GT_i|}$
where $M$ is the number of diamond frames, $TF_i$ is the set of track frames on diamond frame $i$, and $GT_i$ is the set of ground-truth rectangles on diamond frame $i$.
Note: Object count is expressed through the event-statistics histogram in the SVS UI. The count ratio is the proportion of tracked objects at a point in time to the actual number of moving objects at the same point in time.

Object Count Accuracy Comparison
(Synthetic object-count-ratio chart for Midfield People Search, Outdoor Detection Enhanced Tracking, and Midfield Vehicle Search. Note: synthetic results.)

Size, Speed and Duration Estimation Evaluation (Pixel)
$\mathrm{Size}_{ratio} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{SizeRatio}_i$
$\mathrm{Speed}_{ratio} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{SpeedRatio}_i$
$\mathrm{Duration}_{ratio} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{DurationRatio}_i$
where $M$ is the number of system tracks that correspond to the set of all labeled objects aligned to at least one system key frame. In cases where there are multiple corresponding tracks, a ratio is computed for all values.

Detail: Size, Speed and Duration Estimation Evaluation (Pixel)
Goal: Determine how effectively objects can be found using size, speed (in pixel units), and duration (in seconds) filters. This evaluation operates on the set of true positives formed by aligning annotated objects in triangle frames with system-generated events expressed as key frames, which takes place in the Time Window Retrieval Accuracy evaluation. A ratio is formed for size, speed, and duration, in pixel units, by dividing the values found in the ground truth by the corresponding values generated by the system. Each ratio can be thought of as a "quality of estimation," where a value of 1 indicates a perfect estimate, a value less than 1 an under-estimate, and a value greater than 1 an over-estimate. Sometimes multiple tracks are indexed for a single moving object; in these cases, a ratio is computed for each of the corresponding tracks.

Size, Speed and Duration Estimation Comparison
(Synthetic charts. Note: synthetic results.)
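The ratio averaging is mechanical; a sketch, following the slide's convention of dividing the ground-truth value by the system estimate (all values illustrative):

```python
# Sketch: per corresponding track, ratio = ground truth / system estimate
# (1.0 = perfect estimate); the final metric averages over all M tracks.

def mean_quality_ratio(pairs: list) -> float:
    """pairs: (ground_truth_value, system_value) per corresponding track."""
    return sum(gt / sys for gt, sys in pairs) / len(pairs)

size_pairs     = [(971, 900), (450, 500)]     # px^2
speed_pairs    = [(100, 110)]                 # px/sec
duration_pairs = [(3.1, 2.8)]                 # seconds
print(round(mean_quality_ratio(size_pairs), 3))      # 0.989
print(round(mean_quality_ratio(speed_pairs), 3))     # 0.909
print(round(mean_quality_ratio(duration_pairs), 3))  # 1.107
```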
Width, Height, Length, Speed (World Metric) Estimation Evaluation
$\mathrm{Width}_{ratio} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{WidthRatio}_i$
$\mathrm{Height}_{ratio} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{HeightRatio}_i$
$\mathrm{Length}_{ratio} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{LengthRatio}_i$
$\mathrm{Speed}_{ratio} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{SpeedRatio}_i$
where $M$ is the number of system tracks that correspond to the set of all labeled objects aligned to at least one system key frame.
(Figure: example vehicles labeled with real-world dimensions in inches, e.g., 57"/180"/68", 72"/194"/74", 448"/102", 147".)

Detail: Width, Height, Length, and Speed (World Metric) Estimation Evaluation
Goal: Determine how effectively objects can be found using the width, height, length, and speed of moving objects in real-world coordinates (inches/centimeters, or miles per hour/kilometers per hour). The metrics are essentially identical to the metrics used for size and speed in pixel units: ratios for width, height, length, and speed are computed for each detected and tracked moving object, and the ratios are separately averaged to arrive at the final evaluation result. Also, "best" ratios are chosen when multiple events are indexed for a single detected object.
The main difference from the pixel-unit evaluation is how the ground truth expresses the true object dimensions and speed in real-world coordinates. In the example image (object type SUV, 72" x 194" x 74"), the dimensions come from the object-type category of "SUV" assigned by the annotator in the ground truth. All objects assigned the same object type are considered to have the same dimensions for the purpose of the evaluation; for each object type, the dimensions of a typical model in that class are used. For example, for the category "SUV" we might use the dimensions of a Jeep Grand Cherokee. If the length of the vehicle is known, the ground-truth speed in MPH or KPH can easily be computed.

Width, Height, Length, Speed (World Metric) Estimation Evaluation
(Synthetic charts. Note: synthetic results.)
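A sketch of the type-to-dimensions lookup and the speed derivation described above. The SUV dimensions are the ones shown in the example image; the sedan entry, the pixels-per-length input, and all names are illustrative assumptions:

```python
# Sketch: every object of a given type is assigned the dimensions of a
# typical model in that class; a known length then yields a speed in MPH.

TYPICAL_DIMENSIONS_IN = {            # (width, height, length) in inches
    "SUV":   (74, 72, 194),          # e.g., roughly a Jeep Grand Cherokee
    "Sedan": (68, 57, 180),          # illustrative entry
}

def ground_truth_speed_mph(object_type: str, pixels_per_sec: float,
                           pixels_per_length: float) -> float:
    """Derive speed from the known vehicle length: inches/sec -> miles/hour."""
    length_in = TYPICAL_DIMENSIONS_IN[object_type][2]
    inches_per_sec = pixels_per_sec * (length_in / pixels_per_length)
    return inches_per_sec * 3600 / (12 * 5280)

# An SUV spanning 150 px of its own length, moving at 100 px/sec:
print(round(ground_truth_speed_mph("SUV", 100, 150), 1))   # 7.3 MPH
```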
General Object Color and Type Classification Evaluation
$\mathrm{Quality}_{obj} = \frac{1}{M} \sum_{i=1}^{M} T_i$
$\mathrm{Quality}_{color} = \frac{1}{M} \sum_{i=1}^{M} C_i$
where $M$ is the number of system tracks that correspond to the set of all labeled objects aligned to at least one system key frame, $T_i$ is the type value for track $i$, and $C_i$ is the color value for track $i$.
(Diagram: ground-truth objects with type $T$ and colors $C_1, C_2, C_3$ matched to corresponding system tracks Track1…TrackN.)
Example results:
TP | T | C
1 | 1 | 1
2 | 1 | 0
3 | 0 | 1
4 | 0 | 0
5 | 1 | 0.5

Detail: General Object Color and Type Classification Evaluation
Goal: Determine how effectively objects can be found using the color and object-type filters. In this evaluation, the color and type assigned by the annotator for each moving object are compared against the color and type captured by the system. For each moving object that was successfully detected, a color value and a type value are computed: 1 if the color (respectively, type) is correct, 0 if it is incorrect.
In some cases, the object may be multi-colored. Both the annotator and the system can assign up to 3 colors to a single object. In these cases, the color value is the proportion of the actual colors that were captured by the system. For example, if the object is both yellow and black and the system captures only yellow, the color value is 0.5; if the system captures both yellow and black, the color value is 1. The final evaluation result is the average color and type value over all correspondences between system tracks and detected moving objects.
For color annotation of vehicles, the annotator is instructed to ignore any color associated with the windshield, wheels, or particular lighting conditions, and to consider only color on the body. Although these "extraneous" colors are part of the vehicle color in a literal sense, we think it would be unusual for a person to include them when describing a vehicle.

General Object Color and Type Classification Comparison
(Synthetic charts: color accuracy for Outdoor Detection Enhanced Tracking and Midfield Vehicle Search; object-type accuracy for Outdoor Detection Enhanced Tracking. Note: synthetic results.)

Midfield People: Color Retrieval Evaluation
$CS_w = \frac{\tau \, w}{|X|} \sum_{i \in X} \frac{1}{2^{n}-1} \sum_{j=1}^{2^{n}-1} \mathrm{rank}(X_i, A_j)$
where $X$ is the set of true positives, $n$ is the number of possible search filters for person $X_i$, and $\mathrm{rank}(X_i, A_j)$ is the rank position of $X_i$ under filter combination $A_j$.
Example ranks for one person:
Search Filters | Rank Position
Torso=Blue | 8
Pants=Black | 2
… | …
Torso=Blue, Torso=Black | 41
Torso=Blue, Torso=Black, Pants=Black | 3

Midfield People: Color Retrieval Comparison
(Synthetic chart. Note: synthetic results.)

Evaluation Process Illustration
(Flow diagram: the suitable camera/video groups (indoor people-only MPS "A" and "B" quality; outdoor people-only MPS "A" and "B" quality; outdoor vehicles-only, suitable and not suitable for MVS; outdoor people-and-vehicles, suitable and not suitable for ODET) are processed by City Surveillance (3.6.7), Outdoor Tracking (4.0), Midfield People Search (4.0), Outdoor Detection Enhanced Tracking (4.0), and Midfield Vehicle Search (4.0); the resulting events are evaluated against ground truth to produce the performance results.)

Discussion and Revision Logs
Date | Description
9/18/2012 | First review with NYPD: Dr. Evan Levine, Dir. Rich Schroeder, and Sgt. Nelson Pimentel
9/21/2012 | Revised the deck to include suggestions from the 9/18 face-to-face review meeting. Waiting for further NYPD review and comments.
10/10/2012 | Received feedback from NYPD on 9/25 and made a best effort to revise the presentation and respond to their questions. A face-to-face review of the changes is scheduled for 10/11.
10/16/2012 | Removed the counting histogram per Dr. Levine's suggestion. Changed the "best match"-based metrics and the associated slides: 73, 74, 76, 79, and 80.

End of the Plan