PETS 2009
Miami, Florida, 25 June 2009

In Conjunction with IEEE Computer Society Conference on Computer Vision and Pattern Recognition

PETS 2009 Benchmark Data

Overview

The datasets are multi-sensor sequences containing different crowd activities.

Please e-mail datasets@pets2009.net if you require assistance obtaining these datasets for the workshop.

Aims and Objectives

The aim of this workshop is to employ existing or new systems for the detection of one or more of three types of crowd surveillance characteristics/events within a real-world environment. The scenarios are filmed from multiple cameras and involve up to approximately forty actors.

More specifically, the challenge includes estimation of crowd person count and density, tracking of individual(s) within a crowd, and detection of flow and crowd events.

News

06 March 2009: The PETS2009 crowd dataset is released.
01 April 2009: The PETS2009 submission details are released. Please see Author Instructions.

Preliminaries

Please read the following information carefully before processing the dataset, as the details determine when your system should generate event notifications. Please check regularly for updates.

Summary of the Dataset structure

The dataset is organised as follows:
Each subset contains several sequences, and each sequence contains between 4 and 8 views. This is shown in the diagram below:

[Figure: dataset structure diagram]




Calibration Data

The calibration data (one file for each of the 8 cameras) can be found here. The ground plane is assumed to be the Z=0 plane. C++ code (available here) is provided to allow you to load and use the calibration parameters in your program (courtesy of project ETISEO). The provided calibration parameters were obtained using the freely available Tsai Camera Calibration Software by Reg Willson. All spatial measurements are in metres. 
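
As an illustration of why the Z=0 ground-plane assumption is convenient, the sketch below projects a ground point (in metres) into an image. It assumes that a 3x4 pinhole projection matrix has already been derived from the calibration parameters and it ignores lens distortion; the function name and the example matrix are hypothetical and are not taken from the PETS 2009 calibration files.

```python
def ground_to_image(P, X, Y):
    """Project the ground-plane point (X, Y, Z=0), in metres, to pixels.

    P is a 3x4 pinhole projection matrix given as nested lists. This is an
    assumed input: the Tsai parameters would first have to be converted to
    this form, and lens distortion is ignored in this sketch.
    """
    # With Z = 0 the third column of P drops out, leaving a 3x3 homography
    # built from columns 1, 2 and 4.
    u = P[0][0] * X + P[0][1] * Y + P[0][3]
    v = P[1][0] * X + P[1][1] * Y + P[1][3]
    w = P[2][0] * X + P[2][1] * Y + P[2][3]
    if w == 0:
        raise ValueError("point projects to infinity")
    return u / w, v / w


# Hypothetical matrix for illustration only (NOT a PETS 2009 calibration):
# a camera 10 m from the ground plane, looking straight at it.
P_EXAMPLE = [
    [800.0, 0.0, 384.0, 3840.0],
    [0.0, 800.0, 288.0, 2880.0],
    [0.0, 0.0, 1.0, 10.0],
]
```

With this example matrix, the ground origin lands at the centre of a 768x576 image. For real use, the supplied C++ code remains the reference way to load and apply the calibration parameters.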

The cameras used to film the datasets are:
 

View  Model                    Resolution  Frame rate (fps)  Comments
001   Axis 223M                768x576     ~7                Progressive scan
002   Axis 223M                768x576     ~7                Progressive scan
003   PTZ Axis 233D            768x576     ~7                Progressive scan
004   PTZ Axis 233D            768x576     ~7                Progressive scan
005   Sony DCR-PC1000E 3xCMOS  720x576     ~7                ffmpeg De-interlaced
006   Sony DCR-PC1000E 3xCMOS  720x576     ~7                ffmpeg De-interlaced
007   Canon MV-1 1xCCD w       720x576     ~7                Progressive scan
008   Canon MV-1 1xCCD w       720x576     ~7                Progressive scan

Frames are compressed as JPEG image sequences. All sequences (except one) contain Views 001-004. A few sequences also contain Views 005-008. Please see below for more information.

Orientation

The cameras are installed at the locations shown below to cover an approximate area of 100m x 30m (the scale of the map is 20m):

[Figure: plan of the camera locations]

The GPS coordinates of the centre of the recording are: 51°26'18.5"N 000°56'40.00"W

The direct link to the Google maps is as follows: Google Maps

Camera installation points are shown above and sample frames are shown below:


view 001 view 002 view 003 view 004




view 005 view 006 view 007 view 008





Synchronisation

Please note that although every effort has been made to synchronise the frames from different views, there may be slight delays and frame drops in some cases. In particular, View 004 suffers from frame rate instability, and we suggest that it be used as a supplementary source of information. Please let us know if you encounter any problems or inconsistencies.


Download

Dataset S0: Training Data

This dataset contains three sets of training sequences from different views provided to help researchers obtain the following models from multiple views:

Download



Dataset S1: Person Count and Density Estimation


Three regions, R0, R1 and R2, are defined in View 001 only (shown in the example image). The coordinates of the top-left and bottom-right corners (in pixels) are given in the following table.


Region  Top-left   Bottom-right
R0      (10,10)    (750,550)
R1      (290,160)  (710,430)
R2      (30,130)   (230,290)
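
The region corners above can be used directly to decide which detections fall inside a region. The following is a minimal sketch, assuming that each detected person is reduced to a single pixel position in View 001 (for example a foot point) and that the region boundaries are inclusive; both assumptions, and the function names, are illustrative and not part of the task definition.

```python
# Region corners in View 001 pixels, copied from the table above as
# (left, top, right, bottom). The image origin is at the top-left.
REGIONS = {
    "R0": (10, 10, 750, 550),
    "R1": (290, 160, 710, 430),
    "R2": (30, 130, 230, 290),
}


def in_region(name, x, y):
    """Return True if pixel (x, y) lies inside the named region.

    Boundaries are treated as inclusive (an assumption of this sketch).
    """
    left, top, right, bottom = REGIONS[name]
    return left <= x <= right and top <= y <= bottom


def count_in_region(name, points):
    """Count the (x, y) detections of one frame that fall inside a region."""
    return sum(1 for x, y in points if in_region(name, x, y))
```

For example, count_in_region("R0", detections) gives a per-frame count for R0, where detections is whatever list of pixel positions your detector produces for that frame.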

Definition of crowd density (%): crowd density is based on a maximum occupancy (100%) of 40 people in 10 square metres on the ground. One person is assumed to occupy 0.25 square metres on the ground.
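
The definition above amounts to simple arithmetic, sketched below. The ground area of a region in square metres is taken as an input, since it is not stated here and would have to be obtained from the calibration; the function name is hypothetical, and the result is deliberately not capped at 100%.

```python
AREA_PER_PERSON_M2 = 0.25  # one person occupies 0.25 square metres (see above)


def crowd_density_percent(person_count, ground_area_m2):
    """Crowd density (%) for a ground area, following the definition above.

    ground_area_m2 is an assumed input: the area of the region on the
    ground plane, in square metres.
    """
    if ground_area_m2 <= 0:
        raise ValueError("ground area must be positive")
    occupied_m2 = person_count * AREA_PER_PERSON_M2
    return 100.0 * occupied_m2 / ground_area_m2
```

As a check against the definition, 40 people in 10 square metres occupy 40 x 0.25 = 10 square metres, which gives 100%.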
[Figure: example image showing regions R0, R1 and R2]
Scenario: S1.L1 walking

Elements: medium density crowd, overcast
Sequences: Sequence 1 with timestamp 13-57; Sequence 2 with timestamp 13-59. Sequences 1-2 use Views 001-004.
Subjective Difficulty: Level 1
Task: The task is to count the number of people in R0 for each frame of the sequence in View 001 only. As a secondary challenge, the crowd density in regions R1 and R2 can also be reported (mapped to ground plane occupancy, possibly using multiple views).

Download [502 MB]




 


Sample Frames:


          
Scenario: S1.L2 walking

Elements: high density crowd, overcast
Sequences: Sequence 1 with timestamp 14-06; Sequence 2 with timestamp 14-31. Sequences 1-2 use Views 001-004.
Subjective Difficulty: Level 2
Task: This scenario contains a densely grouped crowd who walk from one point to another. There are two sequences, corresponding to timestamps 14-06 and 14-31.

The task related to timestamp 14-06 is to estimate the crowd density in Regions R1 and R2 at each frame of the sequence.

The designated task for the sequence Time_14-31 is to determine both the total number of people entering through the brown line from the left side and the total number of people exiting through the purple and red lines (shown in the accompanying figure) over the whole sequence. The coordinates of the entry and exit lines are given below for reference.

Line            Start      End
Entry: brown    (730,250)  (730,530)
Exit1: red      (230,170)  (230,400)
Exit2: purple   (500,210)  (720,210)
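
One possible way to count crossings of these lines is to test each step of a tracked person's path against the line segment. The sketch below assumes that a track is a list of (x, y) pixel positions in consecutive frames; the names are hypothetical. It counts every proper crossing regardless of direction, so a direction check and per-person de-duplication would still be needed for the actual task.

```python
# Entry and exit lines in pixels, copied from the table above.
LINES = {
    "entry_brown": ((730, 250), (730, 530)),
    "exit1_red": ((230, 170), (230, 400)),
    "exit2_purple": ((500, 210), (720, 210)),
}


def _side(a, b, p):
    """Cross product (b - a) x (p - a); its sign says which side of a-b p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crosses(line, p_prev, p_curr):
    """True if the step p_prev -> p_curr properly crosses the line segment.

    Touching an endpoint or moving along the line does not count.
    """
    a, b = line
    return (_side(a, b, p_prev) * _side(a, b, p_curr) < 0
            and _side(p_prev, p_curr, a) * _side(p_prev, p_curr, b) < 0)


def count_crossings(line, track):
    """Count how many consecutive steps of a track cross the line segment."""
    return sum(1 for p, q in zip(track, track[1:]) if crosses(line, p, q))
```

For example, count_crossings(LINES["entry_brown"], track) returns 1 for a track that passes once through the brown line.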

Download [367 MB]

[Figure: entry and exit lines]


Sample Frames:




Scenario: S1.L3 running
         

Elements: medium density crowd, bright sunshine and shadows
Sequences: Sequence 1 with timestamp 14-17; Sequence 2 with timestamp 14-33. Sequences 1-2 use Views 001-004.
Subjective Difficulty: Level 3
Task: This scenario contains a crowd of people who, on reaching a point in the scene, begin to run. The task is to measure the crowd density in Region R1 at each frame of the sequence.

Download [476 MB]




Sample Frames:

    


Dataset S2: People Tracking

Scenario: S2.L1 walking
         

Elements: sparse crowd
Sequences: Sequence 1 with timestamp 12-34, using Views 001-008 except View_002, which is reserved for cross validation (see below).
Subjective Difficulty: L1
Task: Track all of the individuals in the sequence. If you undertake monocular tracking only, report the 2D bounding box location for each individual in the view used; if two or more views are processed, report the 2D bounding box location for each individual as back projected into View_002 using the camera calibration parameters provided (this equates to a leave-one-out validation). Note the origin (0,0) of the image is assumed top-left. Validation will be performed using manually labelled ground truth.

Download [997 MB]



Sample Frames:



Scenario: S2.L2 walking       

Elements: medium density crowd
Sequences: Sequence 1 with timestamp 14-55 using Views 001-004.
Subjective Difficulty: L2
Task: Track the individuals marked A and B (see figure) in the sequence and provide 2D bounding box locations of the individuals in View_002, which will be validated using manually labelled ground truth. Note the origin (0,0) of the image is assumed top-left. Note that individual B exits the field of view and returns toward the end of the sequence.

Download [442 MB]




[Figure: frame_0507.jpg]



Scenario: S2.L3 walking

Elements: dense crowd
Sequences: Sequence 1 with timestamp 14-41 using Views 001-004.
Subjective Difficulty: L3
Task: Track the individuals marked A and B in the sequence and provide 2D bounding box information in View_002 for each individual which will be validated using manually labelled ground truth.

Download [259 MB]




[Figure: frame_0507.jpg]


Dataset S3: Flow Analysis and Event Recognition

Scenario: S3.Multiple Flows
         

Elements: dense crowd, running
Sequences: Sequences 1-5 with timestamps 12-43 (using Views 001, 002, 005, 006, 007 and 008), 14-13, 14-37, 14-46 and 14-52. Sequences 2-5 use Views 001-004.
Subjective Difficulty: L2
Task: Detect and estimate the multiple flows in the provided sequences, mapped onto the ground plane as an occupancy map flow. Further details of the exact task requirements are contained under Author Instructions. These will be compared with ground truth optical flow of the major flows in the sequences on the ground plane.

Download [760 MB]



Sample Frames:





Scenario: S3.Event Recognition
         

Elements: dense crowd
Sequences: Sequences 1-4 with timestamps 14-16, 14-27, 14-31 and 14-33. Sequences 1-4 use Views 001-004.
Subjective Difficulty: L3
Task: This dataset contains different crowd activities and the task is to provide a probabilistic estimation of each of the following events: walking, running, evacuation (rapid dispersion), local dispersion, crowd formation and splitting at different time instances. Furthermore, we are interested in systems that can identify the start and end of the events as well as transitions between them.
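
To illustrate how per-frame probabilities relate to the start and end of an event, the sketch below turns one event's probability sequence into frame intervals by simple thresholding. The threshold of 0.5, the function name and the interval format are assumptions made for illustration; the required output format is the one given under Author Instructions.

```python
# Event names as listed in the task description above.
EVENTS = ("walking", "running", "evacuation", "local dispersion",
          "crowd formation", "splitting")


def event_intervals(probabilities, threshold=0.5):
    """Return inclusive (start_frame, end_frame) pairs for one event.

    probabilities holds one value per frame for a single event. A frame
    belongs to the event when its probability is at or above the
    threshold (the default of 0.5 is an arbitrary illustrative choice).
    """
    intervals = []
    start = None
    for frame, p in enumerate(probabilities):
        if p >= threshold and start is None:
            start = frame  # event begins
        elif p < threshold and start is not None:
            intervals.append((start, frame - 1))  # event ended on previous frame
            start = None
    if start is not None:
        # Event is still active at the end of the sequence.
        intervals.append((start, len(probabilities) - 1))
    return intervals
```

Applying the function once per entry in EVENTS gives start and end frames for each event, and a transition can then be read off where one event's interval ends as another's begins.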

Download [1.2 GB]


 

Sample Frames:





Additional Information

The scenarios can also be downloaded from ftp://ftp.cs.rdg.ac.uk/pub/PETS2009/ (use anonymous login). Warning: ftp://ftp.pets.rdg.ac.uk is not listing files correctly on some ftp clients. If you experience problems you can connect to the http server at http://ftp.cs.rdg.ac.uk/PETS2009/.

Legal note: The video sequences are copyright University of Reading and permission is hereby granted for free download for the purposes of the PETS 2009 workshop and academic and industrial research. Where the data is disseminated (e.g. publications, presentations) the source should be acknowledged.