Object tracking is one of the most important tasks in video analysis. Many methods have been proposed, such as TLD (Tracking-Learning-Detection) and Meanshift, but they show good accuracy in laboratory conditions rather than in real-world ones. Accuracy here is the deviation between the computed object coordinates and the real ones. One reason for this gap is the lack of information about the tracked object and about changes in the environment. If the algorithm has prior information about the tracked object, it can perform with higher accuracy. Some of the newest object tracking methods, such as GOTURN, use a trained CNN (convolutional neural network) and achieve better accuracy because the network knows what the tracked object looks like in different situations, such as when the light intensity changes or the object turns.
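The notion of accuracy as a deviation between computed and real coordinates can be illustrated with a short sketch. The text does not say which distance is used, so this example assumes the Euclidean distance between bounding-box centres, with boxes given as (x, y, width, height); the function names are hypothetical.

```python
import math


def box_center(box):
    """Return the (cx, cy) centre of a box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def center_error(predicted, ground_truth):
    """Euclidean distance in pixels between the centres of two boxes.

    Assumption: deviation is measured between box centres; the source
    only speaks of a deviation between coordinates.
    """
    px, py = box_center(predicted)
    gx, gy = box_center(ground_truth)
    return math.hypot(px - gx, py - gy)


def mean_center_error(predicted_boxes, ground_truth_boxes):
    """Average centre error over a sequence of frames."""
    if len(predicted_boxes) != len(ground_truth_boxes):
        raise ValueError("both sequences must have the same length")
    if not predicted_boxes:
        raise ValueError("at least one frame is required")
    errors = [
        center_error(p, g) for p, g in zip(predicted_boxes, ground_truth_boxes)
    ]
    return sum(errors) / len(errors)
```

A lower value means the tracker stays closer to the annotated position; a value of 0 means the centres coincide in every frame.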
Using only the classifier, we can find an object that was in the training set. But if its appearance changes, the object will be lost once the deviation exceeds the trust limit. It is therefore important to have independent sources of prior and posterior information about the tracked object. Prior information is provided by the detector (CNN), and posterior information by the tracking algorithm (TLD). One of the biggest problems of the detector is the high computational complexity of its algorithm. For this reason, we run the classifier in parallel with the tracker.
Among the existing tracking algorithms, TLD is chosen as the base algorithm. A convolutional neural network (CNN) is used for object detection. We selected this type of neural network because of the advantages listed in the literature review.
The algorithm works as follows. The initial video frame is processed by the CNN, which classifies the objects and determines their positions in the frame. This information is then passed to the TLD tracker, which tracks the required object. Periodically, the object positions reported by the classifier and the tracker are compared. If the difference is larger than a threshold, TLD receives a new position for the tracked object, as determined by the CNN.
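The control flow described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `detect` stands in for the CNN, `tracker` stands in for TLD (any object with `init` and `update` methods), `check_every` and `threshold` are hypothetical parameters, and the detector is called synchronously rather than in parallel to keep the example short. `DriftingTracker` is a toy tracker used only to exercise the loop.

```python
import math


def center_distance(box_a, box_b):
    """Distance in pixels between the centres of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2.0) - (bx + bw / 2.0),
                      (ay + ah / 2.0) - (by + bh / 2.0))


def track_with_periodic_correction(frames, detect, tracker,
                                   check_every=10, threshold=20.0):
    """Track an object, re-initialising the tracker when it drifts.

    detect(frame) -> box       stand-in for the CNN detector (assumption)
    tracker.init(frame, box)   stand-in for starting TLD (assumption)
    tracker.update(frame)      returns the tracked box for this frame
    Returns the per-frame boxes and the indices of corrected frames.
    """
    positions = []
    corrections = []
    for index, frame in enumerate(frames):
        if index == 0:
            # The first frame is handled by the detector alone.
            box = detect(frame)
            tracker.init(frame, box)
        else:
            box = tracker.update(frame)
            if index % check_every == 0:
                # Periodic comparison of tracker and detector positions.
                detected = detect(frame)
                if center_distance(box, detected) > threshold:
                    # The tracker has drifted: restart it from the detection.
                    tracker.init(frame, detected)
                    box = detected
                    corrections.append(index)
        positions.append(box)
    return positions, corrections


class DriftingTracker:
    """Toy tracker that drifts right by a fixed number of pixels per frame."""

    def __init__(self, drift_per_frame=3):
        self.drift_per_frame = drift_per_frame
        self.box = None

    def init(self, frame, box):
        self.box = box

    def update(self, frame):
        x, y, w, h = self.box
        self.box = (x + self.drift_per_frame, y, w, h)
        return self.box
```

With a stationary object and the toy tracker drifting 3 pixels per frame, a check every 10 frames with a 20-pixel threshold resets the tracker at each check. In a real system the choice of `check_every` would trade detector cost against how far the tracker is allowed to drift.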