Active Learning for Deep Object Detection

1 Introduction

Labeled training data is highly valuable and the basic requirement of supervised learning. Active learning aims to expedite the process of acquiring new labeled data by ordering unlabeled samples by the expected value of annotating them. In this paper, we propose novel active learning methods for object detection. Our main contributions are:

(i) an incremental learning scheme for deep object detectors without catastrophic forgetting based on [Käding et al., 2016b],

(ii) active learning metrics for detection derived from uncertainty estimates, and

(iii) an approach to leverage selection imbalances for active learning.

While active learning is widely studied in classification tasks [Kovashka et al., 2016, Settles, 2009], it has received much less attention in the domain of deep object detection. In this work, we propose methods that can be used with any object detector that predicts a class probability distribution per object proposal. Scores from individual detections are aggregated into a score for the whole image (see Fig. 1). All methods rely on the intuition that model uncertainty and valuable samples are likely to co-occur [Settles, 2009]. Furthermore, we show how the balanced selection of new samples can improve the resulting performance of an incrementally learned system.

In continuous exploration application scenarios, e.g., in camera streams, new data becomes available over time or the distribution underlying the problem itself changes. We simulate such an environment using splits of the PASCAL VOC 2012 [Everingham et al., 2010] dataset. With our proposed framework, a deep object detection system can be trained in an incremental manner while the proposed aggregation schemes enable selection of valuable data for annotation. As a result, a deep object detector can explore unknown data and adapt itself with minimal human supervision. This combination results in a complete system suited to continuously changing scenarios.

1.1 Related Work

Object Detection using CNNs

An important contribution to object detection based on deep learning is R-CNN [Girshick et al., 2014]. It delivers a considerable improvement over previously published sliding-window-based approaches. R-CNN employs selective search [Uijlings et al., 2013], an unsupervised method to generate region proposals. A pre-trained CNN performs feature extraction. Linear SVMs (one per class) are used to score the extracted features, and a threshold is applied to filter the large number of proposed regions. Fast R-CNN [Girshick, 2015] and Faster R-CNN [Ren et al., 2015] offer further improvements in speed and accuracy. Later on, R-CNN was combined with feature pyramids to enable efficient multi-scale detections [Lin et al., 2017]. YOLO [Redmon et al., 2016] is a more recent deep learning-based object detector. Instead of using a CNN as a black-box feature extractor, it is trained in an end-to-end fashion. All detections are inferred in a single pass (hence the name "You Only Look Once") while detection and classification are performed jointly. YOLOv2 [Redmon and Farhadi, 2017] and YOLOv3 [Redmon and Farhadi, 2018] improve upon the original YOLO in several aspects. These include, among others, different network architectures, different priors for bounding boxes, and considering multiple scales during training and detection. SSD [Liu et al., 2016] is a single-pass approach comparable to YOLO, introducing improvements like assumptions about the aspect ratio distribution of bounding boxes as well as predictions on different scales. As a result of a series of improvements, it is both faster and more accurate than the original YOLO. DSSD [Fu et al., 2017] further improves upon SSD by focusing more on context with the help of deconvolutional layers.

Active Learning for Object Detection

The authors of [Abramson and Freund, 2006] propose an active learning system for pedestrian detection in videos taken by a camera mounted on the front of a moving car. Their detection method is based on AdaBoost while sampling of unlabeled instances is realized by hand-tuned thresholding of detections. Object detection using generalized Hough transform in combination with randomized decision trees, called Hough forests, is presented in [Yao et al., 2012]. Here, costs are estimated for annotations, and instances with highest costs are selected for labeling. This follows the intuition that those examples are most likely to be difficult and are therefore considered most valuable. Another active learning approach for satellite images using sliding windows in combination with an SVM classifier and margin sampling is proposed in [Bietti, 2012]. The combination of active learning for object detection with crowdsourcing is presented in [Vijayanarasimhan and Grauman, 2014]. A part-based detector for SVM classifiers in combination with hashing is proposed for use in large-scale settings. Active learning is realized by selecting the most uncertain instances for labeling. In [Roy et al., 2016], object detection is interpreted as a structured prediction problem using a version space approach in the so-called "difference of features" space. The authors suggest different margin sampling approaches estimating the future margin of an SVM classifier.

Like our proposed approach, most related methods presented above rely on uncertainty indicators like least confidence or 1vs2. However, they are designed for a specific type of object detection and therefore cannot be applied to deep object detection methods in general, whereas our method can. Additionally, our method does not propose single objects to the human annotator. It presents whole images at once and requests labels for every object.

Active Learning for Deep Architectures

In [Wang and Shang, 2014] and [Wang et al., 2016], uncertainty-based active learning criteria for deep models are proposed. The authors offer several metrics to estimate model uncertainty, including least confidence, margin, or entropy sampling. Wang et al. additionally describe a self-taught learning scheme, where the model's prediction is used as a label for further training if uncertainty is below a threshold. Another type of margin sampling is presented in [Stark et al., 2015]. The authors propose querying samples according to the quotient of the highest and second-highest class probability. The visual detection of defects using a ResNet is presented in [Feng et al., 2017]. The authors suggest two methods: uncertainty sampling (i.e., defect probability of 0.5) and positive sampling (i.e., selecting every positive sample since they are very rare) for querying unlabeled instances and model updates after labeling. Another work which presents uncertainty sampling is [Liu et al., 2017]. In addition, a query-by-committee strategy as well as active learning involving weighted incremental dictionary learning are proposed. In the work of [Gal et al., 2017], several uncertainty-related measures for active learning are presented. Since they use Bayesian CNNs, they can make use of the probabilistic output and employ methods like variance sampling, entropy sampling, or maximizing mutual information. Finally, the authors of [Beluch et al., 2018] show that ensemble-based uncertainty measures perform best in their evaluation. All of the works introduced above are tailored to active learning in classification scenarios. Most of them rely on model uncertainty, similar to our applied selection criteria.

Besides estimating the uncertainty of the model, further retraining-based approaches maximize the expected model change [Huang et al., 2016] or the expected model output change [Käding et al., 2016a] that unlabeled samples would cause after labeling. Since each bounding box within an image has to be evaluated according to its active learning score, both measures would be impractical in terms of runtime without further modifications. A more complete overview of general active learning strategies can be found in [Kovashka et al., 2016, Settles, 2009].

2 Prerequisite: Active Learning

In active learning, a value or metric is assigned to any unlabeled example to determine its possible contribution to model improvement. The current model's output can be used to estimate a value, as can statistical properties of the example itself. A high value means that the example should be preferred during selection because of its estimated value for the current model.

In the following section, we propose a method to adapt an active learning metric for classification to object detection using an aggregation process. This method is applicable to any object detector whose output contains class scores for each detected object.

Classification

For classification, the model output for a given example x is an estimated distribution p(c|x) of scores over all classes c. This distribution can be analyzed to determine whether the model made an uncertain prediction, a good indicator of a valuable example. Different measures of uncertainty are a common choice for selection, e.g., [Ertekin et al., 2007, Fu and Yang, 2015, Hoi et al., 2006, Jain and Kapoor, 2009, Kapoor et al., 2010, Käding et al., 2016c, Tong and Koller, 2001, Beluch et al., 2018].

For example, if the difference between the two highest class scores is very low, the example may be located close to a decision boundary. In this case, it can be used to refine the decision boundary and is therefore valuable. The metric is defined using the highest and second-highest scoring classes c_1 and c_2:

v_1vs2(x) = 1 - (p(c_1|x) - p(c_2|x))

This procedure is known as 1vs2 or margin sampling [Settles, 2009]. We use 1vs2 as part of our methods since its operation is intuitive and it can produce better estimates than, e.g., least confidence approaches [Käding et al., 2016a]. However, our proposed aggregation method could be applied to any other active learning measure.
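As a concrete illustration, the following minimal sketch computes 1vs2 scores for a batch of predicted class distributions; the function name and the use of NumPy are our own and not part of the original implementation.

```python
import numpy as np

def score_1vs2(class_probs: np.ndarray) -> np.ndarray:
    """1vs2 (margin) scores for an array of class distributions.

    class_probs: array of shape (n, num_classes), each row a predicted
    distribution p(c|x). Returns one score per row; a small margin
    between the two highest class scores yields a value close to 1.
    """
    # Sort each row ascending and take the two highest class scores.
    top2 = np.sort(class_probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]   # p(c_1|x) - p(c_2|x)
    return 1.0 - margin                # high score = uncertain example
```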

3 Active Learning for Deep Object Detection

Using a classification metric on a single detection is straightforward if class scores are available. However, aggregating metrics of individual detections for a complete image can be done in many different ways. In the section below, we suggest simple and efficient aggregation strategies. Afterwards, we discuss the problem of class imbalance in datasets.

3.1 Aggregation of Detection Metrics

Possible aggregations include calculating the sum, the average or the maximum over all detections. However, for some aggregations, it is not clear how an image without any detections should be handled.

Sum

A straightforward method of aggregation is the sum. Intuitively, this method prefers images with many uncertain detections in them. When aggregating detections using a sum, empty examples should be valued zero. Zero is the neutral element of addition, making it a reasonable value for an empty sum. A low valuation effectively delays the selection of empty examples until there are either no better examples left or the model has improved enough to actually produce detections on them. The value of a single example x with detections x_1, ..., x_D can be calculated in the following way:

v_sum(x) = Σ_{i=1..D} v_1vs2(x_i)

Average

Another possibility is averaging each detection's scores. The average is not sensitive to the number of detections, which may make scores more comparable between images. If a sample does not contain any detections, it is assigned a zero score. This is an arbitrary rule because there is no true neutral element w.r.t. averages. Nevertheless, we choose zero to keep the behavior in line with the other metrics:

v_avg(x) = (1/D) Σ_{i=1..D} v_1vs2(x_i),  with v_avg(x) = 0 if D = 0

Maximum

Finally, individual detection scores can be aggregated by calculating the maximum. This can result in a substantial loss of information. However, it may also prove beneficial because of increased robustness to noise from many detections. For the maximum aggregation, a zero score for empty examples is valid. The maximum is not affected by zero-valued detections, because no single detection's score can be lower than zero:

v_max(x) = max_{i=1..D} v_1vs2(x_i),  with v_max(x) = 0 if D = 0
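The three aggregations can be sketched as follows, reusing the hypothetical score_1vs2 helper from above; the zero score for empty images follows the conventions stated in the text.

```python
import numpy as np  # score_1vs2 from the previous sketch is assumed in scope

def aggregate_image_score(detections: np.ndarray, mode: str = "sum") -> float:
    """Aggregate per-detection 1vs2 scores into one whole-image score.

    detections: array of shape (D, num_classes) with one class score
    distribution per detection; D may be zero for empty images.
    """
    if detections.shape[0] == 0:
        return 0.0                   # empty images score zero for all modes
    scores = score_1vs2(detections)
    if mode == "sum":
        return float(scores.sum())   # prefers many uncertain detections
    if mode == "avg":
        return float(scores.mean())  # insensitive to the detection count
    if mode == "max":
        return float(scores.max())   # robust to noise from many detections
    raise ValueError(f"unknown aggregation mode: {mode}")
```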

3.2 Handling Selection Imbalances

Class imbalances can lead to worse results for classes underrepresented in the training set. In a continuous learning scenario, this imbalance can be countered during selection by preferring examples where the predicted class is underrepresented in the training set. An example is weighted by the following rule:

w(ĉ) = ( (N_ĉ + α) / (N + K·α) )^(-1)

where ĉ denotes the predicted class, N the total number of training instances, N_ĉ the number of instances of class ĉ in the training set, and K the number of classes. We assume a symmetric Dirichlet prior with concentration parameter α, meaning that we have no prior knowledge of the class distribution, and estimate the posterior class frequency after observing the training set. The weight is then defined as the inverse of the posterior to prefer underrepresented classes. It is multiplied with the detection's score before aggregation to obtain a final score.
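A minimal sketch of this weighting rule, assuming per-class instance counts are tracked for the current training set; the helper name and the default α are illustrative only.

```python
def class_weight(counts: dict[str, int], predicted_class: str, alpha: float = 1.0) -> float:
    """Inverse posterior class frequency under a symmetric Dirichlet prior.

    counts: number of training instances per class. The posterior mean
    for class c is (N_c + alpha) / (N + K * alpha); the weight is its
    inverse so that rare classes are preferred during selection.
    """
    n_total = sum(counts.values())
    n_classes = len(counts)
    posterior = (counts.get(predicted_class, 0) + alpha) / (n_total + n_classes * alpha)
    return 1.0 / posterior

# The weight is multiplied into each detection's score before aggregation,
# e.g. weighted = class_weight(counts, c_hat) * score for each detection.
```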

4 Experiments

In the following, we present our evaluation. First we show how the proposed aggregation metrics are able to enhance recognition performance while selecting new data for annotation. After this, we analyze the gained improvements when our proposed weighting scheme is applied. This paper describes work in progress. Code will be made available after conference publication.

Data

We use the PASCAL VOC 2012 dataset [Everingham et al., 2010] to assess the effects of our methods on learning. To specifically measure incremental and active learning performance, both training and validation sets are split into parts A and B in two different random ways to obtain more general experimental results. Part B is considered "new" and is comprised of images with the object classes bird, cow and sheep (first way) or tvmonitor, cat and boat (second way). Part A contains all other 17 classes and is used for initial training. The training set for part B contains 605 and 638 images for the first and second way, respectively. The decision towards VOC in favor of recently published datasets was motivated by the composition of the dataset itself. Since it mainly contains images showing few objects, it is possible to split the data into a known and unknown part without having images containing classes from both parts of the splits.

Active Exploration Protocol

Before an experimental run, the part B datasets are divided randomly into unlabeled batches of ten samples each. This fixed assignment decreases the probability of very similar images being selected for the same batch compared to always selecting the highest valued samples, which would lead to less diverse batches. This is valuable when dealing with data streams, e.g., from camera traps, or data with low intra-class variance. The construction of diverse unlabeled data batches is a well-known topic in batch-mode active learning [Settles, 2009]. However, the construction of diverse batches could lead to unintended side-effects and an evaluation of those is beyond the scope of the current study. The unlabeled batch size is a trade-off between a tight feedback loop (smaller batches) and computational efficiency (larger batches). As a side-effect of the fixed batch assignment, some samples are left over during selection (i.e., five for the first way and eight for the second way of splitting).

The unlabeled batches are assigned a value using the sum of the active learning metric over all images in the corresponding batch as a meta-aggregation. Other functions such as average or maximum could be considered as well, but are also beyond the scope of this paper.

The highest valued batch is selected for an incremental training step [Käding et al., 2016b]. The network is updated using the annotations from the dataset in lieu of a human annotator. Please note that annotations are not needed for update batch selection, only for the update itself. This process is repeated from the point of batch valuation until there are no unlabeled batches left. The assignment of samples to unlabeled batches is not changed during an experimental run.

Evaluation

Require: known labeled samples L, unknown samples U, initial model M, active learning metric v

  U_1, ..., U_n ← split of U into random batches
  while unlabeled batches remain do
      calculate scores for all remaining batches using v
      U* ← highest scoring batch according to v
      A ← annotations for U* (human-machine interaction)
      M ← incrementally train M using U* and A
  end while

Algorithm 1: Detailed description of the experimental protocol. Please note that in an actual continuous learning scenario, new examples are constantly added to U. The loop is never left because U is never exhausted, and the described splitting procedure would have to be applied regularly.
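The protocol can be sketched in Python as follows; the model, scoring, annotation and training interfaces are placeholders rather than the actual implementation.

```python
import random

def active_exploration(model, unlabeled_pool, batch_size, image_score, annotate, train_incrementally):
    """Batch-wise active exploration loop following Algorithm 1.

    image_score(model, image) -> float is one of the aggregation metrics;
    annotate(batch) stands in for the human annotator (here: dataset labels).
    Annotations are only requested for the selected batch, not for scoring.
    """
    random.shuffle(unlabeled_pool)
    # Fixed assignment of samples to unlabeled batches; leftover samples
    # that do not fill a whole batch are dropped, as in the experiments.
    batches = [unlabeled_pool[i:i + batch_size]
               for i in range(0, len(unlabeled_pool) - batch_size + 1, batch_size)]
    while batches:
        # Meta-aggregation: the value of a batch is the sum of its image scores.
        values = [sum(image_score(model, img) for img in batch) for batch in batches]
        best = values.index(max(values))
        batch = batches.pop(best)
        labels = annotate(batch)
        model = train_incrementally(model, batch, labels)
    return model
```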

We report mean average precision (mAP) as described in [Everingham et al., 2010] and validate every five new batches (i.e., 50 new samples). The result is averaged over five runs for each active learning metric and way of splitting for a total of ten runs. As a baseline for comparison, we evaluate the performance of random selection, since there is no other work suitable for direct comparison without adjustments as of yet.

Setup – Object Detector

We use YOLO as the deep object detection framework [Redmon et al., 2016]. More precisely, we use the YOLO-Small architecture as an alternative to larger object detection networks, because it allows for much faster training. Our initial model is obtained by fine-tuning the Extraction model (http://pjreddie.com/media/files/extraction.weights) on part A of the VOC dataset for 24,000 iterations using the Adam optimizer [Kingma and Ba, 2014], for each way of splitting the dataset into parts A and B, resulting in two initial models. The first half of initial training is completed with a learning rate of 0.0001. The second half and all incremental experiments use a lower learning rate of 0.00001 to prevent divergence. Other hyperparameters match [Redmon et al., 2016], including the augmentation of training data using random crops, exposure or saturation adjustments.

Setup – Incremental Learning

Extending an existing CNN without sacrificing performance on known data is not a trivial task. Fine-tuning exclusively on new data leads to a severe degradation of performance on previously learned examples [Kirkpatrick et al., 2016, Shmelkov et al., 2017]. We use a straightforward, but effective fine-tuning method [Käding et al., 2016b] to implement incremental learning. With each gradient step, the mini-batch is constructed by randomly selecting from old and new examples with probability λ or (1 - λ), respectively. After completing the learning step, the new data is simply considered old data for the next step. This method can balance performance on known and unknown data successfully. We use a value of 0.5 for λ to make as few assumptions as possible and perform 100 iterations per update. Algorithm 1 describes the protocol in more detail. The method can be applied to YOLO object detection with some adjustments. Mainly, the architecture needs to be changed when new classes are added. Because of the design of YOLO's output layer, we rearrange the weights to fit new classes, adding 49 weights per class.
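A minimal sketch of this mini-batch construction, with λ = 0.5 as in our experiments; names and interfaces are illustrative.

```python
import random

def build_update_minibatch(old_data, new_data, batch_size, lam=0.5):
    """Mix old and new examples for one incremental gradient step.

    Each slot of the mini-batch is drawn from the old data with
    probability lam and from the new data otherwise, so the update
    neither forgets known classes nor ignores the newly selected batch.
    """
    return [random.choice(old_data) if random.random() < lam else random.choice(new_data)
            for _ in range(batch_size)]
```

After the 100 update iterations, the new examples would be merged into the old data for subsequent updates, as described above.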

4.1 Results

We focus our analysis on the new, unknown data. Still, not losing performance on known data is also important. We analyze the performance on the known part of the data (i.e., part A of the VOC dataset) to validate our method. In the worst case, the mAP decreases from 36.7% initially to 32.1% averaged across all experimental runs and methods, although three new classes were introduced. We can see that the incremental learning method from [Käding et al., 2016b] causes only minimal losses on known data. These losses in performance are also referred to as "catastrophic forgetting" in the literature [Kirkpatrick et al., 2016]. The method from [Käding et al., 2016b] does not require additional model parameters or adjusted loss terms for added samples like comparable approaches such as [Shmelkov et al., 2017] do, which is important for learning indefinitely.

Performance of active learning methods is usually evaluated by observing points on a learning curve (i.e., performance over the number of added samples). In Table 1, we show the mAP for the new classes from part B of VOC at several intermediate learning steps as well as after exhausting the unlabeled pool. In addition, we show the area under the learning curve (AULC) to further improve comparability among the methods. In our experiments, the number of samples added equals the number of images.
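The text does not specify how AULC is computed; one standard choice is the trapezoidal rule over the evaluation points, sketched below.

```python
import numpy as np

def aulc(map_values, sample_counts):
    """Area under the learning curve from mAP measured at given sample counts."""
    return float(np.trapz(map_values, x=sample_counts))
```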

Quantitative Results – Fast Exploration

            50 samples  100 samples  150 samples  200 samples  250 samples  All samples
            mAP/AULC    mAP/AULC     mAP/AULC     mAP/AULC     mAP/AULC     mAP/AULC
Baseline
 Random       –/–         –/–          –/–          –/–          –/–          –/–
Our Methods
 Max          –/–         –/–          –/–          –/–          –/–          –/–
 Avg          –/–         –/–          –/–          –/–          –/–          –/–
 Sum          –/–         –/–          –/–          –/–          –/–          –/–
Our Methods (with weighting w)
 Max          –/–         –/–          –/–          –/–          –/–          –/–
 Avg          –/–         –/–          –/–          –/–          –/–          –/–
 Sum          –/–         –/–          –/–          –/–          –/–          –/–

Table 1: Validation results on part B of the VOC data (i.e., new classes only); the numeric values did not survive extraction. Bold face indicates block-wise best results, i.e., best results with and without the additional weighting w. Underlined face highlights overall best results.

Gaining accuracy as fast as possible while minimizing human supervision is one of the main goals of active learning. Moreover, in continuous exploration scenarios, as faced in camera feeds or other continuous automated measurements, it is assumed that new data is always available. Hence, the pool of valuable examples will rarely be exhausted. To assess the performance of our methods in this fast exploration context, we evaluate the models after learning small amounts of samples. At this point there is still a large number of diverse samples for the methods to choose from, which makes the following results much more relevant for practical applications than results on the full dataset.

In general, we can see that incremental learning works in the context of the new classes in part B of the data, meaning that we observe improving performance for all methods. After adding only 50 samples, Max and Avg perform better than passive selection while the Sum metric is outperformed marginally. When more and more samples are added (i.e., 100 to 250 samples), we observe a superior performance of the Sum aggregation. But the two other aggregation techniques also reach better rates than mere random selection. We attribute the fast increase in performance for the Sum metric to its tendency to select samples with many objects inside, which leads to more annotated bounding boxes. However, the target application is a scenario where the amount of unlabeled data is huge or new data arrives continuously, and hence a complete evaluation by humans is infeasible. Here, we consider the number of images to be evaluated more critical than the time needed to draw single bounding boxes. Another interesting fact is the almost equal performance of Max and Avg, which can be explained as follows: the VOC dataset consists mostly of images with only one object in them. Therefore, both metrics lead to a similar score if objects are identified correctly.

We can also see that the proposed imbalance handling (i.e., the weighting w) causes slight losses in performance at very early stages. At subsequent stages, it helps to gain noticeable improvements. Especially the Sum method benefits from the sample weighting scheme. A possible explanation for this behavior is the following: at early stages, the classifier has not seen many samples of each class and therefore suffers more from misclassification errors. Hence, the weighting scheme is not able to encourage the selection of rare class samples since the classifier decisions are still too unstable. At later stages, this problem becomes less severe and the weighting scheme is much more helpful than in the beginning. This could also explain the performance of Sum in general. Further details on learning pace are given later in a qualitative study on model evolution. Additionally, the Sum aggregation tends to select batches with many detections in them. Hence, it is natural that the improvement is most noticeable with this aggregation technique, since it helps to find batches with many rare objects in them.

Quantitative Results – All Available Samples

In our case, active learning only affects the sequence of unlabeled batches if we train until there is no new data available. Therefore, there are only very small differences between each method's results for mAP after training has completed. The small differences indicate that the chosen incremental learning technique [Käding et al., 2016b] is suitable for the faced scenario. In continuous exploration, it is commonly assumed that there will be more new unlabeled data available than can be processed. Nevertheless, evaluating the long-term performance of our metrics is important to detect possible deterioration over time compared to random selection. In contrast to this, the differences in AULC arise from the improvements of the different methods during the experimental run and therefore should be considered a distinctive characteristic reflecting performance over the whole experiment. Having this in mind, we can still see that Sum performs best while the weighting generally leads to improvements.

Quantitative Results – Class-wise Analysis

Figure 2: Class-wise validation results on part B of the VOC dataset (i.e., unknown classes). The first row details the first way of splitting (bird, cow, sheep) while the second row shows the second way (boat, cat, tvmonitor). For reference, the distribution of samples (object instances as well as images with at least one instance) over the VOC dataset is provided in the third row.

To validate the efficacy of our sample weighting strategy as discussed in Section 3.2, it is important to measure not only overall performance, but to look at metrics for individual classes. Fig. 2 shows the performance over time on the validation set for each individual class. For reference, we also provide the class distribution over the relevant part of the VOC dataset, indicated by the number of object instances in total as well as the number of images with at least one instance in them.

In the first row, we observe an advantage for the weighted method when looking at the performance of cow. Out of the three classes in this way of splitting, cow has the fewest instances in the dataset. The performance of tvmonitor in the second row shows a similar pattern, where it is also the class with the lowest number of object instances in the dataset. Analyzing bird and cat, the top classes by number of instances in each way of splitting, we observe only small differences in performance. Thus, we can show evidence that our balancing scheme is able to improve performance on rare classes while it does not affect performance on frequent classes.

Intuitively, these observations are in line with our expectations regarding our handling of class imbalances, where examples of rare classes should be preferred during selection. We start to observe the advantages after around 100 training examples because, for the selection to happen correctly, the prediction of the rare class needs to be correct in the first place.

Qualitative Results – Sample Valuation

Figure 3: Value of examples of cow, sheep and bird as determined by the Sum, Avg and Max metrics using the initial model. The top seven selection is not affected by using our weighting method to counter training set class imbalances.

We calculate whole image scores over bird, cow and sheep samples using our corresponding initial model trained on the remaining classes for the first way of splitting. Figure 3 shows those images that the three aggregation metrics consider the most valuable. Additionally, common zero-scoring images are shown. The least valuable images shown here are representative of all proposed metrics because they do not lead to any detections using the initial model. Note that there are more than seven images with zero score in the training dataset. The images shown in the figure have been selected randomly.

Intuitively, the Sum metric should prefer images with many objects in them over single objects, even if individual detection values are low. Although VOC consists largely of images with a single object, all seven of the highest scoring images contain at least three objects. The Average and Maximum metrics prefer almost identical images since the average and maximum tend to be nearly equal for few detections. With few exceptions, the most valuable images contain pristine examples of each object. They are well lit and isolated. The objects in the zero-scoring images are more noisy and hard to identify even for the human viewer, resulting in fewer reliable detections.

Qualitative Results – Model Evolution

Figure 4: Evolution of detections on examples from the validation set.

Observing the change in model output as new data is learned can help estimate the number of samples needed to learn new classes and identify possible confusions. Fig. 4 shows the development from initial guesses to correct detections after learning 150 samples, corresponding to a fast exploration scenario. For selection, the Sum metric is used.

The class confusions shown in the figure are typical for this scenario. cow and sheep are recognized as the visually similar classes dog, horse and cat. bird is often classified as aeroplane. After selecting and learning 150 samples, the objects are detected and classified correctly and reliably.

During the learning process, there are also unknown objects. Please note that being able to mark objects as unknown is a direct result of using YOLO. Those objects have a detection confidence above the required threshold, but no classification is certain enough. This property of YOLO is important for the discovery of objects of new classes. However, if similar data is available from other detection methods, our techniques could easily be applied.

5 Conclusions

In this paper, we propose several uncertainty-based active learning metrics for object detection. They only require a distribution of classification scores per detection. Depending on the specific task, an object detector that reports objects of unknown classes is also important. Additionally, we propose a sample weighting scheme to balance selections among classes.

We evaluate the proposed metrics on the PASCAL VOC 2012 dataset [Everingham et al., 2010] and offer quantitative and qualitative results and analysis. We show that the proposed metrics are able to guide the annotation process efficiently, which leads to superior performance in comparison to a random selection baseline. In our experimental evaluation, the Sum metric achieves the best results overall, which can be attributed to the fact that it tends to select batches with many single objects in them. However, the targeted scenario is an application with huge amounts of unlabeled data where we consider the number of images to be evaluated as more critical than the time needed to draw single bounding boxes. Examples would be camera streams or camera trap data. To expedite annotation, our approach could be combined with a weakly supervised learning approach as presented in [Papadopoulos et al., 2016]. We also showed that our weighting scheme leads to further increased accuracies.

All presented metrics could be applied to other deep object detectors, such as the variants of SSD [Liu et al., 2016], the improved R-CNNs, e.g., [Ren et al., 2015], or the newer versions of YOLO [Redmon and Farhadi, 2017]. Moreover, our proposed metrics are not restricted to deep object detection and could be applied to arbitrary object detection methods as long as they fulfill the only requirement: a complete distribution of classification scores per detection. Also, the underlying uncertainty measure could be replaced with an arbitrary active learning metric to be aggregated afterwards.

The proposed aggregation strategies also generalize to the selection of images based on segmentation results or any other type of image partitioning. The resulting scores could also be applied in a novelty detection scenario.

References

  • [Abramson and Freund, 2006] Abramson, Y. and Freund, Y. (2006). Active learning for visual object detection. Technical report, University of California, San Diego.
  • [Beluch et al., 2018] Beluch, W. H., Genewein, T., Nürnberger, A., and Köhler, J. M. (2018). The power of ensembles for active learning in image classification. In Computer Vision and Pattern Recognition (CVPR).
  • [Bietti, 2012] Bietti, A. (2012). Active learning for object detection on satellite images. Technical report, California Institute of Technology, Pasadena.
  • [Ertekin et al., 2007] Ertekin, S., Huang, J., Bottou, L., and Giles, L. (2007). Learning on the border: active learning in imbalanced data classification. In Conference on Information and Knowledge Management (CIKM).
  • [Everingham et al., 2010] Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2010). The pascal visual object classes (voc) challenge. International Journal of Computer Vision (IJCV).
  • [Feng et al., 2017] Feng, C., Liu, M.-Y., Kao, C.-C., and Lee, T.-Y. (2017). Deep active learning for civil infrastructure defect detection and classification. In International Workshop on Computing in Civil Engineering (IWCCE).
  • [Fu and Yang, 2015] Fu, C.-J. and Yang, Y.-P. (2015). A batch-mode active learning svm method based on semi-supervised clustering. Intelligent Data Analysis.
  • [Fu et al., 2017] Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A. C. (2017). Dssd: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659.
  • [Gal et al., 2017] Gal, Y., Islam, R., and Ghahramani, Z. (2017). Deep bayesian active learning with image data. arXiv preprint arXiv:1703.02910.
  • [Girshick, 2015] Girshick, R. (2015). Fast R-CNN. In International Conference on Computer Vision (ICCV).
  • [Girshick et al., 2014] Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR).
  • [Hoi et al., 2006] Hoi, S. C., Jin, R., and Lyu, M. R. (2006). Large-scale text categorization by batch mode active learning. In International Conference on World Wide Web (WWW).
  • [Huang et al., 2016] Huang, J., Child, R., Rao, V., Liu, H., Satheesh, S., and Coates, A. (2016). Active learning for speech recognition: the power of gradients. arXiv preprint arXiv:1612.03226.
  • [Jain and Kapoor, 2009] Jain, P. and Kapoor, A. (2009). Active learning for large multi-class problems. In Computer Vision and Pattern Recognition (CVPR).
  • [Käding et al., 2016a] Käding, C., Freytag, A., Rodner, E., Perino, A., and Denzler, J. (2016a). Large-scale active learning with approximated expected model output changes. In German Conference on Pattern Recognition (GCPR).
  • [Käding et al., 2016b] Käding, C., Rodner, E., Freytag, A., and Denzler, J. (2016b). Fine-tuning deep neural networks in continuous learning scenarios. In ACCV Workshop on Interpretation and Visualization of Deep Neural Nets (ACCV-WS).
  • [Käding et al., 2016c] Käding, C., Rodner, E., Freytag, A., and Denzler, J. (2016c). Watch, ask, learn, and improve: A lifelong learning cycle for visual recognition. In European Symposium on Artificial Neural Networks (ESANN).
  • [Kapoor et al., 2010] Kapoor, A., Grauman, K., Urtasun, R., and Darrell, T. (2010). Gaussian processes for object categorization. International Journal of Computer Vision (IJCV).
  • [Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [Kirkpatrick et al., 2016] Kirkpatrick, J., Pascanu, R., Rabinowitz, N. C., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., and Hadsell, R. (2016). Overcoming catastrophic forgetting in neural networks. arXiv preprint arXiv:1612.00796.
  • [Kovashka et al., 2016] Kovashka, A., Russakovsky, O., Fei-Fei, L., and Grauman, K. (2016). Crowdsourcing in computer vision. Foundations and Trends in Computer Graphics and Vision.
  • [Lin et al., 2017] Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017). Feature pyramid networks for object detection. In Computer Vision and Pattern Recognition (CVPR).
  • [Liu et al., 2017] Liu, P., Zhang, H., and Eom, K. B. (2017). Active deep learning for classification of hyperspectral images. Selected Topics in Applied Earth Observations and Remote Sensing.
  • [Liu et al., 2016] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV).
  • [Papadopoulos et al., 2016] Papadopoulos, D. P., Uijlings, J. R. R., Keller, F., and Ferrari, V. (2016). We don't need no bounding-boxes: Training object class detectors using only human verification. In Computer Vision and Pattern Recognition (CVPR).
  • [Redmon et al., 2016] Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Computer Vision and Pattern Recognition (CVPR).
  • [Redmon and Farhadi, 2017] Redmon, J. and Farhadi, A. (2017). Yolo9000: Better, faster, stronger. In Computer Vision and Pattern Recognition (CVPR).
  • [Redmon and Farhadi, 2018] Redmon, J. and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.
  • [Ren et al., 2015] Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Neural Information Processing Systems (NIPS).
  • [Roy et al., 2016] Roy, S., Namboodiri, V. P., and Biswas, A. (2016). Active learning with version spaces for object detection. arXiv preprint arXiv:1611.07285.
  • [Settles, 2009] Settles, B. (2009). Active learning literature survey. Technical report, University of Wisconsin–Madison.
  • [Shmelkov et al., 2017] Shmelkov, K., Schmid, C., and Alahari, K. (2017). Incremental learning of object detectors without catastrophic forgetting. In International Conference on Computer Vision (ICCV).
  • [Stark et al., 2015] Stark, F., Hazırbas, C., Triebel, R., and Cremers, D. (2015). Captcha recognition with active deep learning. In Workshop on New Challenges in Neural Computation.
  • [Tong and Koller, 2001] Tong, S. and Koller, D. (2001). Support vector machine active learning with applications to text classification. Journal of Machine Learning Research (JMLR).
  • [Uijlings et al., 2013] Uijlings, J. R., Van De Sande, K. E., Gevers, T., and Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision (IJCV), 104(2):154–171.
  • [Vijayanarasimhan and Grauman, 2014] Vijayanarasimhan, S. and Grauman, K. (2014). Large-scale live active learning: Training object detectors with crawled data and crowds. International Journal of Computer Vision (IJCV).
  • [Wang and Shang, 2014] Wang, D. and Shang, Y. (2014). A new active labeling method for deep learning. In International Joint Conference on Neural Networks (IJCNN).
  • [Wang et al., 2016] Wang, K., Zhang, D., Li, Y., Zhang, R., and Lin, L. (2016). Cost-effective active learning for deep image classification. Circuits and Systems for Video Technology.
  • [Yao et al., 2012] Yao, A., Gall, J., Leistner, C., and Van Gool, L. (2012). Interactive object detection. In Computer Vision and Pattern Recognition (CVPR).
