BOSWAU + KNAUER

Object Detection vs Anomaly Detection in Security: Which One You Actually Need

Object detection answers "what is in the frame." Anomaly detection answers "is this normal." For most industrial operators, the second question is the one that costs money.

Dr. Raphael Nagel

March 26, 2026

Most procurement documents in physical security confuse two different questions, and the confusion is expensive.

Object detection answers the question of what is in the frame. A person, a vehicle, a forklift, a backpack, a ladder. The model has been trained on labelled examples of each class and assigns a probability that the pixels in a given region belong to one of them. Anomaly detection answers a different question entirely. It asks whether what is happening in the frame matches the normal behaviour of the site. The first question is solved. The second one is where the money lives, and where the operator either gains an advantage or loses one.

The distinction matters because the two architectures are not interchangeable. They are trained differently, they fail differently, they are tuned differently, and they require different infrastructure. An operator who specifies object detection when the underlying problem is behavioural will buy a system that classifies everything correctly and still misses the event. The reverse mistake is rarer but more costly. A vendor who sells an anomaly model on a site where the operator only needs reliable class detection has built complexity that the customer cannot maintain. This article separates the two, names the conditions under which each is appropriate, and indicates where the combination is the only honest answer.

What object detection actually does

Object detection is a supervised learning problem with a closed taxonomy. The engineer decides in advance which classes the model needs to recognise. Person, vehicle, truck, forklift, hard hat, high-visibility vest, weapon, package, animal. A training set is assembled with thousands or tens of thousands of labelled bounding boxes per class. The model learns to predict, for any given image, a list of boxes with class probabilities and confidence scores. The state of the art is mature. Architectures like YOLO, Faster R-CNN and the more recent transformer-based detectors deliver class accuracy in the high nineties under reasonable conditions for the common classes, and the inference cost has fallen to the point where useful detection runs on edge hardware with a power envelope below twenty watts.

The strength of object detection is its precision on the classes it knows. If the operator wants to count people entering a gate, recognise vehicles by type, or trigger a workflow when a forklift enters a pedestrian zone, object detection is the right tool. Standards work in industrial control environments, including the safety-instrumented logic that IEC 62443 anticipates at the integration layer, increasingly assumes that the perception layer can deliver class information of this quality at near real time.

The limitation is the closed taxonomy. The model recognises what it has been trained on. It does not recognise what it has not been trained on. A weapon class trained on handguns and rifles will not recognise a crowbar held in an unusual posture. A vehicle class trained on cars, trucks and forklifts will not recognise a stolen tracked excavator being driven off the lot at three in the morning, because nobody labelled "excavator being stolen at night" as a class. The model will return "vehicle, confidence 0.91" and the alert will not fire, because the rule engine sitting above the detector was looking for an unauthorised person, not a vehicle that the site owns.

The second limitation is contextual blindness. Object detection has no opinion about whether the vehicle should be there. It knows the vehicle is there. The rule that decides whether the presence is normal sits in a layer above the detector, and that layer is usually a hand-coded set of geofences, schedules and exclusion lists. The more complex the site, the more brittle that layer becomes. At a certain scale, no human can write the rules fast enough to keep up with how the site actually behaves.
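
The hand-coded layer above the detector can be sketched in a few lines. This is a minimal illustration, not any product's rule engine: the names `Detection`, `Rule` and `fires`, the zone label and the quiet hours are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str          # class name from the detector's closed taxonomy
    confidence: float
    zone: str
    hour: int           # local hour of day, 0-23


@dataclass(frozen=True)
class Rule:
    label: str
    zone: str
    quiet_hours: frozenset  # hours in which this class is not expected in the zone
    min_confidence: float = 0.5


def fires(rule: Rule, det: Detection) -> bool:
    """A rule fires only for the class, zone and hours it was written for."""
    return (
        det.label == rule.label
        and det.zone == rule.zone
        and det.hour in rule.quiet_hours
        and det.confidence >= rule.min_confidence
    )


# The only rule anyone wrote (hypothetical): persons on the lot between 00:00 and 04:59.
rules = [Rule(label="person", zone="lot", quiet_hours=frozenset(range(0, 5)))]

# The excavator leaving at three in the morning is classified correctly as a vehicle...
excavator = Detection(label="vehicle", confidence=0.91, zone="lot", hour=3)

# ...but no rule covers vehicles, so the list of alerts stays empty.
alerts = [r for r in rules if fires(r, excavator)]
```

The detector did its job. The gap is in the rule list, and every new incident type means another rule written by hand.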

What anomaly detection actually does

Anomaly detection inverts the training problem. Rather than learning a closed list of classes, the model learns a distribution of normal. It observes the site for a defined baseline period, typically two to six weeks depending on operational rhythm, and builds a statistical or neural representation of what counts as ordinary. Movement patterns, speed distributions, occupancy levels at given hours, pixel-level variance in defined zones, dwell times near assets. The model is then asked, in production, whether a new observation falls inside or outside the learned distribution. Outliers are flagged. The class of the outlier is not the point. The deviation is the point.
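
The idea of learning a distribution of normal and flagging what falls outside it can be sketched with a deliberately simple statistic. The example below assumes a single feature (person count per hour of the week, with the week starting Monday 00:00) and a z-score as the deviation measure. Production systems use richer features and models, and the function names and the `floor` parameter are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_baseline(observations):
    """observations: iterable of (hour_of_week, person_count) from the baseline period."""
    by_hour = defaultdict(list)
    for hour, count in observations:
        by_hour[hour].append(count)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def deviation(baseline, hour, count, floor=0.25):
    """Distance from the learned mean, in standard deviations.

    `floor` keeps the score finite for hours whose baseline never varied.
    """
    mu, sigma = baseline[hour]
    return abs(count - mu) / max(sigma, floor)


# Four baseline weeks: Sunday 04:00 (hour 148) is always empty,
# Monday 10:00 (hour 10) is a busy yard.
history = [(148, 0)] * 4 + [(10, c) for c in (11, 12, 13, 12)]
baseline = learn_baseline(history)

sunday_night = deviation(baseline, hour=148, count=1)    # one person where there is never anyone
monday_morning = deviation(baseline, hour=10, count=13)  # one person more than usual
```

With an illustrative threshold of three standard deviations, the single person on Sunday night is flagged and the thirteenth person on Monday morning is not. What the person is wearing never enters the calculation.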

The mathematical foundations have been in the field for decades. One-class support vector machines, isolation forests, Gaussian mixture models, autoencoders that learn to reconstruct normal frames and report reconstruction error on abnormal ones, more recently variational autoencoders and diffusion-based reconstruction. NIST's work on machine learning for cybersecurity has applied similar logic in network traffic analysis, and the physical security application is the same logic translated to pixels and motion vectors. ISO/IEC 27001 controls around continuous monitoring make the same assumption. Define the baseline, observe the deviation, act on the deviation.

The strength is that the model does not need to know in advance what the threat will look like. A man on the lot at four in the morning who is wearing a high-visibility vest, carrying a clipboard, and walking the perimeter in a calm pattern looks like a security guard. An object detector trained to spot intruders will not raise an alarm, because no threat class fires: it reports a person in a high-visibility vest, and the rule layer reads that as staff. An anomaly model, properly baselined, will raise the alarm because at four in the morning on a Sunday the site never has a person on the lot, regardless of what the person is wearing. The deviation is what matters. The disguise is irrelevant.

The weakness is that the model needs a baseline that actually reflects normal. Sites with chaotic operations, frequent layout changes, mixed shifts or seasonal swings produce baselines that drift. A baseline learned in summer fails in winter. A baseline learned during a single-shift operation fails when the operator goes to three shifts. Anomaly models therefore require lifecycle discipline. The baseline has to be re-trained, the false positive rate has to be tracked, and the operator has to be willing to accept a higher noise floor than object detection produces, because the trade-off is that the model catches events that no class taxonomy would have anticipated.
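
Lifecycle discipline can be made concrete by tracking how many flags the operator dismisses each week and treating a sustained rise as a retraining signal. The sketch below assumes that every flag is reviewed and marked as confirmed or not. The budget of 0.8 and the two-week streak are illustrative values, not recommendations.

```python
def false_positive_share(reviewed_flags):
    """reviewed_flags: list of booleans, True where the operator confirmed a real event."""
    if not reviewed_flags:
        return 0.0
    dismissed = sum(1 for confirmed in reviewed_flags if not confirmed)
    return dismissed / len(reviewed_flags)


def needs_retraining(weekly_reviews, budget=0.8, consecutive=2):
    """Signal a retrain when the dismissed share exceeds `budget`
    for `consecutive` weeks in a row (both values are assumptions)."""
    streak = 0
    for week in weekly_reviews:
        streak = streak + 1 if false_positive_share(week) > budget else 0
        if streak >= consecutive:
            return True
    return False


# One ordinary week, then two weeks in which the baseline no longer fits the site.
weeks = [
    [True, False, False, True],   # half of the flags dismissed
    [False] * 9 + [True],         # 90 percent dismissed
    [False] * 10,                 # every flag dismissed
]
retrain = needs_retraining(weeks)
```

A single noisy week does not trigger anything. Two in a row suggest the site has changed, not that the week was unusual.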

Where each model belongs

The right question is not which model is better. The right question is which question the operator is actually trying to answer. The book "BOSWAU + KNAUER. From Building to Security Technology" frames event recognition in exactly this way: the function of the analytics layer is to lift human attention to the places where it is needed, and to suppress it everywhere else. Object detection lifts attention by class. Anomaly detection lifts attention by deviation. The two are not in competition. They serve different decisions.

Object detection belongs in workflows where the relevant event is defined and recurring. Access control with class verification. Counting and flow analytics. Compliance monitoring, for instance whether personal protective equipment is worn in defined zones, a requirement that ASIS International publications increasingly list as a baseline for industrial sites. Forensic search of recorded footage, where the operator needs to find every instance of a particular vehicle type over the last thirty days. License plate recognition. Weapon detection at known checkpoints. In all of these cases the class taxonomy is closed, the operator knows what they are looking for, and the false positive cost of a well-tuned detector is acceptable.

Anomaly detection belongs in workflows where the relevant event is open-ended and rare. Perimeter monitoring at night on sites where the legitimate population is near zero. Detection of insider behaviour that does not match the class profile of an external attacker. Detection of vehicles operating outside expected paths, including a vehicle that the site owns and that the access control system has authorised but that is moving in a way that the baseline does not recognise. Detection of dwell, loitering, and reconnaissance behaviour, which according to NICB data and observations summarised by GDV on construction-site theft tends to precede industrial-grade incidents by hours or days. In all of these cases the class taxonomy cannot be written in advance, because the threat actor is actively trying to look like something normal.

In practice the architectures combine. An object detection layer provides the structured perception. An anomaly layer sits above it, reasoning over the time series of detections rather than over raw pixels. Person count plus vehicle type plus dwell time plus zone plus hour of day becomes a feature vector, and the anomaly model judges whether the vector is ordinary. This composite architecture is what CISA's guidance on critical infrastructure monitoring increasingly anticipates, and it is the architecture that the NIST Cybersecurity Framework 2.0 implicitly assumes when it talks about anomaly events under the Detect function.
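
A rough sketch of the composite idea: detector output for one zone and one time window is collapsed into a feature vector, and a model fitted on baseline windows for the same zone and hour judges whether the vector is ordinary. The dictionary keys, the feature choice and the per-feature z-score are assumptions for illustration. A real anomaly layer would typically use a proper multivariate model.

```python
from statistics import mean, pstdev


def to_vector(detections):
    """Collapse one time window of detector output into (persons, vehicles, max dwell).

    Each detection is a dict with the hypothetical keys 'label' and 'dwell_s'.
    """
    persons = sum(1 for d in detections if d["label"] == "person")
    vehicles = sum(1 for d in detections if d["label"] == "vehicle")
    max_dwell = max((d["dwell_s"] for d in detections), default=0.0)
    return (persons, vehicles, max_dwell)


def fit(baseline_vectors):
    """Per-feature mean and standard deviation over baseline windows for one (zone, hour)."""
    return [(mean(col), pstdev(col)) for col in zip(*baseline_vectors)]


def score(model, vector, floor=1.0):
    """Largest per-feature deviation in standard deviations; a crude stand-in for a real model."""
    return max(abs(x - mu) / max(sigma, floor) for x, (mu, sigma) in zip(vector, model))


# Baseline windows for the yard at 03:00: occasionally one vehicle passes through briefly.
yard_3am = fit([(0, 0, 0.0), (0, 1, 30.0), (0, 0, 0.0), (0, 1, 40.0)])

passing = score(yard_3am, to_vector([{"label": "vehicle", "dwell_s": 30.0}]))
lingering = score(yard_3am, to_vector([{"label": "vehicle", "dwell_s": 900.0}]))
```

Both windows contain exactly one vehicle, so a class-based rule cannot tell them apart. The dwell feature separates them: the passing vehicle scores below one standard deviation, the lingering one far above any plausible threshold.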

The false positive economy

The hidden cost of every security analytics deployment is the false positive rate, and the two model families fail in opposite directions. Object detection fails by missing events that fall outside its taxonomy. The miss is silent. Nobody knows what was not flagged. Anomaly detection fails by flagging events that are unusual but harmless. This failure is loud. The operator sees every flag, the operations team complains, and within six months the system is either ignored or switched off.

The economics are well documented. A security operations centre handling more than roughly fifteen alerts per operator per hour begins to degrade in response quality. Above thirty alerts per hour, the operator stops reading the alerts and starts triaging by source rather than by content. The BSI's guidance on operational technology security mentions the same dynamic in process control environments. The implication is that an anomaly model deployed without rigorous tuning will, within weeks, cease to function as a security control. It will become noise. The vendor who sold it will blame the customer. The customer will blame the vendor. Neither will be wrong.
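
The arithmetic behind the alert load is worth doing before deployment, not after. The sketch below uses the rough figures from the paragraph above as thresholds. The camera count, flag rate and staffing in the example are invented for illustration.

```python
def alerts_per_operator_hour(alerts_per_day, operators_on_shift, hours_covered=24):
    """Average alert load per operator per hour over the covered period."""
    return alerts_per_day / (operators_on_shift * hours_covered)


def load_band(rate, degrade_at=15, triage_at=30):
    """Classify the load using the rough figures from the text (not a formal standard)."""
    if rate > triage_at:
        return "triage by source"
    if rate > degrade_at:
        return "degrading"
    return "sustainable"


# Hypothetical site: 120 cameras, 8 anomaly flags per camera per day, 2 operators around the clock.
daily_alerts = 120 * 8
rate = alerts_per_operator_hour(daily_alerts, operators_on_shift=2)
band = load_band(rate)
```

Eight flags per camera per day sounds modest on a datasheet. At this site it works out to twenty alerts per operator per hour, which is already in the band where response quality degrades.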

The way out of the false positive economy is not to choose between the two models, but to design the alert pipeline so that anomaly flags are filtered through object-detection context before they reach a human. An anomaly event that coincides with an object detection result matching a known threat class is escalated. An anomaly event with no object correlation is logged but not escalated. An object detection event in a routine zone at a routine time is suppressed. The result is a pipeline that produces fewer alerts than either model alone, with higher signal density. Building this pipeline is engineering work. It does not come out of the box. Anyone who sells it as out-of-the-box is selling a demonstration, not a system. The distinction between demonstration and serial operation runs through everything that gets fielded on real industrial sites.
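
The three routing rules in the paragraph above can be written down directly. This is a sketch under assumptions: `THREAT_CLASSES` is a hypothetical per-site list, the inputs are simplified to a flag, a set of labels and a routine-context boolean, and the final fallback covers a case the text does not address.

```python
from enum import Enum


class Action(Enum):
    ESCALATE = "escalate"
    LOG = "log"
    SUPPRESS = "suppress"


# Hypothetical per-site list of classes treated as threat-relevant.
THREAT_CLASSES = {"person", "vehicle"}


def triage(anomaly: bool, labels: set, routine: bool) -> Action:
    """Route one event before it reaches a human.

    anomaly: the anomaly layer flagged this time window
    labels:  classes the object detector reported in the same window
    routine: the zone and hour are on the site's routine list
    """
    if anomaly and labels & THREAT_CLASSES:
        return Action.ESCALATE   # deviation plus a matching threat class
    if anomaly:
        return Action.LOG        # deviation with no object correlation
    if labels and routine:
        return Action.SUPPRESS   # ordinary detection in a routine context
    return Action.LOG            # assumption: case not covered by the text


night_vehicle = triage(anomaly=True, labels={"vehicle"}, routine=False)
flicker = triage(anomaly=True, labels=set(), routine=False)
day_shift = triage(anomaly=False, labels={"person"}, routine=True)
```

The function itself is trivial. The engineering work sits in deciding what goes into the threat list and the routine list for a given site, and in keeping both current.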

What this means for procurement

Procurement specifications in physical security tend to list features rather than questions. The result is that operators end up with capability they cannot use and without capability they need. The correct procurement sequence reverses the order. First, the operator names the three to five events that, if missed, would cause material loss. Second, the operator names the three to five events that, if flagged falsely more than once a week, would cause the system to be ignored. Third, the operator names the operational rhythm of the site and the extent to which it is stable. Only after these three lists exist does the question of model architecture become answerable.

Sites with stable rhythms and well-defined high-cost events are object-detection sites. The taxonomy can be written. The detector can be trained on the relevant classes. The false positive budget can be held by tight rules. Construction sites in their stable phases, finished logistics yards under normal operations, fixed industrial perimeters with predictable shift patterns fall into this category. The operator gets the most value from precise, fast, low-noise class detection feeding a workflow.

Sites with shifting rhythms, where the threat profile cannot be enumerated, and where the legitimate population varies, are anomaly-detection sites. Construction sites during transitional phases, when crews and deliveries change weekly. Logistics nodes with mixed traffic. Critical infrastructure with low legitimate footfall and high consequence-of-loss. The operator gets the most value from a baseline-aware system that flags deviation, even when the deviation does not match a pre-defined class.

Sites with both characteristics, which in practice means most large industrial operations, are composite sites. The investment is in the pipeline, not in the model. The model is a component. The pipeline is the product. Operators who understand this buy differently from operators who do not. They ask vendors how the anomaly layer is baselined, how often it is retrained, what the documented false positive rate is on comparable sites, and how the object detection layer feeds into the anomaly layer's feature space. Vendors who cannot answer these questions are selling boxes.

What holds

Object detection is mature, useful, and limited by its taxonomy. Anomaly detection is mature, useful, and limited by its baseline. The choice between them is not a choice between technologies but a choice between questions. The operator who knows which question they are trying to answer will specify correctly. The operator who does not will end up with a system that classifies everything and prevents nothing.

The composite pipeline, where object detection feeds structured features into an anomaly layer, is the architecture that real industrial sites converge on once they have lived with either model in isolation for long enough to see its failure mode. The pipeline is engineering, not packaging. It is built once, tuned continuously, and judged on its alert quality over time. Anyone who promises a finished version of this pipeline without site-specific work is selling a demonstration. The serial product is the discipline of operating it.

For operators who want to test the question against their own sites without committing to a procurement cycle, the appropriate entry point is Path II, the three-to-five-day audit. The audit looks at the existing camera estate, the current alert volume, the relationship between the events that have actually caused losses and the events that the current system was set up to detect, and the gap between the two. The deliverable is a written report that the operator can act on with or without further engagement. The question of which model architecture fits the site falls out of that analysis. It is not answered in advance, and it is not answered by datasheet.

Frequently asked questions

When do you need anomaly detection instead of object detection?

Anomaly detection is the right choice when the threat cannot be enumerated in advance, when the legitimate population is small and predictable, and when the cost of a missed event is high enough to justify a higher false positive rate. Typical cases are perimeter security at night on low-traffic industrial sites, insider threat detection, and reconnaissance behaviour ahead of organised theft. If the operator can write a closed list of classes that covers every relevant event, object detection is sufficient. If the list keeps growing every time a new incident occurs, anomaly detection is the architecture that holds.

How is an anomaly model trained?

The model observes the site during a defined baseline period, typically two to six weeks, and learns a statistical or neural representation of normal activity. Inputs include motion vectors, occupancy by zone and hour, dwell distributions, speed patterns, and increasingly the outputs of an upstream object detector. The baseline is then validated against historical incidents where available, and the detection threshold is tuned to the operator's false positive tolerance. The baseline is retrained on a defined cadence, usually quarterly, to account for operational drift. Without retraining, anomaly models degrade.
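
Tuning the threshold to a false positive tolerance can be sketched as a quantile choice over the scores seen during the baseline period. The example assumes that the baseline contains only normal activity, that an event is flagged when its score is strictly above the threshold, and that the budget is expressed as false flags per day. The function name and parameters are illustrative.

```python
def threshold_for_budget(baseline_scores, windows_per_day, max_false_flags_per_day):
    """Pick the lowest threshold that keeps baseline exceedances within the daily budget.

    Assumes every baseline window is normal, so any exceedance would be a false flag.
    """
    ranked = sorted(baseline_scores)
    # Number of baseline windows allowed to score above the threshold (integer arithmetic)
    allowed_exceed = (max_false_flags_per_day * len(ranked)) // windows_per_day
    # The threshold is the score of the highest window that must not be flagged
    index = len(ranked) - allowed_exceed - 1
    return ranked[max(index, 0)]


# Hypothetical baseline: 100 scored windows, 100 windows per day, budget of 5 false flags per day.
scores = list(range(1, 101))
threshold = threshold_for_budget(scores, windows_per_day=100, max_false_flags_per_day=5)
flagged = sum(1 for s in scores if s > threshold)
```

A tighter budget pushes the threshold up and costs sensitivity. That trade-off is the operator's decision, which is why it has to be stated as a number before tuning starts.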

Do both models run on the same hardware?

Object detection runs efficiently on edge accelerators in the ten to thirty watt range, which is why it can be embedded in cameras and mobile units. Anomaly detection over short feature vectors runs on the same hardware. Anomaly detection over raw pixels, particularly with reconstruction-based architectures, requires more capable inference hardware, typically a small server at the site or a regional aggregation point. The composite pipeline runs the object detector at the edge and the anomaly layer one tier up, which keeps latency low and bandwidth contained.

How is performance benchmarked?

The honest benchmark is not detection accuracy on a public dataset but the precision and recall measured against the operator's own incident history over a defined period. Object detection is benchmarked by per-class precision and recall at a defined confidence threshold. Anomaly detection is benchmarked by true positive rate and by false positives per camera per day, measured against ground truth established by the operator. Composite pipelines are benchmarked at the alert level, with the question being how many alerts reach a human per shift and what fraction of them require action. Benchmarks that do not reference the operator's own site are marketing material.
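
The three levels of benchmark reduce to a handful of ratios. The sketch below shows the arithmetic with invented numbers for a thirty-day evaluation. The function names are assumptions, and the counts would come from the operator's own incident log.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from raw counts; 0.0 when a denominator is empty."""
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def false_positives_per_camera_day(false_pos, cameras, days):
    """Noise normalised by the size of the camera estate and the evaluation period."""
    return false_pos / (cameras * days)


def alert_level(alerts_to_human, alerts_actioned, shifts):
    """Alerts reaching a human per shift, and the fraction that required action."""
    return alerts_to_human / shifts, alerts_actioned / alerts_to_human


# Hypothetical evaluation: 40 cameras, 30 days, 10 real incidents in the operator's log.
# The anomaly layer caught 8 of them and raised 240 flags that were dismissed.
precision, recall = precision_recall(true_pos=8, false_pos=240, false_neg=2)
noise = false_positives_per_camera_day(false_pos=240, cameras=40, days=30)

# After the composite pipeline: 360 alerts reached a human over 90 shifts, 36 needed action.
per_shift, actionable = alert_level(alerts_to_human=360, alerts_actioned=36, shifts=90)
```

Read in isolation, a precision of about three percent looks unusable and 0.2 false positives per camera per day looks excellent. They describe the same system, which is why the alert-level figures are the ones to put in the contract.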

About the author

Dr. Raphael Nagel (LL.M.) is founding partner of Tactical Management. He acquires and restructures industrial businesses in demanding market environments and writes on capital, geopolitics, and technological transformation. raphaelnagel.com

Since 1892.

The firm is reached at boswau-knauer.de or +49 711 806 53 427.