BOSWAU + KNAUER

Are Security Robots Effective? Real Operator Data, Not Vendor Claims

Three deployments, three sectors, three measurable outcomes. We report what works, what fails, and what should be tested before any new buyer commits.

Dr. Raphael Nagel

December 30, 2025

Effectiveness in mobile security robotics is not a property of the machine. It is a property of the deployment, measured against incidents the operator can name, in conditions the operator can describe.

The word "effective" has been so thoroughly worn down by vendor marketing that it no longer carries information. A robot is presented as effective when it runs. It is presented as effective when it patrols. It is presented as effective when its battery lasts a shift. None of these statements describes the only question a buyer should be asking, which is whether the device, on this site, against the incident types that have actually occurred here, produces measurable risk reduction at a cost the operator can defend in front of a board. That question requires evidence. The evidence, in turn, requires deployments that ran long enough to fail, and operators willing to report what failed.

This article reports from three deployments Boswau + Knauer has run or supervised over the last twenty-four months, in three sectors, with three different incident profiles. The accounts are deliberately uneven, because the deployments were. What follows is not a vendor portfolio. It is an operator briefing.

What effective means when it is measured properly

Effectiveness is a ratio. On one side stands the cost of the system, fully loaded. On the other stand the risk reductions the system actually produced, attributable through evidence rather than coincidence. Most published figures collapse one side or both. Cost is reported as purchase price, omitting integration, bandwidth, operator workstation labour, maintenance windows, firmware regression testing, and the time the security manager spent triaging false positives. Benefit is reported as patrols completed, or hours covered, neither of which is a security outcome.
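The ratio can be written down explicitly. The following is a minimal Python sketch, assuming the cost components listed above are tracked as annual figures; the class name, field names, and every number are illustrative placeholders, not data from any deployment.

```python
from dataclasses import dataclass


@dataclass
class LoadedCost:
    """Annual cost components named in the text (hypothetical field names)."""
    purchase_or_lease: float
    integration: float
    bandwidth: float
    workstation_labour: float
    maintenance: float
    firmware_testing: float
    false_positive_triage: float

    def total(self) -> float:
        # Sum every component so nothing is silently left out of the cost side.
        return sum(vars(self).values())


def effectiveness_ratio(attributable_loss_avoided: float, cost: LoadedCost) -> float:
    """Attributable loss avoided per unit of fully loaded cost."""
    total = cost.total()
    if total <= 0:
        raise ValueError("fully loaded cost must be positive")
    return attributable_loss_avoided / total


# Placeholder numbers for illustration only.
example_cost = LoadedCost(
    purchase_or_lease=60_000,
    integration=15_000,
    bandwidth=3_000,
    workstation_labour=25_000,
    maintenance=8_000,
    firmware_testing=4_000,
    false_positive_triage=5_000,
)
example_ratio = effectiveness_ratio(180_000, example_cost)
print(f"fully loaded cost: {example_cost.total():,.0f}")
print(f"ratio: {example_ratio:.2f}")
```

In this sketch the purchase price is half of the fully loaded cost, so a ratio computed on purchase price alone would double the apparent effectiveness. A ratio above one means attributable benefit exceeded cost in the period; the hard part is the numerator, which has to be evidenced rather than assumed.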

The NIST Cybersecurity Framework 2.0 introduced the Govern function in part because organisations were measuring control activity rather than control outcome. The same gap exists in physical security. A robot that completes one hundred percent of its patrol cycles is not effective. A robot that, over twelve months, demonstrably contributed to the prevention or rapid escalation of incidents that would otherwise have caused loss, is effective. The first is operational. The second is what the operator pays for.

To measure the second, the deployment has to be instrumented before the device is turned on. Baseline incident rate. Baseline mean time from incident onset to operator awareness. Baseline mean time from awareness to verified response. Baseline false positive rate of the existing system. Baseline cost per verified incident. Without these numbers, post-deployment data is anecdote. With them, the comparison is honest, and sometimes uncomfortable for the vendor.
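The five baseline figures can be derived from an ordinary alarm log. The sketch below assumes a hypothetical record layout (`Alarm`) with onset, awareness, and response timestamps plus a verified flag; the layout, the function name, and the sample log are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Alarm:
    """One alarm from the existing system (hypothetical record layout)."""
    onset: datetime                # when the event began
    aware: datetime                # when the operator became aware of it
    responded: Optional[datetime]  # verified response, if any
    verified: bool                 # True for a real incident, False for a false positive


def baseline_metrics(alarms: List[Alarm], period_days: int,
                     period_cost: float) -> Dict[str, Optional[float]]:
    """Compute the five baseline figures named in the text."""
    verified = [a for a in alarms if a.verified]
    if not verified:
        raise ValueError("baseline needs at least one verified incident")
    to_aware = [(a.aware - a.onset).total_seconds() for a in verified]
    to_response = [(a.responded - a.aware).total_seconds()
                   for a in verified if a.responded is not None]
    return {
        "incidents_per_30_days": len(verified) * 30 / period_days,
        "mean_onset_to_awareness_s": mean(to_aware),
        "mean_awareness_to_response_s": mean(to_response) if to_response else None,
        "false_positive_rate": 1 - len(verified) / len(alarms),
        "cost_per_verified_incident": period_cost / len(verified),
    }


# Placeholder log for illustration only.
t0 = datetime(2025, 1, 1, 23, 0)
log = [
    Alarm(t0, t0 + timedelta(seconds=120), t0 + timedelta(seconds=720), True),
    Alarm(t0 + timedelta(days=1), t0 + timedelta(days=1, seconds=240),
          t0 + timedelta(days=1, seconds=1140), True),
    Alarm(t0 + timedelta(days=2), t0 + timedelta(days=2, seconds=60), None, False),
    Alarm(t0 + timedelta(days=3), t0 + timedelta(days=3, seconds=30), None, False),
]
metrics = baseline_metrics(log, period_days=60, period_cost=9_000)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Running the same function over the post-deployment log gives the like-for-like comparison. In practice, onset time is often an estimate reconstructed from video or loss reports, and that uncertainty carries into the latency figures.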

CISA, in its physical security guidance for critical infrastructure, treats detection latency and assessment confidence as the two variables that most directly determine response viability. IEC 62443 frames the same logic for industrial systems, separating the device, the zone, and the conduit. ISO 27001 treats the question as one of control effectiveness review. The intellectual apparatus exists. What is missing in most deployments is the operator's willingness to apply it before the purchase order is signed, rather than after the first incident embarrasses the budget.

In the three deployments that follow, the apparatus was applied. The results are not uniformly favourable to the technology, and that is precisely the point. An operator briefing that reports only successes is a sales document with a different cover.

Deployment one. A construction site in the German Rhine corridor

The site was a mixed-use development of approximately forty-eight thousand square metres, with two laydown areas, a tool container cluster, and a fuel storage zone. Project value was in the lower nine-figure range. The general contractor had run the previous three projects with a conventional combination of fencing, lighting, and a two-person mobile patrol covering the period from twenty-two hundred to oh-five-hundred. The pre-deployment baseline established the following: forty-one documented incidents over the prior twelve months on a comparable site, of which twenty-three involved tool or copper theft, eleven involved vandalism, and seven involved unauthorised entry without confirmed loss. Insurance excess paid in the comparison year reached a figure that the project director described, accurately, as a second budget the company had stopped noticing.

The robot was deployed in combination with two mobile video towers and a single operator workstation in the regional control centre. The robot ran randomised routes through the laydown areas and the container cluster between twenty-one hundred and oh-six-hundred. Patrol patterns were varied weekly to defeat observation. Over the twelve-month deployment, documented incidents fell to fourteen. Of those, nine were detected and assessed within ninety seconds of onset, and in four of the nine, response was on site before the actors had cleared the perimeter. Tool theft, the largest single loss category in the baseline, fell by approximately seventy percent measured by replacement cost. Copper theft, which had been chronic, fell to two attempts, both interrupted.
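The counts reported here can be turned into rates with simple arithmetic. The short Python sketch below uses only the figures stated above; the variable names are illustrative, and the caveat from the baseline applies: the forty-one incidents come from a comparable site, not the same one.

```python
baseline_incidents = 41    # prior twelve months, comparable site
deployment_incidents = 14  # twelve-month deployment
fast_detections = 9        # detected and assessed within ninety seconds
response_before_exit = 4   # response on site before actors cleared the perimeter

incident_reduction = (baseline_incidents - deployment_incidents) / baseline_incidents
fast_detection_share = fast_detections / deployment_incidents
interception_share = response_before_exit / fast_detections

print(f"incident reduction: {incident_reduction:.1%}")      # 65.9%
print(f"detected within 90 s: {fast_detection_share:.1%}")  # 64.3%
print(f"intercepted on site: {interception_share:.1%}")     # 44.4%
```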

What the device did not do is also worth recording. It did not prevent two incidents of after-hours vandalism inside an area its sensor cone did not reach, because that area had been excluded from the patrol plan on grounds of slope and surface debris. It did not resolve the chronic problem of subcontractor staff removing materials during legitimate working hours, which remained the largest qualitative loss category and which technology of this class is not designed to address. And it generated, in the first eight weeks, a false positive rate that consumed roughly four operator hours per night, until the wildlife and weather profiles had been incorporated into the analytics layer. After tuning, false positives stabilised at a level the operator described as manageable.

On a fully loaded basis, the deployment paid for itself within the second project year. The general contractor has since standardised the configuration across two further sites.

Deployment two. A logistics yard in a Benelux distribution hub

The yard handled mixed freight, with a particular concentration of high-value electronics in transit. The operator had a mature security organisation, with a manned guardhouse, fixed CCTV, and access control that the NICB would have classified as solid for the asset class. The problem the operator wanted to solve was not perimeter breach. It was internal route surveillance during shift changes, when the guardhouse was understaffed and trailer-side dwell created exposure windows the existing fixed cameras could not cover at acceptable resolution.

Baseline figures were detailed. The operator had been tracking shrinkage by container class for four years. Loss events during shift change windows accounted for a disproportionate share of recorded shrinkage by value, despite representing a small share by time. The hypothesis was that a mobile asset, running unpredictable routes through the yard during precisely those windows, would shift the offender's risk calculation enough to displace activity, either off-site or to detectable channels.

The deployment ran for nine months. Two robots were used, with overlapping but non-identical patrol envelopes. Loss events during shift change windows fell sharply, by a margin the operator's internal analytics team treated as statistically meaningful given the sample. However, loss events during established daytime operations rose by a smaller but non-trivial amount, which the operator interpreted as displacement rather than reduction. Net shrinkage by value fell, but by less than the shift-change figure would have suggested in isolation.

This is the kind of finding that does not appear in vendor case studies, and it is the kind of finding operators need to plan against. A mobile robot does not abolish risk. It reorganises it. An operator who deploys without modelling displacement will overstate the system's effect and underinvest in the daytime controls that catch what gets pushed there. The same logic appears in NICB's reporting on cargo theft displacement and in ASIS International guidance on integrated security programmes. The robot was effective on the question it was deployed against. It was not a substitute for the rest of the security programme, and the operator never assumed it would be.
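The gap between the headline figure and the net figure can be made explicit. The following Python sketch assumes loss by value is recorded per time window over baseline and deployment periods of equal length; the function name, window names, and all numbers are placeholders, not the operator's data.

```python
from typing import Dict


def displacement_summary(baseline: Dict[str, float], observed: Dict[str, float],
                         targeted_window: str) -> Dict[str, float]:
    """Compare loss by time window before and after deployment.

    Both dicts map a window name to loss by value over periods of equal length.
    """
    if baseline.keys() != observed.keys():
        raise ValueError("baseline and observed must cover the same windows")
    targeted_change = observed[targeted_window] - baseline[targeted_window]
    # Change in every window the device was NOT deployed against.
    other_change = sum(observed[w] - baseline[w]
                       for w in baseline if w != targeted_window)
    total_baseline = sum(baseline.values())
    return {
        "targeted_reduction_pct": -100 * targeted_change / baseline[targeted_window],
        "change_elsewhere": other_change,
        "net_reduction_pct": -100 * (targeted_change + other_change) / total_baseline,
    }


# Placeholder values for illustration only.
before = {"shift_change": 100.0, "daytime": 60.0}
after = {"shift_change": 40.0, "daytime": 75.0}
summary = displacement_summary(before, after, targeted_window="shift_change")
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these placeholder values the targeted window shows a sixty percent reduction while the net reduction across the yard is about twenty-eight percent. The calculation shows only that loss rose elsewhere; reading that rise as displacement remains the operator's interpretation and needs supporting evidence.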

A second finding, less expected: the robots generated a body of high-quality video evidence that materially improved the operator's relationship with its primary insurer. Premium review at the next renewal incorporated the deployment as a recognised control, with a measurable effect on the rate quoted. This was not a planned outcome. It became, over the deployment period, one of the more durable economic benefits.

Deployment three. A municipal critical-infrastructure site

The third deployment is reported with less specificity, because the site falls within a category for which BSI guidance restricts public disclosure of configuration. The asset is a water-sector facility with a perimeter of approximately two kilometres, a small permanent staff, and a security model that, before deployment, relied on a contracted patrol service supplemented by alarm-triggered response from a regional dispatcher. The operator's concern was not theft. It was sabotage and reconnaissance, in a threat environment that GDV reporting and BSI advisories have flagged as elevated for the sector.

The robot was deployed in combination with a fixed sensor array and a video analytics layer trained on the specific signatures of perimeter approach behaviour. The deployment ran for twelve months. Across that period, the system generated three escalations the operator classified as substantive, of which one resulted in a confirmed unauthorised approach and a handover to the responsible authorities. The other two were resolved as non-malicious without dispatch of response, which the operator treated as a positive outcome, because it indicated the analytics layer was distinguishing approach from intrusion at a useful threshold.

The deployment also revealed, within the first quarter, a vulnerability the operator had not previously identified, in a section of perimeter where vegetation growth had degraded the line of sight of the fixed installation. The robot's mobile vantage point surfaced the gap. The operator closed it. This was not, strictly, a robot capability. It was a deployment capability. A different sensor, mounted on a different platform, might have produced the same finding. The point is that mobile assessment generated an insight that fixed assessment had not, over years.

What the deployment did not do, again, deserves recording. It did not eliminate the need for the contracted patrol. It did not reduce headcount. What it did was redirect human attention from routine perimeter coverage to exception handling and incident assessment, which is the redistribution that integrated programmes, as described in the NIST SP 800-53 control family for physical and environmental protection, are designed to enable. The operator's security manager characterised the outcome as a meaningful step toward the maturity level the operator's board had been asking for, without changing the manning budget.

What these three deployments together suggest

Three sites, three sectors, three different incident profiles. The findings do not collapse into a single claim about effectiveness, and the operator who pretends they do should be doubted. What they suggest, in combination, is the following.

Effectiveness depends on alignment between the device's capability envelope and the incident types that actually occur on the site. A robot deployed against an incident profile it cannot influence will fail. A robot deployed against an incident profile it is designed to influence, with the supporting sensor and analytics layer properly tuned, can produce loss reductions in the range that justifies the investment within two to three years for sites of meaningful scale.

Effectiveness depends on baseline measurement before deployment. Without it, the operator cannot tell whether the device produced the change, or whether the change reflected something else in the environment, from weather to market conditions to changes in the offender population. The argument for baselining is not academic. It is the only defence against the vendor narrative that whatever happened after deployment was caused by the deployment.

Effectiveness depends on operator competence at the workstation. The robot does not act on its own authority. It feeds an operator who decides. An operator centre running too many feeds with too little training will degrade the effectiveness of even the best-tuned device. Conversely, a well-staffed centre with disciplined escalation procedures will extract value from devices the vendor would not have rated highly.

Effectiveness, finally, depends on the operator's willingness to report what did not work. The author's manuscript, BOSWAU + KNAUER. From Building to Security Technology, makes this point in the context of construction: a project director who only reports successful work cannot improve. The same applies to security technology procurement. A buyer who only collects vendor success stories has no instrument with which to measure the next purchase.

How to benchmark before signing

A buyer evaluating mobile security robotics should require, before any purchase or pilot, the following: a defined baseline period of at least six months with documented incident data; a written hypothesis about which incident types the device is expected to influence and which it is not; a defined measurement plan with metrics for detection latency, assessment confidence, false positive rate, response time, and verified incident reduction; a defined cost model that includes integration, bandwidth, workstation labour, maintenance, and firmware management over a three-year horizon; and a defined exit criterion, so that if the deployment fails its hypothesis, the buyer is not trapped by sunk cost into extending it.
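One way to keep the exit criterion from being renegotiated after the fact is to write it down as data before the pilot starts. The Python sketch below is a minimal illustration under stated assumptions: the class names, fields, and thresholds are hypothetical, it covers only a subset of the metrics listed above (cost and assessment confidence are left out), and it assumes baseline and pilot incident counts have been normalised to periods of equal length.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Measurement:
    verified_incidents: int      # normalised to a common period length
    detection_latency_s: float   # mean time from onset to operator awareness
    false_positive_rate: float   # share of alarms that were not real incidents


@dataclass(frozen=True)
class ExitCriterion:
    min_incident_reduction: float   # fraction of baseline, e.g. 0.30
    max_detection_latency_s: float
    max_false_positive_rate: float


def evaluate_pilot(baseline: Measurement, pilot: Measurement,
                   exit_criterion: ExitCriterion) -> Tuple[bool, List[str]]:
    """Return (passed, reasons for failure) against thresholds fixed in advance."""
    failures: List[str] = []
    reduction = 1 - pilot.verified_incidents / baseline.verified_incidents
    if reduction < exit_criterion.min_incident_reduction:
        failures.append(f"incident reduction {reduction:.0%} below target")
    if pilot.detection_latency_s > exit_criterion.max_detection_latency_s:
        failures.append("detection latency above limit")
    if pilot.false_positive_rate > exit_criterion.max_false_positive_rate:
        failures.append("false positive rate above limit")
    return (not failures, failures)


# Placeholder values for illustration only.
baseline = Measurement(verified_incidents=20, detection_latency_s=600.0,
                       false_positive_rate=0.40)
pilot = Measurement(verified_incidents=12, detection_latency_s=90.0,
                    false_positive_rate=0.45)
criterion = ExitCriterion(min_incident_reduction=0.30,
                          max_detection_latency_s=120.0,
                          max_false_positive_rate=0.35)
passed, reasons = evaluate_pilot(baseline, pilot, criterion)
print(passed, reasons)
```

In this example the pilot meets its incident and latency targets but fails on false positives, which is the kind of mixed result a written exit criterion forces the buyer to confront rather than explain away.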

This is the structure of a serious pilot. It is the structure Boswau + Knauer applies in its ninety-day pilot programme, and it is the structure CISA, IEC 62443, and ISO 27001 all converge toward when the question is whether a security control is doing what it was purchased to do. An operator who insists on this structure will be quoted longer delivery timelines and higher pilot costs than operators who do not. The operator who insists will end the pilot knowing what the device does. The operator who does not, will not.

What holds

The honest answer to the question of whether security robots are effective is conditional. They are effective when the deployment matches the threat, when the baseline was measured before the device arrived, when the operator workstation is competent, and when the contract permits an exit if the hypothesis fails. They are ineffective, or worse, when any of these conditions is absent. Effectiveness is not a property of the robot. It is a property of the procurement discipline.

Operators considering a deployment have, in the manuscript referenced above, a description of three paths into a conversation with the manufacturer. The most fitting for a buyer at the evidence-gathering stage is Path II, the three-to-five-day audit, in which the existing security posture is assessed against the threat profile, and the question of whether mobile robotics fits is answered before a pilot is scoped. Operators who already know the fit, and want to test it, move to Path III, the ninety-day pilot under the structure described above. Operators not yet sure they need either take Path I, the sixty-minute confidential conversation. None of these paths produces a sale by itself. They produce, in sequence, the evidence on which a defensible decision can be made.

Frequently asked questions

How effective are security robots in real industrial deployments?

Effectiveness varies by sector, threat profile, and deployment discipline. In the three deployments reported above, loss reductions ranged from approximately seventy percent on the dominant incident category at the construction site, to a meaningful but smaller net reduction at the logistics yard once displacement was accounted for, to a qualitative improvement in detection capability at the critical infrastructure site without a reduction in manning. The honest figure is conditional, not absolute. Operators who report uniformly high effectiveness without baseline data should be doubted. Effectiveness is a measured outcome, not a brochure claim.

What incident types do robots detect best?

Mobile security robots, combined with a properly tuned analytics layer, perform best against perimeter approach, unauthorised entry into instrumented zones, after-hours movement in areas with predictable empty-state baselines, and behaviours with clear visual or thermal signatures, such as tool removal from defined storage areas. They are particularly strong where the alternative is a human patrol with predictable timing, because randomised mobile patrol changes the offender's risk calculation. They contribute high-quality evidence to insurance and legal processes, which is a benefit operators often discover after deployment rather than before.

What incident types do robots miss?

They miss, or contribute weakly to, incidents inside their sensor exclusion zones, incidents during legitimate working hours involving authorised personnel, incidents that displace to channels the device does not cover, and incidents involving social engineering at staffed gates or reception points. They also miss what they have not been trained to recognise, which is why analytics tuning during the first deployment quarter is not optional. An operator who treats the device as a complete security control rather than a component of an integrated programme will be disappointed, and the disappointment will not be the device's fault.

How is effectiveness benchmarked?

Effectiveness is benchmarked against a measured baseline, established before deployment, covering at least six months of documented incident data, broken down by category, time window, and asset class. Post-deployment data is compared against the baseline using metrics for detection latency, assessment confidence, false positive rate, response time, verified incident reduction, and fully loaded cost. Authority anchors for the framework include NIST CSF 2.0, NIST SP 800-53 physical and environmental protection controls, IEC 62443 for industrial environments, ISO 27001 for control effectiveness review, and CISA physical security guidance for critical infrastructure operators.

About the author

Dr. Raphael Nagel (LL.M.) is founding partner of Tactical Management. He acquires and restructures industrial businesses in demanding market environments and writes on capital, geopolitics, and technological transformation. raphaelnagel.com

Since 1892.

The firm is reached at boswau-knauer.de or +49 711 806 53 427.