The New Zealand Office of the Privacy Commissioner (OPC) recently released the results of an inquiry into an experimental facial recognition technology (FRT) system used by Foodstuffs North Island (FSNI) across 25 supermarkets, which scanned an estimated 225.9 million faces.
The inquiry examined the trial's privacy impact, assessed its compliance with the Privacy Act, and evaluated whether FRT was an effective tool in reducing serious retail crime compared to less privacy-intrusive alternatives.
To help interpret the findings and highlight the most noteworthy insights for facial recognition in retail, we asked 3DiVi Sales Manager Mikhaylo Pavlyuk to give his commentary on the report's key points and implications.
Key Takeaways from the OPC's Report
The report outlines key elements of FSNI's operational model that allowed the company to claim compliance with the Privacy Act (para. 8 of the "Findings relating to the trial" section). Several features of FSNI's operational approach, as applied during the trial, support this conclusion.
A clear and limited purpose
FSNI used facial recognition only to identify individuals previously involved in serious incidents at stores: physical or verbal abuse, threats, aggressive behavior, and major thefts. Other uses were strictly prohibited.
Mikhaylo Pavlyuk: "This restriction on who can be added to the watchlist makes sense for several reasons. For one, law enforcement typically isn't interested in minor incidents, and without police involvement, retailers have limited options to act."
The system was effective in addressing that purpose
Independent evaluations and interviews with store employees suggested that FRT was generally an effective tool for reducing the number of serious repeat offences during the trial period.
Mikhaylo Pavlyuk: "It's worth noting how 'effectiveness' is quantified. In different countries and studies, a 10-20% reduction (or more) is often considered the threshold for recognizing a method or program as effective."
Fit-for-purpose technology
FSNI selected a system proven in real-world conditions, without staged photos. The technology was not trained on a New Zealand dataset, as no such dataset exists. However, it was trained on similar groups in Australia, including Māori and Pacific Islander populations, which helped minimize the risk of technical bias.
Mikhaylo Pavlyuk: "This is a complex point, and the justification is somewhat stretched. Training on similar populations in Australia does not guarantee fairness in New Zealand. Proving this would require extensive testing and validation locally."
No use of images for system training
The software provider was explicitly prohibited from using the collected images for training purposes. The Privacy Commissioner supports the idea of creating a New Zealand-specific training dataset, but only with explicit consent from individuals.
Mikhaylo Pavlyuk: "No issues here; this is a strong and responsible approach."
Immediate deletion of most images
Images not matching the watchlist were deleted almost instantly, representing the majority of captured faces.
Mikhaylo Pavlyuk: "It's good practice. The question is whether the biometric descriptors were deleted too, which affects auditability."
Rapid deletion of images where no action taken
Matches with no follow-up were deleted by midnight of the same day.
Mikhaylo Pavlyuk: "The term 'matches' is used here, but the same question applies: what happens to the biometric template? Is it deleted as well?"
Watchlists were generally of reasonable quality and carefully controlled
Each store maintained its own watchlist. Only trained staff could add individuals, strictly following criteria for serious offences. Adding children and young people under 18, elderly people, or persons with known mental health conditions was prohibited.
Mikhaylo Pavlyuk: "From a facial recognition system provider's perspective, such rigorous controls over watchlist additions are ideal, though it's unusual for a business to adopt them."
Retention of watchlist information was limited
Primary offenders could remain on the watchlist for no more than 2 years, while accomplices were listed for a maximum of 3 months. This approach helped ensure that information stayed relevant and reduced long-term consequences for individuals.
Mikhaylo Pavlyuk: "The mention of 'accomplices' is confusing here; it seems to contradict point (a), which limited inclusion strictly to serious offenders."
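Under these limits, pruning expired entries is mechanically simple. The sketch below is hypothetical (the `role` labels and dictionary schema are ours), with day counts mirroring the report's two-year and three-month windows:

```python
from datetime import date

# Retention windows from the report: 2 years for primary offenders,
# 3 months for accomplices (expressed here in days for simplicity).
RETENTION_DAYS = {"primary": 730, "accomplice": 90}

def prune_watchlist(entries: list[dict], today: date) -> list[dict]:
    """Drop any entry older than its role-specific retention window."""
    return [
        e for e in entries
        if (today - e["added_on"]).days <= RETENTION_DAYS[e["role"]]
    ]
```

Running a job like this daily is one way to guarantee the "information stays relevant" property the report credits to FSNI.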
Watchlists are not shared between stores
All watchlists were store-specific and not shared with other FSNI locations. This ensured that individuals were not automatically barred from every store in the network, allowing them continued access to food and other essentials.
Mikhaylo Pavlyuk: "From a facial recognition vendor's perspective, this store-by-store watchlist policy is ideal. But it's surprising that the business agreed to such a model."
Accuracy levels were acceptable, once adjusted in response to problems
Initially, matches triggered alerts at 90% confidence. After two identification errors, FSNI raised the threshold to 92.5% and improved image quality, verification procedures, and staff training. No further incidents occurred.
Mikhaylo Pavlyuk: "This part is not entirely clear. The report uses terminology that seems closer to marketing language than technical precision, likely because the research was carried out by a marketing agency. The term 'confidence' probably refers to a score, but to properly assess performance, we'd need details on false positives and false negatives at that threshold."
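Mikhaylo's point, that a bare "confidence" number says little without false-positive and false-negative rates, can be illustrated with a small sketch. The scores and labels below are invented; a real evaluation needs large, representative score distributions for the deployment population:

```python
def error_rates(scores: list[float], labels: list[bool], threshold: float):
    """False positive rate and false negative rate at a similarity threshold.

    labels[i] is True when pair i is a genuine watchlist match.
    """
    fp = sum(s >= threshold and not l for s, l in zip(scores, labels))
    fn = sum(s < threshold and l for s, l in zip(scores, labels))
    return fp / labels.count(False), fn / labels.count(True)

# Invented scores: raising the threshold from 0.90 to 0.925 removes the
# false positive, but the borderline genuine match is still missed.
scores = [0.95, 0.91, 0.80, 0.89]
labels = [True, False, False, True]
print(error_rates(scores, labels, 0.90))   # (0.5, 0.5)
print(error_rates(scores, labels, 0.925))  # (0.0, 0.5)
```

The same threshold change the trial made (90% to 92.5%) trades one error type against the other, which is why both rates are needed to judge it.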
Alerts were checked by two trained staff
Before the system was launched, staff were informed that it was not infallible. An alert required confirmation from at least two cameras, after which it was reviewed by two trained employees, who then decided whether to intervene or contact the police.
Mikhaylo Pavlyuk: "Manual verification is already a good practice. Having a double layer of manual checks is even better."
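The review chain described above (at least two cameras, then two trained reviewers) can be written as a simple gate. This is a hypothetical encoding; the report does not specify the exact decision logic:

```python
def alert_is_actionable(camera_confirmations: int,
                        reviewer_verdicts: list[bool]) -> bool:
    """An alert proceeds to intervention only when at least two cameras
    confirmed the match AND two trained reviewers independently agreed."""
    two_cameras = camera_confirmations >= 2
    two_reviewers_agree = len(reviewer_verdicts) >= 2 and all(reviewer_verdicts[:2])
    return two_cameras and two_reviewers_agree
```

The value of the design is that a single sensor error or a single lapse in human judgment cannot, on its own, trigger an intervention.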
Reasonable degree of transparency that the FRT trial was operating
Stores displayed large A1/A0 signs at entrances to inform customers that the trial was in operation, with additional signage inside. Information was also published on the FSNI website, and further details were available at the customer information desk upon request. Staff involved in the trial were trained to answer questions, while other employees were trained to direct inquiries appropriately.
Mikhaylo Pavlyuk: "This is a very valuable measure. Unfortunately, not every organization implementing FRT is willing to invest in proper staff training."
No apparent bias or discrimination in how discretion was exercised
Based on sample checks, the Privacy Commissioner's compliance team found no apparent bias or discrimination in how watchlists were created, how alerts were verified, or how intervention decisions were made.
Mikhaylo Pavlyuk: "This point is debatable. According to NIST reports, every algorithm shows some variation in accuracy across ethnic groups. The real question is the frequency and scale of such errors. The report does not provide methodology or supporting data here, which makes it difficult to validate the claim."
Processes for requests and complaints
Individuals who believed they had been misidentified or wrongly added to a watchlist were able to file complaints. If an error was confirmed, their information was corrected or removed.
Mikhaylo Pavlyuk: "This is a very important element. A solid feedback and correction protocol is half the success of any such system."
Security processes in place to protect information
Only authorised personnel had access to the system and the secure room where equipment was located. The FRT system was not automatically linked to the store's incident reporting platform; any information had to be transferred manually according to strict criteria. All access was logged and reviewed by the Loss Prevention Manager. FRT alerts could only be received on authorised devices operating within the in-store network.
Mikhaylo Pavlyuk: "Honestly, the argumentation here is not very strong. Let's put it down to a lack of deep cybersecurity expertise."
Good record-keeping about system operation
FSNI created records on key events: numbers of matches, alerts, outcomes of interventions (including reasons for action or inaction), customer reactions, and whether interventions prevented or escalated harmful behaviour. This record-keeping allowed FSNI to monitor the effectiveness of the system.
Mikhaylo Pavlyuk: "This is a must-have function for any system of this kind, yet surprisingly many projects skip it. Ideally, there should also be a set of standard automatic performance metrics; hopefully FSNI had those in place."
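A minimal aggregation over such event records might look like the sketch below. The event types and field names are illustrative; the report does not describe FSNI's actual schema:

```python
from collections import Counter

def summarise_events(events: list[dict]) -> dict:
    """Roll up the record types the report mentions: matches, alerts, and
    intervention outcomes (e.g. whether harm was prevented or escalated)."""
    outcomes = Counter(
        e["outcome"] for e in events if e["type"] == "intervention"
    )
    return {
        "matches": sum(e["type"] == "match" for e in events),
        "alerts": sum(e["type"] == "alert" for e in events),
        "interventions": dict(outcomes),
    }
```

Even a summary this simple is enough to track the effectiveness trend the report says FSNI monitored.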
Stores have good security infrastructure and are committed to privacy measures
Stores were equipped with adequate CCTV coverage and dedicated rooms for FRT equipment. FSNI implemented policies and assigned personnel responsible for ensuring compliance with privacy requirements.
Mikhaylo Pavlyuk: "Having CCTV alone does not necessarily indicate robust security infrastructure, and it certainly doesn't guarantee strong attention to privacy. Still, we'll leave that assessment to information security experts."
Following the trial, FSNI completed a detailed Privacy Impact Assessment (PIA) with the Office of the Privacy Commissioner (OPC), identifying key risks and implementing mitigation processes.
Further Improvements Needed
While the model largely complies with the Privacy Act, the inquiry highlighted several areas that need attention before FSNI can commit to long-term use or expand FRT:
Update the match algorithm so an alert is triggered at a higher accuracy level
Right now, the system flags matches at 90%, but staff are trained not to intervene below 92.5%. This gap needs to be fixed technically, and it might even make sense to aim higher, perhaps 94%.
Mikhaylo Pavlyuk: "Just keep in mind: raising the score threshold increases the risk of Type II errors (cases where a person is in the database but the system fails to flag them)."
Watchlist criteria should remain consistent with store practice during the trial that targeted genuinely harmful behaviour
FRT should target only serious offences: violence, aggression, or major thefts. It must not be used for minor incidents or 'problematic' individuals.
Mikhaylo Pavlyuk: "From my experience, most losses come from repeated minor thefts. This is something that should be discussed with the business and carefully documented. Individually small incidents can quickly add up to a significant total loss."
Conclusion
Mikhaylo Pavlyuk: "Overall, excellent work has been done. As far as I know, it's the first publicly available document of its kind, and it will be invaluable for retailers. My only notes: tighten up the facial recognition terminology and clarify some of the labels to avoid confusion. Small tweaks, but they'll make a big difference in clarity."
Curious how biometrics can level up your business? Book your free consultation today.