Source: New Scientist
Original publication date: 12 February 2022
Privacy policies have become longer and less readable, and claim more access to user data on behalf of the organisations that write them, an analysis of 25 years of documents shows.
一项对25年来隐私政策文档的分析显示,隐私政策变得越来越长、越来越难读,并为编写这些政策的机构索取更多的用户数据访问权限。
Isabel Wagner at De Montfort University in the UK gathered 50,000 privacy policy texts by trawling some of the world's most visited websites. She also traced their history back to 1996 using the Internet Archive's Wayback Machine, which hosts historical versions of web pages.
英国德蒙福特大学的伊莎贝尔·瓦格纳通过搜索世界上访问量最大的一些网站的隐私政策,收集了5万个隐私政策文本。她还利用互联网档案馆的Wayback机器,深入研究了这些文本追溯到1996年的历史。Wayback机器保存着网页的历史版本。
She analysed the data with BERT, a machine-learning tool for natural language processing that Google developed in 2018. BERT can quickly scan large amounts of text for patterns.
她使用谷歌于2018年开发的自然语言处理机器学习工具BERT分析了这些数据。BERT能够快速分析大量文本并从中发现规律。
Wagner’s work was triggered by a recognition of her own habits. “As a researcher who works on privacy, I find myself agreeing to privacy policies but not reading them,” she says.
瓦格纳的工作是由她对自己习惯的认识引发的。她说:“作为一名研究隐私的研究人员,我发现自己会在不阅读的情况下就同意隐私政策。”
She found that the average privacy policy nearly quadrupled in length between 2000 and 2021. The average policy was 1146 words long in 2000, 2159 words long in March 2011 and 4191 words long in March 2021.
她发现,2000年至2021年期间,隐私政策的平均长度几乎翻了两番。2000年的平均长度为1146词,2011年3月的平均长度为2159词,2021年3月的平均长度为4191词。
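The "nearly quadrupled" figure can be checked directly from the three averages quoted above. The short Python sketch below uses only those reported numbers (the dictionary keys are labels chosen here for illustration) and shows that the average length roughly doubled in each decade:

```python
# Average privacy-policy length in words, as reported in the article.
avg_words = {"2000": 1146, "2011-03": 2159, "2021-03": 4191}


def growth(start: str, end: str) -> float:
    """Ratio of the average length at `end` to the average length at `start`."""
    return avg_words[end] / avg_words[start]


for start, end in [("2000", "2011-03"), ("2011-03", "2021-03"), ("2000", "2021-03")]:
    print(f"{start} -> {end}: x{growth(start, end):.2f}")

# 2000 -> 2011-03: x1.88
# 2011-03 -> 2021-03: x1.94
# 2000 -> 2021-03: x3.66
```

A factor of about 3.66 over two decades is what "nearly quadrupled" summarises.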
Analysing the texts month by month, Wagner found that their length ballooned at two points: around May 2018, when the European Union's General Data Protection Regulation (GDPR), a set of laws designed to protect consumers' data, came into effect; and at the start of 2020, when California introduced similar rules.
瓦格纳逐月分析了这些文本,发现其长度在两个时间点前后激增:一是2018年5月左右,当时旨在保护消费者数据的欧盟《通用数据保护条例》(GDPR)生效;二是2020年初,当时加利福尼亚州出台了类似法规。
As privacy policies have got longer, they have also become more complicated. Measured on the Flesch reading ease scale, which rates how readable a text is, privacy policies written in 2021 scored similarly to academic papers in journals such as the Harvard Law Review, Wagner found.
随着隐私政策变得越来越长,它们也变得越来越复杂。瓦格纳发现,按照衡量文本可读性的Flesch阅读轻松度量表,2021年撰写的隐私政策得分与《哈佛法律评论》等期刊上的学术论文相近。
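The Flesch reading ease score is a fixed formula: 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), so long sentences and many-syllable words drag the score down. Scores from roughly 0 to 30 are conventionally read as "very difficult", suited to university graduates. Below is a minimal Python sketch of the formula. The vowel-run syllable counter and the punctuation-based sentence splitter are simplifying assumptions (real readability tools use dictionaries or more careful rules), the `legalese` sentence is an invented sample, and none of this is the tooling Wagner used.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent ("disclose"), but not in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # every word has at least one syllable


def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: higher scores mean easier text."""
    # Naive sentence split on terminal punctuation; abbreviations like "e.g." will over-split.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


simple = "The cat sat. The dog ran."
# Invented sample sentence in a privacy-policy style (not quoted from any real policy).
legalese = ("Notwithstanding the foregoing, the organisation may disclose aggregated "
            "information to authorised third-party processors.")
print(round(flesch_reading_ease(simple), 2))    # 119.19
print(round(flesch_reading_ease(legalese), 2))  # -24.92
```

The single long sentence packed with multi-syllable words scores far lower than the two short sentences, which is the effect the scale captures when policies read like law-review prose.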
“I think privacy policies, from a user’s point of view, are fundamentally broken,” says Wagner. She suggests that until policies are dramatically simplified, machine learning could help users sift through the jargon.
“我认为,从用户的角度来看,隐私政策从根本上就是失效的,”瓦格纳说。她认为,在隐私政策得到大幅简化之前,机器学习可以帮助用户梳理其中晦涩的术语。
Lilian Edwards at Newcastle University, UK, says such analyses highlight the problem of impenetrable privacy policies, which she says is particularly prevalent in the US. "Hopefully that situation will not now last forever."
英国纽卡斯尔大学的莉莲•爱德华兹表示,此类分析突显出隐私政策难以理解的问题,她表示,这一问题在美国尤为普遍。“希望这种情况不会永远持续下去。”