Facebook's Human Content Moderators Pose Privacy Questions

Human-powered content labeling is a growth industry as companies seek to harness data for AI training and other purposes. But it is also raising privacy concerns.

Self-driving car companies such as Alphabet Inc’s Waymo have labelers identify traffic lights and pedestrians in videos to fortify their AI. Voice assistant developers including Amazon.com Inc have people annotate customer audio to improve AI’s ability to decipher speech.

Social media giants are also using humans for content moderation, anti-money laundering and data analytics, as they face increased global pressure and scrutiny to curb rumours and fraud on their platforms.

Facebook has been working with Indian firm Wipro for content moderation. According to Reuters, a team of as many as 260 contract workers in Hyderabad, India, has ploughed through millions of Facebook Inc photos, status updates and other content posted since 2014.

The workers categorize items according to five “dimensions,” as Facebook calls them. These include the subject of the post: is it food, for example, or a selfie or an animal? What is the occasion: an everyday activity or a major life event? And what is the author’s intention: to plan an event, to inspire, to make a joke?

The work is aimed at understanding how the types of things users post on its services are changing, Facebook said. That can help the company develop new features, potentially increasing usage and ad revenue.

Facebook has confirmed many details of the Wipro project. The company says the data labeling efforts are aimed at “training” the software that determines what appears in users’ news feeds and powers the artificial intelligence underlying many other features.

The Wipro workers gain a window into users’ lives as they view a vacation photo or a post memorializing a deceased family member. Facebook acknowledged that some posts, including screenshots and those with comments, may include user names.

The Wipro labelers and Facebook said the posts are a random sampling of text-based status updates, shared links, event posts, Stories feature uploads, videos and photos, including user-posted screenshots of chats on Facebook’s various messaging apps. The posts come from Facebook and Instagram users globally, in languages including English, Hindi and Arabic.

The content labeling program could raise new privacy issues for Facebook. The company is facing regulatory investigations worldwide over an unrelated set of alleged privacy abuses involving the sharing of user data with business partners.

Facebook said its legal and privacy teams must sign off on all labeling efforts, adding that it recently introduced an auditing system “to ensure that privacy expectations are being followed and parameters in place are working as expected.”

But users’ posts are being scrutinized without their explicit permission. The European Union’s year-old General Data Protection Regulation (GDPR) has strict rules about how companies gather and use personal data and in many cases requires specific consent.

If the purpose of reviewing posts is to improve the precision of services, that purpose should be stated explicitly. Using an outside vendor for the work could also require consent.

Facebook says its data policy makes clear that the company uses the information people provide “to improve their experience and that we might work with service providers to help in this process.”