Image description technology on Facebook

Several years ago, Facebook introduced a new technology for screen reader users called automatic alt text (AAT), which generates text descriptions for photographs that lack them. They have continued to improve this technology, and it is now taking a big leap forward.

Before getting into the details below, note that our podcast team covered this in depth in the latest edition of the EBU Access Cast, so take the time to check it out for their independent analysis!

AAT is now able to recognize over 1,200 concepts – more than 10 times as many as before. AAT is available for photos on Facebook for iOS and Android in News Feed, profiles, and groups, and when a photo is open in the "detail" view (where an image appears full screen against a black background). It is also available on Instagram for iOS and Android for photos in Feed, Explore, and Profile.

They’re also unveiling a new, additional AAT feature on Facebook for iOS and Android that provides a more detailed image description, including a count of recognized objects, the position of objects in a photo, and their relative size. The detailed description can be accessed using a long press (Android) or a ‘custom action’ (iOS) on a photo. This design is based on early user feedback, enabling people to hear more about the photos they are most interested in, without slowing them down by providing too much information about every photo they encounter.
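
To make this concrete, here is a minimal Python sketch of one way detection output could be turned into such a detailed description: counting recognized objects, bucketing each bounding box's center into a coarse position grid, and comparing areas for relative size. The `Detection` structure, the 3x3 grid, and the size buckets are illustrative assumptions, not Facebook's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Detection:
    """One recognized object: a concept label plus a normalized bounding box."""
    label: str   # e.g. "person"
    x: float     # left edge, 0..1
    y: float     # top edge, 0..1
    w: float     # width, 0..1
    h: float     # height, 0..1

def position(d: Detection) -> str:
    """Bucket the box center into a coarse 3x3 grid of the image."""
    cx, cy = d.x + d.w / 2, d.y + d.h / 2
    col = ["left", "center", "right"][min(int(cx * 3), 2)]
    row = ["top", "middle", "bottom"][min(int(cy * 3), 2)]
    return f"{row} {col}"

def relative_size(d: Detection, all_detections: list) -> str:
    """Describe this box's area relative to the largest detected object."""
    largest = max(o.w * o.h for o in all_detections)
    share = (d.w * d.h) / largest
    return "large" if share > 0.66 else "medium" if share > 0.33 else "small"

def detailed_description(detections: list) -> str:
    counts = [f"{n} {label}(s) recognized"
              for label, n in Counter(d.label for d in detections).items()]
    details = [f"{d.label}: {position(d)}, {relative_size(d, detections)}"
               for d in detections]
    return "; ".join(counts + details)

# Two people in front of a tower, loosely matching the selfie example below.
photo = [Detection("person", 0.05, 0.40, 0.25, 0.50),
         Detection("person", 0.35, 0.45, 0.25, 0.50),
         Detection("tower",  0.70, 0.05, 0.20, 0.90)]
print(detailed_description(photo))
```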

They have also published a Newsroom blog post – Using AI to Improve Photo Descriptions for People who are Blind and Visually Impaired – to introduce these improvements and describe how they accomplished this work. Here are some of the highlights from the blog post:

  • Expanded Object Recognition. AAT uses Facebook's latest AI models and now recognizes over 1,200 concepts – more than 10 times as many as the original version they launched in 2016. AAT can now identify activities, landmarks, types of animals, and more. They accomplished this by training the AAT model on billions of public Instagram images and their hashtags (a minimal sketch of this weakly supervised setup follows this list).
  • Keeping It Simple. AAT uses simple phrasing for its default description rather than a long, flowery sentence. For example, an AAT description might say “May be a selfie of 2 people, outdoors, The Leaning Tower of Pisa.” They’ve found users can read and understand such a description quickly (see the phrasing sketch after this list).
  • Global Availability. The alt text descriptions are available in 45 different languages, helping ensure that AAT is useful to people around the world.
  • Detailed Image Descriptions. Screen reader users of the Facebook app can now access detailed image descriptions of a photo that provide additional context, such as a count of the recognized objects, positional information, and a comparison of the relative prominence of objects in an image.
  • Accuracy and Reliability. When Facebook consulted screen reader users about AAT and how best to improve it, those users made it clear that accuracy is paramount. To that end, they’ve only included concepts whose models they could train to a high threshold of accuracy.
  • Reducing Bias. They continue to improve their machine learning models by sampling data from images across all geographies and by using translations of hashtags in many languages to reduce bias. They also evaluated the concepts along gender, skin tone, and age axes. The resulting models are both more accurate and more culturally and demographically inclusive.
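
To illustrate the weakly supervised training mentioned in the first point, here is a minimal Python sketch in which hashtags serve as noisy multi-hot labels for a multi-label image classifier. The tiny CNN, the four-concept vocabulary, and the random tensors standing in for photos are all assumptions for illustration; Facebook's production model and data pipeline are far larger and are not described in code in the blog post.

```python
import torch
import torch.nn as nn

# Stand-in vocabulary; the real system covers over 1,200 concepts.
CONCEPTS = ["selfie", "outdoors", "dog", "landmark"]

class ConceptClassifier(nn.Module):
    """A deliberately tiny CNN; the production model is far larger."""
    def __init__(self, num_concepts: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, num_concepts)

    def forward(self, x):
        return self.head(self.features(x))  # one logit per concept

model = ConceptClassifier(len(CONCEPTS))

# Hashtags act as noisy multi-hot labels: a photo tagged #selfie #outdoors
# becomes the target [1, 1, 0, 0], so no human labeling is needed.
images = torch.randn(8, 3, 64, 64)                         # stand-in for photos
targets = torch.randint(0, 2, (8, len(CONCEPTS))).float()  # stand-in hashtags
loss = nn.BCEWithLogitsLoss()(model(images), targets)
loss.backward()  # one weakly supervised training step
```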
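
And to illustrate the "Keeping It Simple" and "Accuracy and Reliability" points together, here is a small sketch of template-style phrasing that keeps only concepts above a confidence threshold. The function name, the 0.9 threshold, and the score values are illustrative assumptions, not Facebook's actual logic.

```python
def default_alt_text(concepts: dict, people: int = 0,
                     threshold: float = 0.9) -> str:
    """Build the short default phrase from high-confidence concepts only."""
    kept = [c for c, score in concepts.items() if score >= threshold]
    parts = []
    if people:
        parts.append(f"selfie of {people} people" if "selfie" in kept
                     else f"{people} people")
    parts += [c for c in kept if c != "selfie"]
    # "May be" hedges the phrase, since the model can be wrong.
    return "May be a " + ", ".join(parts) if parts else "No description available"

print(default_alt_text(
    {"selfie": 0.97, "outdoors": 0.95, "The Leaning Tower of Pisa": 0.93,
     "dog": 0.42},  # "dog" falls below the threshold and is dropped
    people=2))
# -> May be a selfie of 2 people, outdoors, The Leaning Tower of Pisa
```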