Addressing Challenges In Building Web-Scale Content Classification Systems
Aditya Srinivas Timmaraju, Angli Liu, Pushkar Tripathi
SPS
Length: 13:19
Understanding the semantic meaning of content on the web through the lens of a taxonomy has many practical advantages. However, practitioners building large-scale content classification systems face unique challenges in finding the best ways to leverage the scale and variety of data available on internet platforms. We present lessons from our efforts in building a content classification system for multiple document types at Facebook using multi-modal Transformers. We empirically demonstrate the effectiveness of multi-lingual, multi-modal, and cross-document-type learning. We describe effective strategies for exploiting weakly supervised signals as a pre-training step and show that they lead to significant gains in downstream classification accuracy. We also discuss label collection schemes that help minimize the amount of noise in collected data.