- Redmond WA, US Young Jin YUN - San Francisco CA, US Sneha CHAUDHARI - Santa Clara CA, US Mahesh Subhash JOSHI - Belmont CA, US Gungor POLATKAN - San Jose CA, US Gautam BOROOAH - Oakland CA, US
International Classification:
G06F 16/28 G06N 20/00
Abstract:
Techniques for efficient tagging of content items using content embeddings are provided. In one technique, multiple content items are stored, and a content embedding for each content item is stored. Entity names are also stored along with an entity name embedding for each entity name. For each content item, (1) multiple content embeddings that are associated with the content item are identified; (2) a subset of the entity names is identified; and (3) for each entity name in the subset, (i) an embedding of the entity name is identified, (ii) similarity measures are generated based on the entity name embedding and the multiple content embeddings, (iii) a distribution of the similarity measures is generated, (iv) feature values are generated based on the distribution, (v) the feature values are input into a machine-learned classifier, and (vi) based on output from the classifier, it is determined whether to associate the entity name with the content item.
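The per-entity steps (i) through (vi) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify the similarity measure, the form of the distribution, the features, or the classifier. The sketch assumes cosine similarity, a normalized histogram over [-1, 1] plus mean, maximum, and standard deviation as features, and a logistic model with hand-picked weights standing in for the machine-learned classifier. All function names and parameters (`cosine`, `distribution_features`, `classify`, `tag_content_item`, `weights`, `bias`, `threshold`) are hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity (assumed measure; the abstract does not name one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def distribution_features(similarities, bins=4):
    """Steps (iii)-(iv): build a normalized histogram of the similarity
    measures over [-1, 1], then append summary statistics as features."""
    counts = [0] * bins
    for s in similarities:
        # Map s in [-1, 1] to a bin index, clamping s == 1 into the last bin.
        idx = min(int((s + 1.0) / 2.0 * bins), bins - 1)
        counts[idx] += 1
    n = len(similarities)
    histogram = [c / n for c in counts]
    return histogram + [mean(similarities), max(similarities), pstdev(similarities)]


def classify(features, weights, bias):
    """Step (v): a logistic model standing in for the trained classifier."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def tag_content_item(content_embeddings, candidate_entities, weights, bias,
                     threshold=0.5):
    """Steps (i)-(vi) for one content item.

    content_embeddings: embeddings associated with the content item.
    candidate_entities: dict of entity name -> entity name embedding,
        assumed to be the already-selected subset from step (2).
    """
    tags = []
    for name, entity_embedding in candidate_entities.items():
        # Step (ii): one similarity measure per content embedding.
        sims = [cosine(entity_embedding, c) for c in content_embeddings]
        features = distribution_features(sims)
        # Step (vi): associate the entity name if the score clears the threshold.
        if classify(features, weights, bias) >= threshold:
            tags.append(name)
    return tags


# Toy data: three embeddings for one content item, two candidate entity names.
content = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2]]
entities = {"python": [1.0, 0.0], "cooking": [0.0, 1.0]}
# Hand-picked weights that only look at the mean similarity (feature index 4).
weights = [0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0]
tags = tag_content_item(content, entities, weights, bias=-5.0)
```

In practice the weights would come from training on labeled (content item, entity name) pairs rather than being set by hand, and the feature set would be chosen to match whatever distribution summary the trained classifier expects.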