OWL-ViT is a zero-shot text-conditioned object detection model | Tensorflow(@CVision)
OWL-ViT is a zero-shot, text-conditioned object detection model: you query an image with free-text descriptions, including descriptions of objects the model has not seen during training. It generalizes well and is on par with some state-of-the-art object detection models.
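To make "querying an image with text" concrete, here is a toy sketch of one common way text-conditioned detection can work: compare an embedding of each candidate image region against an embedding of each text query, and keep the regions whose best match is strong enough. This is not OWL-ViT's actual code. The helpers `cosine` and `match_queries`, the 3-dimensional embeddings, the boxes, and the `0.8` threshold are all made up for illustration; a real model would produce the embeddings with learned text and image encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_queries(regions, queries, threshold=0.8):
    """Label each candidate region with its best-matching text query.

    regions: list of (box, embedding) pairs.
    queries: dict mapping query text to its embedding.
    Returns (box, text, score) for regions whose best score meets the threshold.
    """
    detections = []
    for box, region_emb in regions:
        # Pick the query whose embedding is most similar to this region.
        text, score = max(
            ((t, cosine(region_emb, q_emb)) for t, q_emb in queries.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            detections.append((box, text, score))
    return detections

# Hypothetical embeddings standing in for encoder outputs.
queries = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a remote control": [0.0, 1.0, 0.0],
}
regions = [
    ((10, 20, 110, 220), [0.9, 0.1, 0.0]),   # close to the "cat" query
    ((200, 40, 260, 90), [0.1, 0.95, 0.0]),  # close to the "remote control" query
    ((0, 0, 50, 50), [0.0, 0.0, 1.0]),       # matches neither query
]

detections = match_queries(regions, queries)
for box, text, score in detections:
    print(f"{text}: box={box}, score={score:.2f}")
```

Because the queries are plain text, adding a new object category in this sketch only means adding another entry to `queries`; nothing has to be retrained, which is the idea behind the zero-shot claim above.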