Other examples show it: perceiving from an image that a tennis player has a ponytail; reading the time from an image of a clock face showing 10:10; calculating the sum from an image of 4 + 5; answering 'what is TorchScale?' (a PyTorch machine-learning library) based on its GitHub description page; and reading the heart rate from an Apple Watch face.

Each of these examples demonstrates the potential for MLLMs like Kosmos-1 to automate tasks in many situations: telling a Windows 10 user how to restart their computer (or walking them through any other task with a visual prompt), reading a web page to initiate a web search, interpreting health data from a device, captioning images, and so on.

The demonstrations of Kosmos-1's responses to prompts include an image of a kitten with a person holding a piece of paper, with a smile drawn on it, over the kitten's mouth. The prompt is: 'Explain why this photo is funny?' Kosmos-1's answer is: "The cat is wearing a mask that gives the cat a smile."