
Using Scenarios for Improved Computer Vision

Technology #2019-022

Dimitris Metaxas, Ph.D.
Dr. Dimitris Metaxas has been a Distinguished Professor in the Department of Computer Science at Rutgers University since July 2007; from September 2001 to June 2007 he was a Professor in the same department. He currently directs the Center for Computational Biomedicine, Imaging and Modeling (CBIM).
Zachary Daniels
Ph.D. student, Department of Computer Science, Rutgers University
Managed By
Andrea Dick
Assistant Director, Licensing
848-932-4018
Patent Protection

Provisional Patent Application Filed

The ability of computational agents to reason about the high-level content of real-world scene images is important for many applications (e.g., robotics, human-machine teaming, surveillance, and autonomous vehicles), where an agent must understand scene content in order to make rational, grounded decisions that humans can trust. It is also often necessary for models to be interpretable by humans, both to further encourage trust and to allow humans to understand the failure modes of an autonomous agent. For example, if a self-driving car makes an error, it is important to know what caused the error so that similar errors can be prevented in the future.


Researchers at Rutgers University have introduced “scenarios” as a new way of representing scenes in images. Useful for a wide range of scene-understanding tasks, a scenario is an easy-to-interpret, low-dimensional, data-driven representation consisting of sets of frequently co-occurring objects. Scenarios are learned from data using a novel matrix factorization method that is integrated into a new neural network architecture, the “ScenarioNet”. Using ScenarioNet, semantic information about real-world scene images is recovered at three levels of granularity: 1) scene categories, 2) scenarios, and 3) objects. Training a single ScenarioNet model enables scene classification, scenario recognition, multi-object recognition, content-based scene image retrieval, and content-based image comparison. This scene-understanding technology makes it possible to recognize scenes (e.g., in images and videos) and to explain the reasons for that recognition in a human-understandable form.
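To illustrate the core idea, the sketch below learns scenario-like groupings of co-occurring objects from binary image-object occurrence data using a plain non-negative matrix factorization (NMF) with standard multiplicative updates. The object names and data are invented for illustration, and this standalone NMF is only an analogy: the actual invention integrates its own novel factorization method into a neural network, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: rows are images, columns are binary object occurrences.
# Images 0-2 are street scenes, images 3-5 are bedroom scenes.
objects = ["car", "road", "sign", "bed", "lamp", "pillow"]
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

k = 2                                # number of scenarios to learn
W = rng.random((X.shape[0], k))      # image-to-scenario weights
H = rng.random((k, X.shape[1]))      # scenario-to-object weights

# Multiplicative updates for NMF minimizing ||X - WH||_F (Lee & Seung).
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)

# Each learned scenario is a weighted set of frequently co-occurring objects.
for i, row in enumerate(H):
    top = [objects[j] for j in np.argsort(row)[::-1][:3]]
    print(f"scenario {i}: {top}")
```

On this toy data the two learned scenarios tend to align with the street-scene and bedroom-scene object groups, which is the sense in which scenarios summarize scene content at a level between whole scene categories and individual objects.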


  • Provides the ability to explain decisions and actions made by artificial intelligence (AI) processes.
  • Supports safety-critical tasks and tasks involving human-machine teaming (e.g., understanding what caused an error in order to prevent future situations where similar errors might arise).
  • The ScenarioNet architecture is efficient, requiring significantly fewer parameters than other convolutional neural networks while achieving similar performance on benchmark tasks, and interpretable, because it can produce a human-understandable explanation for every decision.


The technology enabled by this invention is applicable to areas including human-machine teaming, robotics, medical image diagnostics, surveillance, and autonomous vehicles.

Property & Development Status: 

Patent pending. This technology is available for licensing and/or research collaboration with industry partners.