Tommer Leyvand - Homepage

Old Personal Publications

Combining Body Pose, Gaze, and Gesture to Determine Intention to Interact in Vision-Based Interfaces
Julia Schwarz, Charles Marais, Tommer Leyvand, Scott E. Hudson, Jennifer Mankoff
CHI 2014
Vision-based interfaces, such as those made popular by the Microsoft Kinect, suffer from the Midas Touch problem: every user motion can be interpreted as an interaction. In response, we developed an algorithm that combines facial features, body pose and motion to approximate a user’s intention to interact with the system. We show how this can be used to determine when to pay attention to a user’s actions and when to ignore them. To demonstrate the value of our approach, we present results from a 30-person lab study conducted to compare four engagement algorithms in single and multi-user scenarios. We found that combining intention to interact with a “raise an open hand in front of you” gesture yielded the best results. The latter approach offers a 12% improvement in accuracy and a 20% reduction in time to engage over a baseline “wave to engage” gesture currently used on the Xbox 360. More ...

Exemplar-Based Human Action Pose Correction and Tagging
Wei Shen, Ke Deng, Xiang Bai, Tommer Leyvand, Baining Guo, and Zhuowen Tu
CVPR 2012
The launch of Xbox Kinect has built a very successful computer vision product and made a big impact on the gaming industry; this sheds light on a wide variety of potential applications related to action recognition. The accurate estimation of human poses from the depth image is universally a critical step. However, existing pose estimation systems exhibit failures when faced with severe occlusion. In this paper, we propose an exemplar-based method to learn to correct the initially estimated poses. We learn an inhomogeneous systematic bias by leveraging the exemplar information within a specific human action domain. Our algorithm is illustrated on both joint-based skeleton correction and tag prediction. In the experiments, significant improvement is observed over the contemporary approaches, including what is delivered by the current Kinect system.

Kinect Identity: Technology and Experience
Tommer Leyvand, Casey Meekhof, Yi-Chen Wei, Jian Sun, and Baining Guo
IEEE Computer, vol. 44, no. 4, pp. 94-96. 2011.
This IEEE Computer article is a high-level introduction to how Kinect performs player identity recognition on the Xbox 360, what we call 'Kinect Identity'.

Additional details and references to related facial-recognition publications are available here. MSR video is available here.

Data-Driven Enhancement of Facial Attractiveness
Tommer Leyvand, Daniel Cohen-Or, Gideon Dror and Dani Lischinski
ACM SIGGRAPH 2008
In this work we focus on the challenging problem of enhancing the aesthetic appeal (or the attractiveness) of human faces in frontal photographs (portraits), while maintaining close similarity with the original. The key component in our approach is an automatic facial attractiveness engine trained on datasets of faces with accompanying facial attractiveness ratings collected from groups of human raters. More ...
Digital Face Beautification SIGGRAPH 2006, Technical Sketch page (here)

A Machine Learning Predictor of Facial Attractiveness Revealing Human-Like Psychophysical Biases
Amit Kagian, Gideon Dror, Tommer Leyvand, Isaac Meilijson, Daniel Cohen-Or, Eytan Ruppin
Vision Research 48 (2008) 235–243
Recent psychological studies have strongly suggested that humans share common visual preferences for facial attractiveness. Here, we present a learning model that automatically extracts measurements of facial features from raw images and obtains human-level performance in predicting facial attractiveness ratings. The machine’s ratings are highly correlated with mean human ratings, markedly improving on recent machine learning studies of this task. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine’s judgments that are remarkably similar to those of humans. Thus, a model trained explicitly to capture a specific operational performance criterion implicitly captures basic human psychophysical characteristics.

Color Harmonization
Daniel Cohen-Or, Olga Sorkine, Ran Gal, Tommer Leyvand and Ying-Qing Xu
ACM SIGGRAPH 2006
Harmonic colors are sets of colors that are aesthetically pleasing in terms of human visual perception. In this paper, we present a method that enhances the harmony among the colors of a given photograph or of a general image, while remaining faithful, as much as possible, to the original colors. Given a color image, our method finds the best harmonic scheme for the image colors. It then allows a graceful shifting of hue values so as to fit the harmonic scheme while considering spatial coherence among colors of neighboring pixels using an optimization technique. The results demonstrate that our method is capable of automatically enhancing the color "look-and-feel" of an ordinary image.
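The two steps named in the abstract (pick the best-fitting scheme, then shift hues to fit it) can be illustrated with a deliberately simplified Python sketch. It is not the paper's method: it assumes a single hue sector of arbitrary width in place of the harmonic templates, searches orientations by brute force, snaps outlying hues to the nearest sector edge, and ignores spatial coherence entirely. All function names and parameters here are hypothetical.

```python
import math


def angular_distance(a, b):
    """Shortest distance in degrees between two hue angles."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def distance_to_sector(hue, center, width):
    """Degrees by which `hue` falls outside the sector (0.0 if inside)."""
    return max(0.0, angular_distance(hue, center) - width / 2.0)


def best_sector_center(hues, weights, width):
    """Try every whole-degree orientation and keep the cheapest one."""
    def cost(center):
        return sum(w * distance_to_sector(h, center, width)
                   for h, w in zip(hues, weights))
    return min(range(360), key=cost)


def harmonize(hues, weights, width):
    """Move every hue outside the best sector onto its nearest sector edge."""
    center = best_sector_center(hues, weights, width)
    shifted = []
    for hue in hues:
        if distance_to_sector(hue, center, width) == 0.0:
            shifted.append(hue)  # already inside the sector
            continue
        # Signed offset from the sector center, in [-180, 180).
        offset = (hue - center + 180.0) % 360.0 - 180.0
        edge = center + math.copysign(width / 2.0, offset)
        shifted.append(edge % 360.0)
    return shifted


# Two strongly weighted warm hues and one weakly weighted outlier
# (the weights stand in for something like saturation; this is an assumption).
print(harmonize([10.0, 20.0, 200.0], [1.0, 1.0, 0.1], width=40.0))
# [10.0, 20.0, 340.0]
```

The outlier is pulled to the sector edge while the dominant hues are left untouched, which is the "remaining faithful to the original colors" trade-off in miniature.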

Interactive Object Segmentation in Video by Fitting Splines to Graph Cuts
Iddo Drori, Tommer Leyvand, Daniel Cohen-Or and Hezy Yeshurun
ACM SIGGRAPH 2004 Posters Session
Object segmentation in image sequences is one of the fundamental problems in computer vision and graphics. This problem is usually addressed either by discrete representations which are currently manifested by graph partitioning techniques, or by continuous methods typically referred to as active contours. In this work we take a unified approach by fitting splines to graph cuts. The strengths of this approach stem from the dual discrete and continuous representations and from allowing the user to refine the result of the cut by fitting a new spline to it and modifying its points. More ...

Video Operations in the Gradient Domain
Iddo Drori, Tommer Leyvand, Shachar Fleishman, Daniel Cohen-Or and Hezy Yeshurun
Technical Report, May 2004

Fusion of image sequences is a fundamental operation in numerous video applications and usually consists of segmentation, matting and compositing. We present a unified framework for performing these operations on video in the gradient domain. Our approach consists of 3D graph cut computation followed by reconstruction of a new 3D vector field by solving the Poisson equation. We demonstrate the applicability of smooth video transitions by fusing pairs for video mosaics, video folding, and video texture synthesis, and demonstrate the applicability of sharp video transitions by video segmentation, video trimap extraction and 3D compositing into a new sequence. Our results demonstrate that our method maintains coherence of the video matte and composite, and avoids temporal artifacts. More ...

Ray Space Factorization for From-Region Visibility
Tommer Leyvand, Olga Sorkine and Daniel Cohen-Or
ACM SIGGRAPH 2003
This paper presents a conservative occlusion culling method based on factorizing the 4D from-region visibility problem into horizontal and vertical components. The visibility of the two components is solved asymmetrically: the horizontal component is based on a parameterization of the ray space, and the visibility of the vertical component is solved by incrementally merging umbrae. The technique is designed so that the horizontal and vertical operations can be efficiently realized together by modern graphics hardware. More ...

Projects

Advanced Topic in Computer Graphics / Spring 2004: Exercise 1 - Poisson Image Editing
This exercise is an introduction to gradient-domain image editing. We start with the simpler smooth image completion operation (an example input/output pair is on the left). We then describe the Poisson image cloning technique, which clones pixel gradients instead of pixel values and usually results in smoother blending. The exercise material includes the presentation slides and full solution source code. More ...
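As an illustration of cloning gradients rather than values, here is a minimal pure-Python sketch. It is not the exercise's solution code: it assumes single-channel images stored as nested lists, a mask that never touches the image border, and plain Gauss-Seidel iteration on the discrete Poisson equation. The name `poisson_clone` and its parameters are hypothetical.

```python
def poisson_clone(source, target, mask, iterations=200):
    """Blend `source` into `target` over masked pixels by matching source gradients.

    All three arguments are equally sized 2D lists; `mask` is truthy where
    pixels are cloned. Border pixels are never modified, so unmasked and
    border pixels act as fixed boundary values taken from `target`.
    """
    height, width = len(target), len(target[0])
    result = [[float(v) for v in row] for row in target]
    for _ in range(iterations):  # Gauss-Seidel sweeps over the masked region
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                if not mask[y][x]:
                    continue
                total = 0.0
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    # Neighbor's current value plus the source difference
                    # between this pixel and that neighbor.
                    total += result[ny][nx] + (source[y][x] - source[ny][nx])
                result[y][x] = total / 4.0
    return result


# A source bump of height 5 is cloned onto a flat target of value 100.
source = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
target = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
blended = poisson_clone(source, target, mask)
print(blended[1][1])  # 105.0
```

The cloned pixel ends up at 105 rather than 5: the bump keeps its height relative to its surroundings while adopting the target's brightness at the boundary, which is why gradient cloning tends to hide seams.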

CityGen - Procedural Urban Model Generator

CityGen is a procedural 3D model generator application aimed at generating random urban models. These models are generated from an XML construction file using several simple operations and random inputs. It was developed as a side project of my "Ray Space Factorization for From-Region Visibility" paper. More ...

07/07/2005, V0.9 of CityGen released (download here)

Online source-code:

Advanced computer graphics exercise 1 solution, "Poisson Image Editing"
imagelib - A simple, lightweight image library supporting common image file formats.
Reranking search engine with source code

	Tommer Leyvand
	Ambitious and experienced software engineering leader with a track record of driving innovation and execution in product and platform teams with a major applied-research component. Currently Senior Director of Engineering at Apple, leading the Video Computer Vision (VCV) organization, a centralized applied research and engineering organization responsible for developing real-time, on-device Computer Vision and Machine Perception technologies across Apple products. Previously led the AI Camera group at Facebook, shipping Oculus Quest and Spark AR computer vision technologies. Prior to that, led software development for HoloLens and Kinect (Xbox 360, Xbox One and Windows) at Microsoft.
