I develop AI and computer vision algorithms that integrate camera design with advanced imaging processes, pushing the boundaries of computational photography.
Scientific Reports
Video reconstruction from a single motion blurred image using learned dynamic phase coding
Erez Yosef, Shay Elmalem, and Raja Giryes
Video reconstruction from a single motion-blurred image is a challenging problem that can enhance the capabilities of existing cameras. Recently, several works have addressed this task using conventional imaging and deep learning. Yet, such purely digital methods are inherently limited due to direction ambiguity and noise sensitivity. Some works attempt to address these limitations with non-conventional image sensors; however, such sensors are extremely rare and expensive. To circumvent these limitations by simpler means, we propose a hybrid optical-digital method for video reconstruction that requires only simple modifications to existing optical systems. We use learned dynamic phase-coding in the lens aperture during image acquisition to encode motion trajectories, which serve as prior information for the video reconstruction process. The proposed computational camera generates a sharp frame burst of the scene at various frame rates from a single coded motion-blurred image, using an image-to-video convolutional neural network. We present advantages and improved performance compared to existing methods, with both simulations and a real-world camera prototype. We extend our optical coding to video frame interpolation and present robust and improved results for noisy videos.
@article{yosef2023video,
  title={Video reconstruction from a single motion blurred image using learned dynamic phase coding},
  author={Yosef, Erez and Elmalem, Shay and Giryes, Raja},
  journal={Scientific Reports},
  volume={13},
  number={1},
  pages={13625},
  year={2023},
  publisher={Nature Publishing Group UK London},
  paper={https://www.nature.com/articles/s41598-023-40297-0},
}
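To make the image-to-video idea concrete, here is a minimal PyTorch sketch of a network that maps a single coded motion-blurred image to a burst of sharp frames. It is an illustrative stand-in, not the paper's architecture: the class name, layer sizes, and frame count are all assumptions.

```python
# Illustrative stand-in for an image-to-video CNN: one coded-blurred
# image in, a burst of n_frames sharp frames out. Not the paper's model.
import torch
import torch.nn as nn

class BlurToBurst(nn.Module):  # hypothetical name
    def __init__(self, n_frames: int = 7):
        super().__init__()
        self.n_frames = n_frames
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Predict all output frames at once as stacked RGB channels.
        self.decoder = nn.Conv2d(64, 3 * n_frames, 3, padding=1)

    def forward(self, blurred: torch.Tensor) -> torch.Tensor:
        b, _, h, w = blurred.shape
        frames = self.decoder(self.encoder(blurred))
        # Reshape to (batch, time, channels, height, width).
        return frames.view(b, self.n_frames, 3, h, w)

model = BlurToBurst(n_frames=7)
coded_blurred = torch.rand(1, 3, 128, 128)  # one coded-exposure capture
burst = model(coded_blurred)
print(burst.shape)  # torch.Size([1, 7, 3, 128, 128])
```

In the actual method, the learned dynamic phase coding embeds motion-trajectory cues in the blur itself, which is what gives such a network the prior it needs to resolve the direction ambiguity that limits purely digital approaches.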
Journal of Optics
Deep learning in optics-a tutorial
Barak Hadad, Sahar Froim, Erez Yosef, Raja Giryes, and Alon Bahabad
In recent years, applications of machine learning and deep neural networks have experienced a remarkable surge in the field of physics, with optics being no exception. This tutorial offers a fundamental introduction to the use of deep learning in optics, catering specifically to newcomers. Within this tutorial, we cover essential concepts, survey the field, and provide guidelines for the creation and deployment of artificial neural network architectures tailored to optical problems.
@article{hadad2023deep,
  title={Deep learning in optics-a tutorial},
  author={Hadad, Barak and Froim, Sahar and Yosef, Erez and Giryes, Raja and Bahabad, Alon},
  journal={Journal of Optics},
  year={2023},
  paper={https://iopscience.iop.org/article/10.1088/2040-8986/ad08dc},
  doi={10.1088/2040-8986/ad08dc},
}
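In the spirit of the tutorial's data-model-training walkthrough, here is a toy end-to-end example: a small CNN learns to undo a fixed blur from simulated measurements. The box-blur kernel, network sizes, and training settings are arbitrary choices for illustration, not anything prescribed by the tutorial.

```python
# Toy deep-learning-for-optics pipeline: simulate a blurred measurement,
# then train a small CNN to recover the sharp input.
import torch
import torch.nn as nn
import torch.nn.functional as F

kernel = torch.ones(1, 1, 5, 5) / 25.0  # simple 5x5 box blur as the "optics"
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    sharp = torch.rand(8, 1, 32, 32)              # random "scenes"
    blurred = F.conv2d(sharp, kernel, padding=2)  # simulated measurement
    loss = F.mse_loss(model(blurred), sharp)      # reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final reconstruction loss: {loss.item():.4f}")
```

The same skeleton (differentiable forward model, network, loss, optimizer loop) underlies most of the optical inverse problems the tutorial surveys.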
CVPR 2024
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation
Lior Talker, Aviad Cohen, Erez Yosef, Alexandra Dana, and Michael Dinerstein
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. Recently, LIDAR-supervised methods have achieved remarkable per-pixel depth accuracy in outdoor scenes. However, significant errors are typically found in the proximity of depth discontinuities, i.e., depth edges, which often hinder the performance of depth-dependent applications that are sensitive to such inaccuracies, e.g., novel view synthesis and augmented reality. Since direct supervision for the location of depth edges is typically unavailable in sparse LIDAR-based scenes, encouraging the MDE model to produce correct depth edges is not straightforward. To the best of our knowledge, this paper is the first attempt to address the depth edges issue for LIDAR-supervised scenes. In this work we propose to learn to detect the location of depth edges from densely-supervised synthetic data, and use it to generate supervision for the depth edges in the MDE training. Despite the ’domain gap’ between synthetic and real data, we show that depth edges that are estimated directly are significantly more accurate than the ones that emerge indirectly from the MDE training. To quantitatively evaluate our approach, and due to the lack of depth edges ground truth in LIDAR-based scenes, we manually annotated subsets of the KITTI and the DDAD datasets with depth edges ground truth. We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets.
@inproceedings{talker2022mind,
  title={Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation},
  author={Talker, Lior and Cohen, Aviad and Yosef, Erez and Dana, Alexandra and Dinerstein, Michael},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024},
}
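A hedged sketch of the central training idea: a standard per-pixel loss on the sparse LIDAR points, plus an auxiliary term that pushes the predicted depth map's gradients toward edge locations estimated by a detector trained on synthetic data. The function names, the gradient-based edge measure, and the 0.1 weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: sparse LIDAR depth loss + auxiliary depth-edge supervision.
import torch
import torch.nn.functional as F

def depth_gradient_magnitude(depth: torch.Tensor) -> torch.Tensor:
    """Finite-difference edge strength of a (B, 1, H, W) depth map."""
    dx = depth[..., :, 1:] - depth[..., :, :-1]
    dy = depth[..., 1:, :] - depth[..., :-1, :]
    dx = F.pad(dx, (0, 1, 0, 0))  # pad back to (B, 1, H, W)
    dy = F.pad(dy, (0, 0, 0, 1))
    return torch.sqrt(dx ** 2 + dy ** 2 + 1e-8)

def training_loss(pred_depth, lidar_depth, lidar_mask, edge_target, w_edge=0.1):
    # Per-pixel depth loss only where sparse LIDAR supervision exists.
    depth_loss = F.l1_loss(pred_depth[lidar_mask], lidar_depth[lidar_mask])
    # Encourage strong depth gradients exactly at the estimated edges.
    edge_strength = depth_gradient_magnitude(pred_depth).clamp(0.0, 1.0)
    edge_loss = F.binary_cross_entropy(edge_strength, edge_target)
    return depth_loss + w_edge * edge_loss
```

Here `edge_target` is assumed to be a per-pixel probability map in [0, 1] produced by the synthetic-trained edge detector at the prediction's resolution.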
arXiv
DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model
Erez Yosef and Raja Giryes
The flat lensless camera design reduces camera size and weight significantly. In this design, the camera lens is replaced by another optical element that interferes with the incoming light. The image is recovered from the raw sensor measurements using a reconstruction algorithm. Yet, the quality of the reconstructed images is not satisfactory. To mitigate this, we propose utilizing a pre-trained diffusion model with a control network and a learned separable transformation for reconstruction. This allows us to build a prototype flat camera with high-quality imaging, presenting state-of-the-art results in terms of both fidelity and perceptual quality. We also demonstrate its ability to leverage textual descriptions of the captured scene to further enhance reconstruction. Our reconstruction method, which leverages the strong capabilities of a pre-trained diffusion model, can be used in other imaging systems to improve reconstruction results.
@article{yosef2025difuzcam,
  title={DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model},
  author={Yosef, Erez and Giryes, Raja},
  journal={arXiv preprint arXiv:2408.07541},
  year={2024},
}
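As a concrete reference point, here is a minimal sketch of a learned separable transformation that maps raw mask-camera measurements to an intermediate image, which could then serve as the conditioning input to a diffusion model (omitted here). All shapes, the initialization, and the class name are illustrative assumptions rather than the paper's exact design.

```python
# Sketch: learned separable transform for lensless reconstruction,
# producing a guide image for a downstream diffusion model.
import torch
import torch.nn as nn

class SeparableReconstruction(nn.Module):  # hypothetical name
    """Intermediate image = W_left @ measurement @ W_right, per channel."""
    def __init__(self, meas_h=512, meas_w=512, img_h=256, img_w=256):
        super().__init__()
        self.w_left = nn.Parameter(torch.randn(img_h, meas_h) * 0.01)
        self.w_right = nn.Parameter(torch.randn(meas_w, img_w) * 0.01)

    def forward(self, meas: torch.Tensor) -> torch.Tensor:
        # meas: (batch, channels, meas_h, meas_w); matmul broadcasts over
        # the batch and channel dimensions.
        return self.w_left @ meas @ self.w_right

recon = SeparableReconstruction()
raw = torch.rand(1, 3, 512, 512)  # simulated mask-camera measurement
guide = recon(raw)                # (1, 3, 256, 256) conditioning image
print(guide.shape)
```

A separable (row/column) transform keeps the parameter count far below that of a full dense mapping between measurement and image pixels, which is why it is a common choice for lensless reconstruction front-ends.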
IEEE OJSP
Tell Me What You See: Text-Guided Real-World Image Denoising
Erez Yosef and Raja Giryes
Image reconstruction from noisy sensor measurements is challenging, and many methods have been proposed for it. Yet, most approaches focus on learning robust natural image priors while modeling the scene’s noise statistics. In extremely low-light conditions, these methods often remain insufficient, and additional information is needed, such as multiple captures. As an alternative, we propose using a text-based description of the scene as an additional prior, something the photographer can easily provide. Inspired by the remarkable success of text-guided diffusion models in image generation, we show that adding image caption information significantly improves image denoising and reconstruction for both synthetic and real-world images. All code and data will be made publicly available upon publication.
@article{yosef2025tell,
  title={Tell Me What You See: Text-Guided Real-World Image Denoising},
  author={Yosef, Erez and Giryes, Raja},
  journal={IEEE Open Journal of Signal Processing},
  year={2025},
  paper={https://ieeexplore.ieee.org/document/11078899},
}
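To illustrate how a caption can act as an extra prior, here is a minimal sketch in which a text embedding modulates a small denoiser through feature-wise scale and shift (FiLM-style conditioning). This is not the paper's model: the text embedding is a stand-in random vector and all sizes are assumptions.

```python
# Sketch: text-conditioned denoising via FiLM-style feature modulation.
import torch
import torch.nn as nn

class TextConditionedDenoiser(nn.Module):  # hypothetical name
    def __init__(self, text_dim=64, feat=32):
        super().__init__()
        self.head = nn.Conv2d(3, feat, 3, padding=1)
        self.film = nn.Linear(text_dim, 2 * feat)  # per-channel (scale, shift)
        self.tail = nn.Conv2d(feat, 3, 3, padding=1)

    def forward(self, noisy, text_emb):
        h = torch.relu(self.head(noisy))
        scale, shift = self.film(text_emb).chunk(2, dim=-1)
        # Broadcast the caption-derived modulation over spatial positions.
        h = h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        return noisy + self.tail(h)  # predict a residual correction

model = TextConditionedDenoiser()
noisy = torch.rand(1, 3, 64, 64)
caption_embedding = torch.randn(1, 64)  # stand-in for a real text encoder
clean_estimate = model(noisy, caption_embedding)
print(clean_estimate.shape)  # torch.Size([1, 3, 64, 64])
```

A real system would obtain `caption_embedding` from a pretrained text encoder such as CLIP rather than from random noise.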
Always open to discussing new research opportunities and collaborations.