Welcome to Byte Size Arxiv

Papers made digestible

2023-03-20

Legs as Manipulator: Pushing Quadrupedal Agility Beyond Locomotion

  • Locomotion has seen dramatic progress for walking or running across challenging terrains
  • Videos at https://robot-skills.github.io
Locomotion has seen dramatic progress for walking or running across challenging terrains. However, robotic quadrupeds are still far behind their biological counterparts, such as dogs, which display a variety of agile skills and can use their legs beyond locomotion to perform several basic manipulation tasks, like interacting with objects and climbing. In this paper, we take a step towards bridging this gap by training quadruped robots not only to walk but also to use their front legs to climb walls, press buttons, and perform object interaction in the real world. To handle this challenging optimization, we broadly decouple skill learning into locomotion, which covers any whole-body movement, whether walking or climbing a wall, and manipulation, which involves using one leg to interact while balancing on the other three legs. These skills are trained in simulation using a curriculum and transferred to the real world using our proposed sim2real variant that builds upon recent locomotion success. Finally, we combine these skills into a robust long-term plan by learning a behavior tree that encodes a high-level task hierarchy from one clean expert demonstration. We evaluate our method in both simulation and the real world, showing successful execution of both short- and long-range tasks and robustness to external perturbations. Videos at https://robot-skills.github.io
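
As a rough illustration of the last step, here is a minimal behavior-tree sketch. The node and skill names are hypothetical, and the paper learns the tree from a demonstration rather than hand-coding it as done here.

```python
# Hypothetical sketch of a behavior tree composing locomotion and
# manipulation skills into a long-horizon plan. Skill names and the
# lambda "policies" are illustrative stand-ins, not the authors' API.

class Sequence:
    """Runs children in order; fails as soon as one child fails."""
    def __init__(self, *children):
        self.children = children

    def run(self, robot):
        return all(child.run(robot) for child in self.children)

class Skill:
    """Leaf node wrapping a learned low-level policy."""
    def __init__(self, name, policy):
        self.name, self.policy = name, policy

    def run(self, robot):
        print(f"executing skill: {self.name}")
        return self.policy(robot)

# Example long-range task: walk to a wall, climb it, then press a button.
plan = Sequence(
    Skill("walk_to_wall", lambda r: True),   # locomotion policy
    Skill("climb_wall",   lambda r: True),   # locomotion policy
    Skill("press_button", lambda r: True),   # manipulation policy
)
plan.run(robot=None)
```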

Authors: Xuxin Cheng, Ashish Kumar, Deepak Pathak.

2023-03-20

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

  • We learn to solve these tasks solely through self-supervision
  • We train these models to generate predictions that agree with one another
  • At test time, the models can be deployed independently
  • Project site: https://ificl.github.io/SLfM/
The images and sounds that we perceive undergo subtle but geometrically consistent changes as we rotate our heads. In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources. We learn to solve these tasks solely through self-supervision. A visual model predicts camera rotation from a pair of images, while an audio model predicts the direction of sound sources from binaural sounds. We train these models to generate predictions that agree with one another. At test time, the models can be deployed independently. To obtain a feature representation that is well-suited to solving this challenging problem, we also propose a method for learning an audio-visual representation through cross-view binauralization: estimating binaural sound from one view, given images and sound from another. Our model can successfully estimate accurate rotations on both real and synthetic scenes, and localize sound sources with accuracy competitive with state-of-the-art self-supervised approaches. Project site: https://ificl.github.io/SLfM/
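
A minimal sketch of the agreement objective, assuming a planar (yaw-only) geometry and scalar direction estimates; the paper's actual formulation may differ.

```python
import torch

# Minimal sketch (not the authors' code) of the cross-modal agreement
# objective: if the camera rotates by theta between two frames, a fixed
# sound source's direction in the listener's frame should shift by -theta.

def agreement_loss(rotation, direction_t0, direction_t1):
    """rotation: visual model's yaw estimate (radians), shape (B,).
    direction_t0/t1: audio model's source-direction estimates, shape (B,)."""
    return ((direction_t1 - direction_t0) + rotation).pow(2).mean()

rotation = torch.zeros(4, requires_grad=True)
loss = agreement_loss(rotation, torch.zeros(4), torch.full((4,), 0.3))
loss.backward()      # gradients flow into both models during training
print(loss.item())   # 0.09: the two models disagree by 0.3 rad
```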

Authors: Ziyang Chen, Shengyi Qian, Andrew Owens.

2023-03-20

Zero-1-to-3: Zero-shot One Image to 3D Object

  • We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image
  • Our viewpoint-conditioned diffusion approach can further be used for the task of 3D reconstruction from a single image.
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. To perform novel view synthesis in this under-constrained setting, we capitalize on the geometric priors that large-scale diffusion models learn about natural images. Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint, which allow new images of the same object to be generated under a specified camera transformation. Even though it is trained on a synthetic dataset, our model retains a strong zero-shot generalization ability to out-of-distribution datasets as well as in-the-wild images, including impressionist paintings. Our viewpoint-conditioned diffusion approach can further be used for the task of 3D reconstruction from a single image. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
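
To make the conditioning concrete, here is a small sketch of how a relative camera viewpoint might be encoded and appended to an image embedding before being fed to the denoiser. The polar/azimuth/radius parameterization and embedding sizes are assumptions, not the paper's released code.

```python
import numpy as np

# Illustrative sketch of viewpoint conditioning: the relative camera
# transformation is encoded as a small vector and concatenated with an
# image embedding to condition the diffusion model.

def viewpoint_embedding(d_polar, d_azimuth, d_radius):
    # Azimuth is periodic, so encode it with sin/cos to avoid a
    # wrap-around discontinuity at +/- pi.
    return np.array([d_polar, np.sin(d_azimuth), np.cos(d_azimuth), d_radius])

image_embedding = np.random.randn(768)      # stand-in for a CLIP embedding
cond = np.concatenate([image_embedding,
                       viewpoint_embedding(0.2, np.pi / 4, 0.1)])
print(cond.shape)  # (772,) -> conditioning input to the denoising network
```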

Authors: Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick.

2023-03-20

3D Concept Learning and Reasoning from Multi-View Images

  • Humans are able to accurately reason in 3D by gathering multi-view observations of the surrounding world
  • In total, it consists of approximately 5k scenes and 600k images, paired with 50k questions
  • We evaluate various state-of-the-art models for visual reasoning on our benchmark and find that they all perform poorly
  • Experimental results suggest that our framework outperforms baseline models by a large margin, but the challenge remains largely unsolved
  • We further perform an in-depth analysis of the challenges and highlight potential future directions.
Humans are able to accurately reason in 3D by gathering multi-view observations of the surrounding world. Inspired by this insight, we introduce a new large-scale benchmark for 3D multi-view visual question answering (3DMV-VQA). This dataset is collected by an embodied agent actively moving and capturing RGB images in an environment using the Habitat simulator. In total, it consists of approximately 5k scenes and 600k images, paired with 50k questions. We evaluate various state-of-the-art models for visual reasoning on our benchmark and find that they all perform poorly. We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations. As the first step towards this approach, we propose a novel 3D concept learning and reasoning (3D-CLR) framework that seamlessly combines these components via neural fields, 2D pre-trained vision-language models, and neural reasoning operators. Experimental results suggest that our framework outperforms baseline models by a large margin, but the challenge remains largely unsolved. We further perform an in-depth analysis of the challenges and highlight potential future directions.
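
A toy sketch of what reasoning over a grounded 3D representation can look like. The voxel-grid representation, operator names, and semantics are illustrative assumptions, not the 3D-CLR API.

```python
import numpy as np
from scipy import ndimage

# Toy "reasoning operators" over a grounded 3D representation: a voxel
# grid holding per-concept scores (here random, standing in for scores
# produced by a neural field plus a vision-language model).

GRID = 16
concept_scores = {"chair": np.random.rand(GRID, GRID, GRID)}

def filter_concept(name, threshold=0.9):
    """Boolean occupancy mask for one open-vocabulary concept."""
    return concept_scores[name] > threshold

def count_instances(mask):
    """Crude instance count via 3D connected components."""
    _, n = ndimage.label(mask)
    return n

# "How many chairs are in the scene?" -> filter, then count.
print(count_instances(filter_concept("chair")))
```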

Authors: Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan.

2023-03-20

Spin- and momentum-correlated atom pairs mediated by photon exchange

  • Pairs of correlated particles are at the core of complex many-body phenomena and their control is essential for quantum technologies
  • Engineering pairs that are simultaneously correlated in their external and internal degrees of freedom is a major challenge
  • The scheme is independent of collisional interactions, fast and tunable.
Pairs of correlated particles are at the core of complex many-body phenomena and their control is essential for quantum technologies. Engineering pairs that are simultaneously correlated in their external and internal degrees of freedom is a major challenge. In this work, we experimentally demonstrate a mechanism for generating pairs of atoms in well-defined spin and momentum modes. This mechanism couples atoms from a degenerate Bose gas via a superradiant photon-exchange process mediated by the vacuum mode of an optical cavity. The scheme is independent of collisional interactions, fast and tunable. We observe a collectively enhanced production of pairs, characterize their statistics, and measure inter-spin correlations in momentum space. Our observation of coherent many-body oscillations involving well-defined momentum modes offers promising prospects for quantum-enhanced interferometry using entangled matter waves.
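
Schematically, photon-mediated pair creation of this kind takes the form of a two-mode-squeezing interaction. The expression below is a generic illustration under that assumption, not the paper's full cavity-QED model:

```latex
% Assumed generic pair-creation (two-mode-squeezing) Hamiltonian:
% \hat{a}^{\dagger}_{\sigma,\pm k} creates an atom with spin \sigma and
% momentum \pm\hbar k; \chi is the photon-mediated coupling rate.
\hat{H}_{\mathrm{eff}} \simeq \hbar\chi \,
  \hat{a}^{\dagger}_{\uparrow,+k}\, \hat{a}^{\dagger}_{\downarrow,-k}
  + \mathrm{h.c.}
```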

Authors: Fabian Finger, Rodrigo Rosa-Medina, Nicola Reiter, Panagiotis Christodoulou, Tobias Donner, Tilman Esslinger.

2023-03-20

The ALMA REBELS Survey: The First Infrared Luminosity Function Measurement at $\mathbf{z \sim 7}$

  • 16 sources exhibit a dust detection, 15 of which are also spectroscopically confirmed through the [CII] line
  • The IR luminosities of the sample range from $\log L_{IR}/L_\odot=11.4$ to 12.2
  • Our observational results are in broad agreement with the average of predicted IRLFs from simulations at $z\sim7$
  • Conversely, our IRLFs lie significantly below lower redshift estimates, suggesting a rapid evolution from $z\sim4$ to $z\sim7$, into the reionization epoch
  • We conclude that the presence of dust is already abundant in the EoR and discuss the possibility of unveiling larger samples of dusty galaxies with future ALMA and JWST observations.
We present the first observational infrared luminosity function (IRLF) measurement in the Epoch of Reionization (EoR) based on a UV-selected galaxy sample with ALMA spectroscopic observations. Our analysis is based on the ALMA large program Reionization Era Bright Emission Line Survey (REBELS), which targets 42 galaxies at $\mathrm{z=6.4-7.7}$ with [CII] 158 $\mu$m line scans. 16 sources exhibit a dust detection, 15 of which are also spectroscopically confirmed through the [CII] line. The IR luminosities of the sample range from $\log L_{IR}/L_\odot=11.4$ to 12.2. Using the UVLF as a proxy to derive the effective volume for each of our target sources, we derive IRLF estimates, both for detections and for the full sample including IR luminosity upper limits. The resulting IRLFs are well reproduced by a Schechter function with the characteristic luminosity of $\log L_{*}/L_\odot=11.6^{+0.2}_{-0.1}$. Our observational results are in broad agreement with the average of predicted IRLFs from simulations at $z\sim7$. Conversely, our IRLFs lie significantly below lower redshift estimates, suggesting a rapid evolution from $z\sim4$ to $z\sim7$, into the reionization epoch. The inferred obscured contribution to the cosmic star-formation rate density at $z\sim7$ amounts to $\mathrm{log(SFRD/M_{\odot}/yr/Mpc^{3}) = -2.66^{+0.17}_{-0.14} }$ which is at least $\sim$10\% of UV-based estimates. We conclude that the presence of dust is already abundant in the EoR and discuss the possibility of unveiling larger samples of dusty galaxies with future ALMA and JWST observations.
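
For reference, a small sketch evaluating a Schechter function with the characteristic luminosity quoted above. The normalization and faint-end slope below are placeholders, not the paper's fitted values.

```python
import numpy as np

# Schechter luminosity function: phi(L) = phi* (L/L*)^alpha exp(-L/L*) / L*.
def schechter(L, L_star, phi_star, alpha):
    x = L / L_star
    return phi_star * x**alpha * np.exp(-x) / L_star

L = np.logspace(11.0, 12.5, 50)          # IR luminosities in Lsun
phi = schechter(L, L_star=10**11.6,      # log L*/Lsun = 11.6 (from the paper)
                phi_star=1e-5,           # placeholder normalization
                alpha=-1.3)              # placeholder faint-end slope
print(phi[:3])
```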

Authors: L. Barrufet, P. A. Oesch, R. Bouwens, H. Inami, L. Sommovigo, H. Algera, E. da Cunha, M. Aravena, P. Dayal, A. Ferrara, Y. Fudamoto, V. Gonzalez, L. Graziani, A. Hygate, I. de Looze, T. Nanayakkara, A. Pallottini, R. Schneider, M. Stefanon, M. Topping, P. van Der Werf.

2023-03-20

ScribbleSeg: Scribble-based Interactive Image Segmentation

  • Our code will be made available.
Interactive segmentation enables users to extract masks by providing simple annotations to indicate the target, such as boxes, clicks, or scribbles. Among these interaction formats, scribbles are the most flexible as they can be of arbitrary shapes and sizes. This enables scribbles to provide more indications of the target object. However, previous works mainly focus on click-based configuration, and the scribble-based setting is rarely explored. In this work, we attempt to formulate a standard protocol for scribble-based interactive segmentation. Basically, we design diversified strategies to simulate scribbles for training, propose a deterministic scribble generator for evaluation, and construct a challenging benchmark. Besides, we build a strong framework, ScribbleSeg, consisting of a Prototype Adaptation Module (PAM) and a Corrective Refine Module (CRM), for the task. Extensive experiments show that ScribbleSeg performs notably better than previous click-based methods. We hope this could serve as a more powerful and general solution for interactive segmentation. Our code will be made available.
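
One simple way to simulate training scribbles is a random walk constrained to the object mask; this sketch is only a stand-in for the paper's more diverse simulation strategies.

```python
import numpy as np

# Toy scribble simulator: an 8-connected random walk that starts inside
# the target mask and refuses to step off the object.

def simulate_scribble(mask, length=200, seed=0):
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)
    i = rng.integers(len(ys))
    y, x = ys[i], xs[i]                       # start inside the object
    scribble = np.zeros_like(mask)
    for _ in range(length):
        scribble[y, x] = 1
        dy, dx = rng.integers(-1, 2, size=2)  # random step in {-1, 0, 1}
        ny, nx = y + dy, x + dx
        if 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1] and mask[ny, nx]:
            y, x = ny, nx                     # stay on the target object
    return scribble

mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
print(simulate_scribble(mask).sum())          # number of scribbled pixels
```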

Authors: Xi Chen, Yau Shing Jonathan Cheung, Ser-Nam Lim, Hengshuang Zhao.

2023-03-20

Over-the-Air Federated Edge Learning with Error-Feedback One-Bit Quantization and Power Control

  • Over-the-air federated edge learning (Air-FEEL) is a communication-efficient framework for distributed machine learning using training data distributed at edge devices
  • However, the low-resolution one-bit gradient quantization slows down the model convergence and leads to performance degradation
  • On the other hand, the aggregation errors caused by fading channels in Air-FEEL remain to be addressed
  • To this end, we first provide a theoretical analysis to evaluate the impact of error feedback on the convergence of FL with EFOBDA
  • Then, we further introduce a power control policy by maximizing the convergence rate under instantaneous power constraints.
Over-the-air federated edge learning (Air-FEEL) is a communication-efficient framework for distributed machine learning using training data distributed at edge devices. This framework enables all edge devices to transmit model updates simultaneously over the entire available bandwidth, allowing for over-the-air aggregation. A one-bit digital over-the-air aggregation (OBDA) scheme has been recently proposed, featuring one-bit gradient quantization at edge devices and majority-voting based decoding at the edge server. However, the low-resolution one-bit gradient quantization slows down the model convergence and leads to performance degradation. On the other hand, the aggregation errors caused by fading channels in Air-FEEL remain to be addressed. To address these issues, we propose the error-feedback one-bit broadband digital aggregation (EFOBDA) and an optimized power control policy. To this end, we first provide a theoretical analysis to evaluate the impact of error feedback on the convergence of FL with EFOBDA. The analytical results show that, by setting an appropriate feedback strength, EFOBDA is comparable to Air-FEEL without quantization, thus enhancing the performance of OBDA. Then, we further introduce a power control policy by maximizing the convergence rate under instantaneous power constraints. The convergence analysis and optimized power control policy are verified by experiments, which show that the proposed scheme achieves significantly faster convergence and higher test accuracy in image classification tasks compared with the one-bit quantization scheme without error feedback or optimized power control policy.
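
The error-feedback idea itself is easy to sketch for a single device, ignoring the over-the-air channel: the residual missed by the one-bit code is carried into the next round, so quantization error does not accumulate. A minimal sketch, not the paper's EFOBDA scheme:

```python
import numpy as np

# One-bit gradient quantization with error feedback at a single device.
def ef_one_bit(grad, residual, scale):
    """Quantize (grad + residual) to +/- scale; return the new residual."""
    v = grad + residual
    q = scale * np.sign(v)        # one-bit message actually transmitted
    return q, v - q               # residual = what the 1-bit code missed

residual = np.zeros(5)
for step in range(3):
    grad = np.random.randn(5)
    q, residual = ef_one_bit(grad, residual, scale=0.1)
    print(step, q)
```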

Authors: Yuding Liu, Dongzhu Liu, Guangxu Zhu, Qingjiang Shi, Caijun Zhong.

2023-03-20

Collisional evolution of dust and water ice in protoplanetary discs during and after an accretion outburst

  • Most protoplanetary discs are thought to undergo violent and frequent accretion outbursts, during which the accretion rate and central luminosity are elevated for several decades
  • This temporarily increases the disc temperature, leading to the sublimation of ice species as snowlines move outwards
  • Our main finding is that the evolution of dust grains located between the quiescent and outburst water snowlines is driven by significant changes in composition and porosity
  • Pebble-sized particles, the building blocks of planetesimals, are either depleted in water ice or completely destroyed, respectively resulting in drier planetesimals or halting their formation altogether
  • Our results highlight the importance of including accretion outbursts in models of dust coagulation and planet formation.
Most protoplanetary discs are thought to undergo violent and frequent accretion outbursts, during which the accretion rate and central luminosity are elevated for several decades. This temporarily increases the disc temperature, leading to the sublimation of ice species as snowlines move outwards. In this paper, we investigate how an FUor-type accretion outburst alters the growth and appearance of dust aggregates at different locations in protoplanetary discs. We develop a model based on the Monte Carlo approach to simulate locally the coagulation and fragmentation of icy dust particles and investigate different designs for their structure and response to sublimation. Our main finding is that the evolution of dust grains located between the quiescent and outburst water snowlines is driven by significant changes in composition and porosity. The time required for the dust population to recover from the outburst and return to a coagulation/fragmentation equilibrium depends on the complex interplay of coagulation physics and outburst properties, and can take up to 4500 yr at 5 au. Pebble-sized particles, the building blocks of planetesimals, are either depleted in water ice or completely destroyed, respectively resulting in drier planetesimals or halting their formation altogether. When accretion outbursts are frequent events, the dust can be far from collisional equilibrium for a significant fraction of time, offering opportunities to track past outbursts in discs at millimetre wavelengths. Our results highlight the importance of including accretion outbursts in models of dust coagulation and planet formation.
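
A heavily simplified Monte Carlo collision loop in the spirit of the method (stick below a velocity threshold, fragment above it). Collision kernels, porosity, ice composition, and mass conservation are all omitted; this is purely illustrative.

```python
import numpy as np

# Toy Monte Carlo coagulation/fragmentation: pick random pairs, draw a
# relative velocity, and either merge the two particles or reset them to
# monomers. Not mass-conserving; a sketch of the loop structure only.

rng = np.random.default_rng(0)
masses = np.ones(1000)            # monomer masses (arbitrary units)
v_frag = 3.0                      # toy fragmentation threshold

for _ in range(5000):
    i, j = rng.choice(len(masses), size=2, replace=False)
    v_rel = rng.rayleigh(1.0)     # toy relative-velocity draw
    if v_rel < v_frag:
        masses[i] += masses[j]    # sticking collision
        masses[j] = 1.0           # replace partner with a fresh monomer
    else:
        masses[i] = masses[j] = 1.0   # catastrophic fragmentation

print("max aggregate mass:", masses.max())
```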

Authors: Adrien Houge, Sebastiaan Krijt.

2023-03-20

Grover's Algorithm Offers No Quantum Advantage

  • Our finding implies that there is no a priori theoretical quantum speedup associated with Grover's algorithm
  • We critically examine the possibility of a practical speedup, a possibility that depends on the nature of the quantum circuit associated with the oracle.
Grover's algorithm is one of the primary algorithms offered as evidence that quantum computers can provide an advantage over classical computers. It involves an "oracle" (external quantum subroutine) which must be specified for a given application and whose internal structure is not part of the formal scaling of the quantum speedup guaranteed by the algorithm. Grover's algorithm also requires exponentially many steps to succeed, raising the question of its implementation on near-term, non-error-corrected hardware and indeed even on error-corrected quantum computers. In this work, we construct a quantum-inspired algorithm, executable on a classical computer, that performs Grover's task in a linear number of calls to the oracle - an exponentially smaller number than Grover's algorithm - and demonstrate this algorithm explicitly for Boolean satisfiability problems (3-SAT). Our finding implies that there is no a priori theoretical quantum speedup associated with Grover's algorithm. We critically examine the possibility of a practical speedup, a possibility that depends on the nature of the quantum circuit associated with the oracle. We argue that the unfavorable scaling of the success probability of Grover's algorithm, which in the presence of noise decays as the exponential of the exponential of the number of qubits, makes a practical speedup unrealistic even under extremely optimistic assumptions on both hardware quality and availability.
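
For context, the textbook scaling the paper is arguing against: with one marked item among N, Grover's success probability after k oracle calls follows a simple closed form, which is why roughly (pi/4)*sqrt(N) calls are needed.

```python
import numpy as np

# Standard Grover scaling (not the paper's classical construction):
# success probability after k oracle calls with one marked item among N.
def grover_success(N, k):
    theta = np.arcsin(1.0 / np.sqrt(N))
    return np.sin((2 * k + 1) * theta) ** 2

N = 2**20
k_opt = int(np.floor(np.pi / 4 * np.sqrt(N)))
print(k_opt, grover_success(N, k_opt))   # ~804 calls, probability ~1
```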

Authors: E. M. Stoudenmire, Xavier Waintal.

2023-03-20

Generative Semantic Segmentation

  • We present Generative Semantic Segmentation (GSS), a generative learning approach for semantic segmentation
  • Uniquely, we cast semantic segmentation as an image-conditioned mask generation problem
  • This is achieved by replacing the conventional per-pixel discriminative learning with a latent prior learning process
  • This posterior distribution allows us to generate segmentation masks unconditionally
  • To achieve semantic segmentation on a given image, we further introduce a conditioning network.
We present Generative Semantic Segmentation (GSS), a generative learning approach for semantic segmentation. Uniquely, we cast semantic segmentation as an image-conditioned mask generation problem. This is achieved by replacing the conventional per-pixel discriminative learning with a latent prior learning process. Specifically, we model the variational posterior distribution of latent variables given the segmentation mask. To that end, the segmentation mask is expressed with a special type of image (dubbed maskige). This posterior distribution allows us to generate segmentation masks unconditionally. To achieve semantic segmentation on a given image, we further introduce a conditioning network. It is optimized by minimizing the divergence between the posterior distribution of maskige (i.e., segmentation masks) and the latent prior distribution of input training images. Extensive experiments on standard benchmarks show that our GSS performs competitively with prior art alternatives in the standard semantic segmentation setting, whilst achieving a new state of the art in the more challenging cross-domain setting.
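
The maskige idea can be sketched as a fixed palette mapping between class indices and RGB values, so a generative image model can produce masks as ordinary images. The palette below is an illustrative choice, not the paper's.

```python
import numpy as np

# Segmentation mask <-> "maskige" (RGB image) via a fixed class palette.
palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]])

def mask_to_maskige(mask):
    return palette[mask]                  # (H, W) int -> (H, W, 3) RGB

def maskige_to_mask(maskige):
    # Recover class indices by nearest palette color.
    d = ((maskige[..., None, :] - palette) ** 2).sum(-1)
    return d.argmin(-1)

mask = np.random.randint(0, 4, (8, 8))
assert (maskige_to_mask(mask_to_maskige(mask)) == mask).all()
```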

Authors: Jiaqi Chen, Jiachen Lu, Xiatian Zhu, Li Zhang.

2023-03-20

Context-faithful Prompting for Large Language Models

  • In particular, we identify opinion-based prompts and counterfactual demonstrations as the most effective methods
  • Neither technique requires additional training.
Large language models (LLMs) encode parametric knowledge about world facts and have shown remarkable performance in knowledge-driven NLP tasks. However, their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks (e.g., knowledge acquisition tasks). In this paper, we seek to assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention. We demonstrate that LLMs' faithfulness can be significantly improved using carefully designed prompting strategies. In particular, we identify opinion-based prompts and counterfactual demonstrations as the most effective methods. Opinion-based prompts reframe the context as a narrator's statement and inquire about the narrator's opinions, while counterfactual demonstrations use instances containing false facts to improve faithfulness in knowledge conflict situations. Neither technique requires additional training. We conduct experiments on three datasets of two standard NLP tasks, machine reading comprehension and relation extraction, and the results demonstrate significant improvement in faithfulness to contexts.
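
A sketch of the opinion-based reframing described above; the exact wording and narrator name are illustrative, not the paper's template.

```python
# Opinion-based prompt: reframe the context as a narrator's statement and
# ask about the narrator's opinion, steering the model toward the given
# context rather than its parametric knowledge.

def opinion_based_prompt(context, question):
    return (f'Bob said: "{context}"\n'
            f"Q: {question} in Bob's opinion?\nA:")

# Counterfactual demonstration: the context deliberately contradicts
# world knowledge, so a faithful answer must follow the context.
print(opinion_based_prompt(
    "The capital of Freedonia is Sylvania City.",
    "What is the capital of Freedonia"))
```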

Authors: Wenxuan Zhou, Sheng Zhang, Hoifung Poon, Muhao Chen.

2023-03-20

The underlying radial acceleration relation

  • In reality these are free parameters of the fit, contributing systematic rather than statistical error
  • This reveals the intrinsic RAR underlying that observed.
The radial acceleration relation (RAR) of late-type galaxies relates their dynamical acceleration, $g_\text{obs}$, to that sourced by baryons alone, $g_\text{bar}$, across their rotation curves. Literature fits to the RAR have fixed the galaxy parameters on which the relation depends -- distance, inclination, luminosity and mass-to-light ratios -- to their maximum a priori values with an uncorrelated Gaussian contribution to the uncertainties on $g_\text{bar}$ and $g_\text{obs}$. In reality these are free parameters of the fit, contributing systematic rather than statistical error. Assuming a range of possible functional forms for the relation with or without intrinsic scatter (motivated by Modified Newtonian Dynamics with or without the external field effect), I use Hamiltonian Monte Carlo to perform the full joint inference of RAR and galaxy parameters for the SPARC dataset. This reveals the intrinsic RAR underlying that observed. I find an acceleration scale $a_0=(1.19 \pm 0.04 \, \text{(stat)} \pm 0.09 \, \text{(sys)}) \: \times \: 10^{-10}$ m s$^{-2}$, an intrinsic scatter $\sigma_\text{int}=(0.034 \pm 0.01 \, \text{(stat)} \pm 0.01 \, \text{(sys)})$ dex (assuming the SPARC error model is reliable) and weak evidence for the external field effect. I make summary statistics of all my analyses publicly available for future SPARC studies or applications of a calibrated RAR, for example redshift-independent distance measurement.
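
For concreteness, one widely used functional form for the RAR (the fitting function of McGaugh et al. 2016), evaluated at the acceleration scale inferred here; the paper fits a range of forms, of which this is only the simplest.

```python
import numpy as np

A0 = 1.19e-10  # m s^-2, the best-fit acceleration scale quoted above

def g_obs(g_bar, a0=A0):
    """RAR fitting function: g_obs = g_bar / (1 - exp(-sqrt(g_bar/a0)))."""
    return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / a0)))

g_bar = np.logspace(-12, -9, 4)   # baryonic accelerations in m s^-2
print(g_obs(g_bar))               # tends to sqrt(g_bar * a0) at low g_bar
```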

Authors: Harry Desmond.

2023-03-20

CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition

  • Vision-Language models like CLIP have been widely adopted for various tasks due to their impressive zero-shot capabilities
  • We note that the natural images used to train CLIP and the rendered 2D images in CG3D have a distribution shift
  • Attempting to train the visual and text encoder to account for this shift results in catastrophic forgetting and a notable decrease in performance
  • Further, it also serves as strong starting weights for fine-tuning in downstream 3D recognition tasks.
Vision-Language models like CLIP have been widely adopted for various tasks due to their impressive zero-shot capabilities. However, CLIP is not suitable for extracting 3D geometric features as it was trained only on images and text via natural language supervision. We work on addressing this limitation and propose a new framework termed CG3D (CLIP Goes 3D) where a 3D encoder is learned to exhibit zero-shot capabilities. CG3D is trained using triplets of pointclouds, corresponding rendered 2D images, and texts using natural language supervision. To align the features in a multimodal embedding space, we utilize contrastive loss on 3D features obtained from the 3D encoder, as well as visual and text features extracted from CLIP. We note that there is a distribution shift between the natural images used to train CLIP and the rendered 2D images in CG3D. Attempting to train the visual and text encoder to account for this shift results in catastrophic forgetting and a notable decrease in performance. To solve this, we employ prompt tuning and introduce trainable parameters in the input space to shift CLIP towards the 3D pre-training dataset utilized in CG3D. We extensively test our pre-trained CG3D framework and demonstrate its impressive capabilities in zero-shot, open scene understanding, and retrieval tasks. Further, it also serves as strong starting weights for fine-tuning in downstream 3D recognition tasks.
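
A minimal sketch of the multimodal alignment objective, assuming a standard symmetric InfoNCE loss pulling each point-cloud embedding toward its paired image and text embeddings; shapes and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

# Symmetric InfoNCE between two batches of paired embeddings.
def info_nce(a, b, tau=0.07):
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau                 # (B, B) similarity matrix
    targets = torch.arange(len(a))           # matching pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

B, D = 8, 512
f_3d, f_img, f_txt = (torch.randn(B, D) for _ in range(3))
# Align 3D features with both CLIP modalities.
loss = info_nce(f_3d, f_img) + info_nce(f_3d, f_txt)
print(loss.item())
```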

Authors: Deepti Hegde, Jeya Maria Jose Valanarasu, Vishal M. Patel.

2023-03-20

waywiser: Ergonomic Methods for Assessing Spatial Models

  • Assessing predictive models can be challenging
  • Additional features make it particularly easy to use waywiser alongside packages and workflows in the tidymodels ecosystem.
Assessing predictive models can be challenging. Modelers must navigate a wide array of evaluation methodologies implemented with incompatible interfaces across multiple packages which may give different or even contradictory results, while ensuring that their chosen approach properly estimates the performance of their model when generalizing to new observations. Assessing models fit to spatial data can be particularly difficult, given that model errors may exhibit spatial autocorrelation, model predictions are often aggregated to multiple spatial scales by end users, and models are often tasked with generalizing into spatial regions outside the boundaries of their initial training data. The waywiser package for the R language attempts to make assessing spatial models easier by providing an ergonomic toolkit for model evaluation tasks, with functions for multiple assessment methodologies sharing a unified interface. Functions from waywiser share standardized argument names and default values, making the user-facing interface simple and easy to learn. These functions are additionally designed to be easy to integrate into a wide variety of modeling workflows, accepting standard classes as inputs and returning size- and type-stable outputs, ensuring that their results are of consistent and predictable data types and dimensions. Additional features make it particularly easy to use waywiser alongside packages and workflows in the tidymodels ecosystem.

Authors: Michael J Mahoney.