https://github.com/zhengyuf/imavatar
Traditional 3D morphable face models (3DMMs) provide fine-grained control over expression but cannot easily capture geometric and appearance details. Code: https://github.com/zhengyuf/imavatar
2022/06/26
228
/biasvariancelabs/ AiTLAS: Artificial Intelligence Toolbox for Earth Observation
https://github.com/biasvariancelabs/aitlas
The AiTLAS toolbox (Artificial Intelligence Toolbox for Earth Observation) includes state-of-the-art machine learning methods for exploratory and predictive analysis of satellite imagery as well as a repository of AI-ready Earth Observation (EO) datasets. Code: https://github.com/biasvariancelabs/aitlas
2022/06/26
83
/wgcban/ Remote Sensing Change Detection (Segmentation) using Denoising Diffusion Probabilistic Models
https://github.com/wgcban/ddpm-cd
Human civilization has an increasingly powerful influence on the earth system, and earth observations are an invaluable tool for assessing and mitigating the negative impacts. Code: https://github.com/wgcban/ddpm-cd
2022/06/26
23
/hustvl/ Polar Parametrization for Vision-based Surround-View 3D Detection
https://github.com/hustvl/polardetr
Based on Polar Parametrization, we propose a surround-view 3D DEtection TRansformer, named PolarDETR. Code: https://github.com/hustvl/polardetr
2022/06/25
32
/lucidrains/ Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
https://github.com/lucidrains/parti-pytorch
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Code: https://github.com/lucidrains/parti-pytorch
2022/06/25
111
/liaopeiyuan/ The ArtBench Dataset: Benchmarking Generative Models with Artworks
https://github.com/liaopeiyuan/artbench
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. Code: https://github.com/liaopeiyuan/artbench
2022/06/25
49
/petrhruby97/ Learning to Solve Hard Minimal Problems
https://github.com/petrhruby97/learning_minimal
The hard minimal problems arise from relaxing the original geometric optimization problem into a minimal problem with many spurious solutions. Code: https://github.com/petrhruby97/learning_minimal
2022/06/24
61
/neuralmagic/ Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
https://github.com/neuralmagic/sparseml
Quantized recurrent neural networks were tested on the Penn Treebank dataset and achieved accuracy comparable to their 32-bit counterparts using only 4 bits. Code: https://github.com/neuralmagic/sparseml
2022/06/24
1133
/zhangbaijin/ SpA-Former: Transformer image shadow detection and removal via spatial attention
In this paper, we propose an end-to-end SpA-Former to recover a shadow-free image from a single shaded image. Code: https://github.com/zhangbaijin/spa-former-shadow-removal
2022/06/24
21
/expressai/ reStructured Pre-training
https://github.com/expressai/restructured-pretraining
In addition, we test our model in the 2022 College Entrance Examination English that happened a few days ago (2022.06.08), and it gets a total score of 134 (vs. Code: https://github.com/expressai/restructured-pretraining
2022/06/23
20
/crowsonkb/ Elucidating the Design Space of Diffusion-Based Generative Models
https://github.com/crowsonkb/k-diffusion
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. Code: https://github.com/crowsonkb/k-diffusion
2022/06/23
40
/sail-sg/ EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
https://github.com/sail-sg/envpool
On a high-end machine, EnvPool achieves 1 million frames per second for the environment execution on Atari environments and 3 million frames per second on MuJoCo environments. Code: https://github.com/sail-sg/envpool
2022/06/23
476
/nvlabs/ Global Context Vision Transformers
https://github.com/nvlabs/gcvit
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization. Code: https://github.com/nvlabs/gcvit
2022/06/23
45
/chaytonmin/ Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds
https://github.com/chaytonmin/voxel-mae
As the point clouds in 3D object detection are large-scale, it is impossible to reconstruct the input point clouds. Code: https://github.com/chaytonmin/voxel-mae
2022/06/23
23
/facebookresearch/ Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world
We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability. Code: https://github.com/facebookresearch/nocturne
2022/06/23
30
/codedotal/ Evaluating Large Language Models Trained on Code
https://github.com/codedotal/gpt-code-clippy
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. Code: https://github.com/codedotal/gpt-code-clippy
2022/06/23
1127
/mmaaz60/ EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
https://github.com/mmaaz60/EdgeNeXt
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% and a 28% reduction in FLOPs. Code: https://github.com/mmaaz60/EdgeNeXt
2022/06/23
34
/BUPT-GAMMA/ Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning
https://github.com/BUPT-GAMMA/OpenHGNN
Then the cross-view contrastive learning, as well as a view mask mechanism, is proposed, which is able to extract the positive and negative embeddings from two views. Code: https://github.com/BUPT-GAMMA/OpenHGNN
2022/06/22
311
/neuralmagic/ How Well Do Sparse Imagenet Models Transfer?
https://github.com/neuralmagic/deepsparse
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" specialized datasets. Code: https://github.com/neuralmagic/deepsparse
2022/06/22
721
https://github.com/Kai-46/ARF-svox2
We present a method for transferring the artistic features of an arbitrary style image to a 3D scene. Code: https://github.com/Kai-46/ARF-svox2
2022/06/22
217
/microsoft/ RegionCLIP: Region-based Language-Image Pretraining
https://github.com/microsoft/regionclip
However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans. Code: https://github.com/microsoft/regionclip
2022/06/22
36
/tjiiv-cprg/ EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
https://github.com/tjiiv-cprg/epro-pnp
The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution. Code: https://github.com/tjiiv-cprg/epro-pnp
2022/06/22
234
/predict-idlab/ Powershap: A Power-full Shapley Feature Selection Method
https://github.com/predict-idlab/powershap
Benchmarks and simulations show that powershap outperforms other filter methods with predictive performances on par with wrapper methods while being significantly faster, often even reaching half or a third of the execution time. Code: https://github.com/predict-idlab/powershap
2022/06/21
60
/hukenovs/ HaGRID -- HAnd Gesture Recognition Image Dataset
https://github.com/hukenovs/hagrid
In this paper, we introduce an enormous dataset HaGRID (HAnd Gesture Recognition Image Dataset) for hand gesture recognition (HGR) systems. Code: https://github.com/hukenovs/hagrid
2022/06/21
65
/microsoft/ Bridge-Tower: Building Bridges Between Encoders in Vision-Language Representation Learning
https://github.com/microsoft/BridgeTower
Current VL models either use lightweight uni-modal encoders and learn to extract, align and fuse both modalities simultaneously in a cross-modal encoder, or feed the last-layer uni-modal features directly into the top cross-modal encoder, ignoring the semantic information at the different levels in the deep uni-modal encoders. Code: https://github.com/microsoft/BridgeTower
2022/06/21
33
/AstraZeneca/ The Shapley Value in Machine Learning
https://github.com/AstraZeneca/awesome-shapley-value
Over the last few years, the Shapley value, a solution concept from cooperative game theory, has found numerous applications in machine learning. Code: https://github.com/AstraZeneca/awesome-shapley-value
2022/06/21
15
/MineDojo/ MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
https://github.com/MineDojo/MineDojo
Autonomous agents have made great strides in specialist domains like Atari games and Go. Code: https://github.com/MineDojo/MineDojo
2022/06/21
78
/microsoft/ GLIPv2: Unifying Localization and Vision-Language Understanding
https://github.com/microsoft/GLIP
We present GLIPv2, a grounded VL understanding model that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning). Code: https://github.com/microsoft/GLIP
2022/06/20
448
/OpenPerceptionX/ Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline
https://github.com/OpenPerceptionX/TCP
The two branches are connected so that the control branch receives corresponding guidance from the trajectory branch at each time step. Code: https://github.com/OpenPerceptionX/TCP
2022/06/20
37
/romainloiseau/ Online Segmentation of LiDAR Sequences: Dataset and Algorithm
https://github.com/romainloiseau/Helix4D
Helix4D operates on acquisition slices that correspond to a fraction of a full rotation of the sensor, significantly reducing the total latency. Code: https://github.com/romainloiseau/Helix4D
2022/06/20
26
/google-research/ General-purpose, long-context autoregressive modeling with Perceiver AR
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. Code: https://github.com/google-research/perceiver-ar
2022/06/20
65
/oatml/ Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
https://github.com/oatml/rho-loss
But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable. Code: https://github.com/oatml/rho-loss
2022/06/20
67
/caiyuanhao1998/ Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging
https://github.com/caiyuanhao1998/MST
In coded aperture snapshot spectral compressive imaging (CASSI) systems, hyperspectral image (HSI) reconstruction methods are employed to recover the spatial-spectral signal from a compressed measurement. Code: https://github.com/caiyuanhao1998/MST
2022/06/20
188
We propose a new method to invert and edit such complex images in the latent space of GANs, such as StyleGAN2. Code: https://github.com/adobe-research/sam_inversion
2022/06/20
90
/clementchadebec/ Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case
In recent years, deep generative models have attracted increasing interest due to their capacity to model complex distributions. Code: https://github.com/clementchadebec/benchmark_VAE
2022/06/20
383
/dblalock/ Multiplying Matrices Without Multiplying
https://github.com/dblalock/bolt
Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Code: https://github.com/dblalock/bolt
2022/06/20
1855
/prbonn/ Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D Convolutions
https://github.com/prbonn/4dmos
A key challenge for autonomous vehicles is to navigate in unseen dynamic environments. Code: https://github.com/prbonn/4dmos
2022/06/17
30
LIFT does not make any changes to the model architecture or loss function, and it solely relies on the natural language interface, enabling "no-code machine learning with LMs." Code: https://github.com/uw-madison-lee-lab/languageinterfacedfinetuning
2022/06/16
19
/google-research/ Scaling Up Models and Data with t5x and seqio
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Code: https://github.com/google-research/t5x
2022/06/16
484
/nv-tlabs/ Variable Bitrate Neural Fields
Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations. Code: https://github.com/nv-tlabs/vqad
2022/06/16
77
/zhiqi-li/ BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
https://github.com/zhiqi-li/BEVFormer
In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. Code: https://github.com/zhiqi-li/BEVFormer
2022/06/16
647
/thudm/ CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
https://github.com/thudm/cogview2
The development of transformer-based text-to-image models is impeded by their slow generation and complexity for high-resolution images. Code: https://github.com/thudm/cogview2
2022/06/16
363
Specifically, we generate support samples from actual samples and their neighbouring clusters in the embedding space through a progressive linear interpolation (PLI) strategy. Code: https://github.com/PaddlePaddle/PaddleClas
2022/06/15
3754
/hustvl/ Featurized Query R-CNN
https://github.com/hustvl/featurized-queryrcnn
The query mechanism introduced in the DETR method is changing the paradigm of object detection, and recently many query-based methods have obtained strong object detection performance. Code: https://github.com/hustvl/featurized-queryrcnn
2022/06/15
24
/salesforce/ OmniXAI: A Library for Explainable AI
https://github.com/salesforce/omnixai
We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library of eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) in practice. Code: https://github.com/salesforce/omnixai
2022/06/14
57
/megvii-research/ PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
https://github.com/megvii-research/petr
More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling. Code: https://github.com/megvii-research/petr
2022/06/14
87
/qianc62/ Counterfactual Inference for Text Classification Debiasing
https://github.com/qianc62/corsair
In inference, given a factual input document, Corsair imagines its two counterfactual counterparts to distill and mitigate the two biases captured by the poisonous model. Code: https://github.com/qianc62/corsair
2022/06/14
71
/jwwangchn/ A Normalized Gaussian Wasserstein Distance for Tiny Object Detection
https://github.com/jwwangchn/NWD
To alleviate this, we propose a new evaluation metric using Wasserstein distance for tiny object detection. Code: https://github.com/jwwangchn/NWD
2022/06/14
31
/zuruoke/ Free-Form Image Inpainting with Gated Convolution
https://github.com/zuruoke/watermark-removal
We present a generative image inpainting system to complete images with free-form mask and guidance. Code: https://github.com/zuruoke/watermark-removal
2022/06/13
77