/MTLab/ MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
With such multi-dimension and multi-scale factorization, our MorphMLP block can achieve a great accuracy-computation balance. Code: https://github.com/MTLab/MorphMLP
/salesforce/ ProGen2: Exploring the Boundaries of Protein Language Models
Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial intelligence-driven protein design. Code: https://github.com/salesforce/progen
/salesforce/ Salesforce CausalAI Library: A Fast and Scalable Framework for Causal Analysis of Time Series and Tabular Data
We introduce the Salesforce CausalAI Library, an open-source library for causal analysis using observational data. Code: https://github.com/salesforce/causalai
/microsoft/ BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Code: https://github.com/microsoft/biogpt
/zhongcl-thu/ SNAKE: Shape-aware Neural 3D Keypoint Field
Detecting 3D keypoints from point clouds is important for shape reconstruction; this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection? Code: https://github.com/zhongcl-thu/snake
/air-discover/ VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and Uncertainty-Spectrum Modeling
In the first stage, we perform self-supervised representation learning on unlabeled points with the proposed Viewpoint Bottleneck loss function. Code: https://github.com/air-discover/vibus
/serycjon/ Planar Object Tracking via Weighted Optical Flow
We propose WOFT -- a novel method for planar object tracking that estimates a full 8 degrees-of-freedom pose, i.e. the homography w.r.t. a reference view. Code: https://github.com/serycjon/WOFT
/sjvasquez/ Generating Sequences With Recurrent Neural Networks
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. Code: https://github.com/sjvasquez/handwriting-synthesis
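A minimal sketch of the one-data-point-at-a-time generation loop described above, written in PyTorch purely for illustration (the model, sizes, and function names are assumptions, not taken from the paper or the handwriting-synthesis repo):

```python
import torch
import torch.nn as nn

class NextPointLSTM(nn.Module):
    """Toy autoregressive model: predicts the next point from the sequence so far."""
    def __init__(self, dim=3, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)
        return self.head(out), state

@torch.no_grad()
def generate(model, seed, steps=20):
    # Feed the seed once, then repeatedly append the model's own prediction.
    points, state = [], None
    x = seed                              # (1, T, dim)
    for _ in range(steps):
        pred, state = model(x, state)
        nxt = pred[:, -1:, :]             # last-step output = predicted next point
        points.append(nxt)
        x = nxt                           # the prediction becomes the next input
    return torch.cat(points, dim=1)

samples = generate(NextPointLSTM(), torch.zeros(1, 1, 3))
```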
/facebookresearch/ Cut and Learn for Unsupervised Object Detection and Instance Segmentation
We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. Code: https://github.com/facebookresearch/cutler
/hustvl/ A Simple Adaptive Unfolding Network for Hyperspectral Image Reconstruction
We present a simple, efficient, and scalable unfolding network, SAUNet, to simplify the network design with an adaptive alternate optimization framework for hyperspectral image (HSI) reconstruction. Code: https://github.com/hustvl/saunet
/microsoft/ TorchGeo: Deep Learning With Geospatial Data
Deep learning methods are particularly promising for modeling many remote sensing tasks given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery available. Code: https://github.com/microsoft/torchgeo
/chaitjo/ On the Expressive Power of Geometric Graph Neural Networks
The expressive power of Graph Neural Networks (GNNs) has been studied extensively through the Weisfeiler-Leman (WL) graph isomorphism test. Code: https://github.com/chaitjo/geometric-gnn-dojo
/sarafridov/ K-Planes: Explicit Radiance Fields in Space, Time, and Appearance
We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Code: https://github.com/sarafridov/k-planes
/stanfordnlp/ Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM). Code: https://github.com/stanfordnlp/dsp
/wxjiao/ Is ChatGPT A Good Translator? A Preliminary Study
This report provides a preliminary evaluation of ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness. Code: https://github.com/wxjiao/is-chatgpt-a-good-translator
/hazyresearch/ Hungry Hungry Hippos: Towards Language Modeling with State Space Models
First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. Code: https://github.com/hazyresearch/h3
/autonomousvision/ StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. Code: https://github.com/autonomousvision/stylegan-t
/showlab/ Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ a large-scale text-video dataset for fine-tuning. Code: https://github.com/showlab/Tune-A-Video
/facebookresearch/ Learning-Rate-Free Learning by D-Adaptation
In this work, we describe a single-loop method, with no back-tracking or line searches, which does not require knowledge of $D$ yet asymptotically achieves the optimal rate of convergence for the complexity class of convex Lipschitz functions. Code: https://github.com/facebookresearch/dadaptation
/gallilmaimon/ Speaking Style Conversion With Discrete Self-Supervised Units
We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and empirically demonstrate the proposed approach is significantly superior to the evaluated baselines. Code: https://github.com/gallilmaimon/DISSC
/facebookresearch/ Multiview Compressive Coding for 3D Reconstruction
We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos. Code: https://github.com/facebookresearch/mcc
/BlinkDL/ GLU Variants Improve Transformer
Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Code: https://github.com/BlinkDL/RWKV-LM
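Since the sentence above fully describes the GLU computation, a minimal PyTorch sketch is easy to give (layer names and sizes here are illustrative, not taken from the RWKV-LM repo):

```python
import torch
import torch.nn as nn

class GLU(nn.Module):
    """Gated Linear Unit: component-wise product of two linear projections,
    one of which is passed through a sigmoid gate."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.value = nn.Linear(d_in, d_out)
        self.gate = nn.Linear(d_in, d_out)

    def forward(self, x):
        return self.value(x) * torch.sigmoid(self.gate(x))

x = torch.randn(2, 8)
print(GLU(8, 16)(x).shape)  # torch.Size([2, 16])
```

The GLU variants studied in the paper swap the sigmoid for other activations (e.g. GELU or Swish), keeping the same two-projection gating structure.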
/hello-simpleai/ How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Code: https://github.com/hello-simpleai/chatgpt-comparison-detection
/sileod/ $\texttt{tasksource}$: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation
We release a dataset annotation framework and dataset annotations for more than 400 English tasks (https://github.com/sileod/tasksource). Code: https://github.com/sileod/tasksource
/slds-lmu/ Multimodal Deep Learning
This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Code: https://github.com/slds-lmu/seminar_multimodal_dl
/gligen/ GLIGEN: Open-Set Grounded Text-to-Image Generation
Large-scale text-to-image diffusion models have made amazing advances. Code: https://github.com/gligen/GLIGEN
/kinyugo/ Msanii: High Fidelity Music Synthesis on a Shoestring Budget
In this paper, we present Msanii, a novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently. Code: https://github.com/kinyugo/msanii
/timothybrooks/ InstructPix2Pix: Learning to Follow Image Editing Instructions
We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. Code: https://github.com/timothybrooks/instruct-pix2pix
/open-mmlab/ RTMDet: An Empirical Study of Designing Real-Time Object Detectors
In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection. Code: https://github.com/open-mmlab/mmyolo
/agemagician/ Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling
As opposed to scaling up protein language models (PLMs), we seek to improve performance via protein-specific optimization. Code: https://github.com/agemagician/Ankh
/salesforce/ EDICT: Exact Diffusion Inversion via Coupled Transformations
EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors which are used to invert each other in an alternating fashion. Code: https://github.com/salesforce/edict
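The alternating coupled updates can be illustrated with a toy additive-coupling example: each variable is updated using only the other one, so every step can be undone exactly. This is a generic sketch of the idea, not EDICT's actual update equations:

```python
import torch

def f(z):
    # Any function (even a non-invertible network) could stand in here.
    return torch.tanh(z) * 0.5

def couple_forward(x, y):
    # Each sequence is updated from the *other* one, so the step is exactly reversible.
    x = x + f(y)
    y = y + f(x)
    return x, y

def couple_inverse(x, y):
    # Undo the updates in the reverse order.
    y = y - f(x)
    x = x - f(y)
    return x, y

x0, y0 = torch.randn(4), torch.randn(4)
x1, y1 = couple_forward(x0, y0)
xr, yr = couple_inverse(x1, y1)
assert torch.allclose(x0, xr) and torch.allclose(y0, yr)  # exact inversion
```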
/cleanlab/ Utilizing supervised models to infer consensus labels and their quality from data with multiple annotators
Many algorithms also rely solely on annotator statistics, ignoring the features of the examples from which the annotations derive. Code: https://github.com/cleanlab/cleanlab
/zuruoke/ Free-Form Image Inpainting with Gated Convolution
We present a generative image inpainting system to complete images with free-form mask and guidance. Code: https://github.com/zuruoke/watermark-removal
/deepmind/ Tracr: Compiled Transformers as a Laboratory for Interpretability
Interpretability research aims to build tools for understanding machine learning (ML) models. Code: https://github.com/deepmind/tracr
/open-mmlab/ SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos
With a simple yet effective motion-aware fully-connected network, SmoothNet improves the temporal smoothness of existing pose estimators significantly and enhances the estimation accuracy of those challenging frames as a side-effect. Code: https://github.com/open-mmlab/mmpose
/opendr-eu/ VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images
In this paper, we propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT). Code: https://github.com/opendr-eu/opendr
/hku-mars/ ImMesh: An Immediate LiDAR Localization and Meshing Framework
This voxel-wise meshing operation is delicately designed for the purpose of efficiency; it first performs a dimension reduction by projecting 3D points to a 2D local plane contained in the voxel, and then executes the meshing operation with pull, commit and push steps for incremental reconstruction of triangle facets. Code: https://github.com/hku-mars/immesh
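The dimension-reduction step described above can be sketched with a generic best-fit-plane projection (an illustrative approximation, not code from the ImMesh repository; the pull/commit/push bookkeeping and the triangulation itself are omitted):

```python
import numpy as np

def project_to_local_plane(points):
    """Fit a plane to a voxel's points via PCA and express them in 2D plane coordinates."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Principal axes of the point set; the two largest span the best-fit plane.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane_axes = vt[:2]                  # (2, 3) orthonormal in-plane directions
    coords_2d = centered @ plane_axes.T  # (N, 2) coordinates within the plane
    return coords_2d, centroid, plane_axes

pts = np.random.rand(64, 3)
uv, c, axes = project_to_local_plane(pts)
```

A 2D triangulation of `coords_2d` could then yield the voxel's triangle facets, which is the role the subsequent meshing steps play in the paper.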
/google/ Vectorized and performance-portable Quicksort
Recent works showed that implementations of Quicksort using vector CPU instructions can outperform the non-vectorized algorithms in widespread use. Code: https://github.com/google/highway
/PrieureDeSion/ GNM: A General Navigation Model to Drive Any Robot
Learning provides a powerful tool for vision-based navigation, but the capabilities of learning-based policies are constrained by limited training data. Code: https://github.com/PrieureDeSion/drive-any-robot
/felix-petersen/ Deep Differentiable Logic Gate Networks
Recently, research has increasingly focused on developing efficient neural network architectures. Code: https://github.com/felix-petersen/difflogic
/sebastianstarke/ Local motion phases for learning multi-contact character movements
Training a bipedal character to play basketball and interact with objects, or a quadruped character to move in various locomotion modes, are difficult tasks due to the fast and complex contacts happening during the motion. Code: https://github.com/sebastianstarke/AI4Animation
/mindflow-institue/ Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review
The remarkable performance of the Transformer architecture in natural language processing has recently also triggered broad interest in Computer Vision. Code: https://github.com/mindflow-institue/awesome-transformer
/keyu-tian/ Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
This is the first use of sparse convolution for 2D masked modeling. Code: https://github.com/keyu-tian/spark
/XT-1997/ DeepMatcher: A Deep Transformer-based Network for Robust and Accurate Local Feature Matching
In this work, we propose DeepMatcher, a deep Transformer-based network built upon our investigation of local feature matching in detector-free methods. Code: https://github.com/XT-1997/DeepMatcher
/tzvilederer/ Silent Killer: Optimizing Backdoor Trigger Yields a Stealthy and Powerful Data Poisoning Attack
In contrast to previous attacks, both the poison and the trigger in our method are stealthy. Code: https://github.com/tzvilederer/silent-killer
/chuhaojin/ Text2Poster: Laying out Stylized Texts on Retrieved Images
Poster generation is a significant task for a wide range of applications; it is often time-consuming and requires substantial manual editing and artistic experience. Code: https://github.com/chuhaojin/text2poster-icassp-22
/blueGorae/ DynaGAN: Dynamic Few-shot Adaptation of GANs to Multiple Domains
In this paper, we propose DynaGAN, a novel few-shot domain-adaptation method for multiple target domains. Code: https://github.com/blueGorae/DynaGAN
/fwilliams/ Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances
Optimal transportation distances are a fundamental family of parameterized distances for histograms. Code: https://github.com/fwilliams/point-cloud-utils
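For reference, the Sinkhorn-Knopp iteration that the paper popularized for entropy-regularized optimal transport can be sketched as follows (generic textbook form with illustrative variable names, not code from the point-cloud-utils library):

```python
import numpy as np

def sinkhorn(r, c, M, lam=10.0, iters=200):
    """Entropy-regularized OT between histograms r and c with cost matrix M."""
    K = np.exp(-lam * M)                 # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K.T @ u)                # scale columns to match c
        u = r / (K @ v)                  # scale rows to match r
    P = u[:, None] * K * v[None, :]      # (approximate) optimal transport plan
    return np.sum(P * M)                 # regularized transport cost

r = np.array([0.5, 0.5])
c = np.array([0.25, 0.75])
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(sinkhorn(r, c, M))
```

Only matrix-vector products appear in the loop, which is why the paper describes the computation as "lightspeed" relative to solving the exact linear program.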
/anthropics/ Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
We provide our own analysis of the data and find a variety of harmful outputs, which range from offensive language to more subtly harmful non-violent unethical outputs. Code: https://github.com/anthropics/hh-rlhf
/microsoft/ MPNet: Masked and Permuted Pre-training for Language Understanding
Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. Code: https://github.com/microsoft/MASS
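A schematic of the permuted-language-modeling factorization mentioned above, showing only how the prediction order is sampled (this is not MPNet's or XLNet's actual training code):

```python
import random

def plm_factorization(tokens):
    """Sample a permutation of positions; each token is then predicted
    conditioned on the tokens that precede it in the permuted order."""
    order = list(range(len(tokens)))
    random.shuffle(order)
    steps = []
    for i, pos in enumerate(order):
        context_positions = sorted(order[:i])           # positions already predicted
        steps.append((pos, [tokens[p] for p in context_positions]))
    return steps

for pos, ctx in plm_factorization(["the", "cat", "sat", "down"]):
    print(f"predict position {pos} given {ctx}")
```

Because every token is eventually conditioned on some subset of the others, the model sees dependencies among predicted tokens that BERT's independent masking ignores.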