1 Automated Syllabus of Machine Learning Papers

Built by Rex W. Douglass @RexDouglass; GitHub; LinkedIn

Papers curated by hand, summaries and taxonomy written by LLMs.


2 Machine learning

2.1 Hyperbolic Embedding

Consider using hyperbolic embeddings combined with Hearst patterns to infer concept hierarchies from large text corpora because this approach allows for improved taxonomic consistency, increased efficiency in handling large datasets, and greater interpretability of results. (Le et al. 2019)
Consider using hyperbolic spaces for learning hierarchical embeddings of directed acyclic graphs, as they offer superior representational capacity compared to Euclidean spaces and can better preserve the underlying network properties. (Ganea, Bécigneul, and Hofmann 2018)
Consider combining both Euclidean and hyperbolic embeddings for improved representational power in node classification and link prediction tasks, especially when dealing with complex graphs that exhibit both hierarchical and non-hierarchical structures. (Kipf and Welling 2016)
Consider using hyperbolic spaces, specifically the hyperbolic plane, as a target space for embedding trees due to its ability to preserve the topological and geometric properties of the tree, enable guaranteed greedy routing, and achieve low distortion when embedding weighted trees. (Chepoi et al. 2010)
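
Several of the entries above rely on one property: distances in hyperbolic space grow rapidly toward the boundary, which leaves room for exponentially many tree nodes. The minimal sketch below assumes the Poincaré-ball model, one common choice that is not necessarily the model each cited paper uses. In that model the distance between points u and v inside the unit ball is arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between points u and v inside the open unit (Poincare) ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Two pairs with the same Euclidean gap (0.1): one near the origin, one near the boundary
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
print(f"{near_origin:.3f} vs {near_boundary:.3f}")
```

The same Euclidean step costs several times more hyperbolic distance near the boundary. This is why leaves of a deep tree can be placed far apart from each other while staying close to their parents.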

2.2 Transformer

Leverage the power of transformer architectures, specifically through the use of separate embeddings for covariates and treatments followed by cross-modal attention, to improve treatment effect estimation while reducing parameter inefficiency and increasing robustness to changes in treatment or dosage. (Zhang et al. 2022)
Prioritize using pre-trained language representations over explicit linguistic features when conducting relation extraction studies, as they offer superior performance, require less data annotation, and reduce the risk of error accumulation. (Alt, Hübner, and Hennig 2019)
Consider the potential impact of non-identifiability in self-attention mechanisms. When the sequence length exceeds the attention head dimension, the matrix of value-transformed token representations has a non-trivial (left) null space, so multiple sets of attention weights produce the same output. This makes the interpretation of attention weights difficult and potentially misleading. (Brunner et al. 2019)
Consider leveraging the power of the open-source library “Transformers,” which offers carefully engineered state-of-the-art Transformer architectures under a unified API, backed by a curated collection of pretrained models, and designed to be extensible, simple, and fast for both research and industrial deployments. (Wolf et al. 2019)
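
The non-identifiability argument (Brunner et al. 2019) can be checked with a toy example. The sketch below is purely illustrative. It assumes a single head with 4 tokens and 2-dimensional value vectors, and all numbers are hypothetical. Because the 4 x 2 value matrix has a non-trivial left null space, we can add a null-space direction whose entries also sum to zero. This changes the attention weights without changing the head's output.

```python
def attend(weights, values):
    """Output of one attention row: the weighted sum of value vectors."""
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Hypothetical value vectors for 4 tokens in a 2-dimensional head (sequence length > head dim)
values = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0],
          [0.0, 0.0]]

# Direction in the left null space of `values` whose entries also sum to zero,
# so moving along it keeps the weights a valid probability distribution.
null_dir = [1.0, 1.0, -1.0, -1.0]
assert attend(null_dir, values) == [0.0, 0.0] and sum(null_dir) == 0.0

original = [0.1, 0.2, 0.3, 0.4]
perturbed = [a + 0.1 * d for a, d in zip(original, null_dir)]  # approximately [0.2, 0.3, 0.2, 0.3]

print(attend(original, values))   # same output (up to floating-point rounding) ...
print(attend(perturbed, values))  # ... from clearly different attention weights
```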

2.3 Deep Learning

Leverage heterogeneous data sources and study the dependencies between societal events to build interpretable deep learning models for accurate event prediction. (Deng and Ning 2021)
Consider employing a multi-task learning approach for analyzing complex scenarios involving crowds, as it can improve performance on the individual tasks. The authors' study reports a 9% improvement in the area under the ROC curve (AUC) for violent behavior detection. (Marsden et al. 2017)

2.4 Imbalanced Dataset

Consider using Monte Carlo simulation methods to systematically vary specific aspects of your data while controlling others. This lets you draw conclusions about how those variations affect the performance of different algorithms for handling class imbalance in machine learning tasks. (Abdar et al. 2019)
Employ specialized evaluation metrics and modify learning algorithms to prioritize rare and important cases when working with imbalanced datasets, where standard evaluation criteria and learning algorithms may perform poorly due to non-uniform user preference biases. (Branco, Torgo, and Ribeiro 2015)
Consider the potential impact of data skewness on performance metrics, particularly for imbalanced datasets, and report skew-normalized scores alongside raw scores to ensure accurate interpretation of results. (Jeni, Cohn, and Torre 2013)
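
The sketch below shows one way to report a skew-normalized score next to the raw one. It assumes that the skew s is the ratio of negatives to positives and that normalization divides the negative-class counts by s, so the metric behaves as if the classes were balanced. This is a simplified assumption and not necessarily the exact procedure of Jeni, Cohn, and Torre (2013). The helper names and confusion matrices are hypothetical.

```python
def f1_score(tp, fp, fn):
    """Standard F1 from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)


def skew_normalized_f1(tp, fp, fn, tn):
    """F1 recomputed as if the negative class were the same size as the positive class.

    Assumes skew s = (#negatives / #positives) and rescales negative-class counts by 1/s.
    """
    skew = (fp + tn) / (tp + fn)
    return f1_score(tp, fp / skew, fn)


# The same classifier (TPR = 0.8, FPR = 0.1) evaluated on a 1:1 and a 1:10 test set
balanced = dict(tp=80, fn=20, fp=10, tn=90)
skewed = dict(tp=80, fn=20, fp=100, tn=900)
for name, cm in [("balanced", balanced), ("skewed", skewed)]:
    raw = f1_score(cm["tp"], cm["fp"], cm["fn"])
    norm = skew_normalized_f1(**cm)
    print(f"{name}: raw F1 = {raw:.3f}, skew-normalized F1 = {norm:.3f}")
```

The raw F1 drops sharply on the skewed set even though the classifier's error rates are unchanged. The normalized score stays the same, which is why the two are worth reporting together.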

2.5 Attention Mechanism

Utilize a multiscale visualization tool to better understand complex attention mechanisms in transformer models, allowing for the identification of biases, location of relevant attention heads, and linking of neurons to model behavior. (Vig 2019)

2.6 Autoencoder

Consider using deterministic autoencoders with explicit regularization schemes for the decoder as a simpler and potentially more effective alternative to variational autoencoders for generative modeling tasks. (Ghosh et al. 2019)
Consider using a hierarchical Vector Quantized Variational AutoEncoder (VQ-VAE) model for large scale image generation, as it enables the generation of high-coherence and high-fidelity synthetic samples, while being scalable and efficient due to its use of simple feed-forward encoder and decoder networks and fast sampling in the compressed latent space. (Razavi, Oord, and Vinyals 2019)

2.7 Batch Normalization

Carefully consider the choice of batch in BatchNorm, including its size, data source, and algorithm for computing statistics, as different choices can lead to inconsistencies and affect the generalization of models. (Wu and Johnson 2021)
Carefully consider the potential for variance shift when combining batch normalization (BN) and dropout in neural networks. Dropout changes the variance of a unit's activations between training and inference, while BN keeps using the variance statistics it accumulated during training. This mismatch can lead to numerical instability and decreased performance. (Li et al. 2018)
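
The variance shift can be computed directly. The minimal sketch below assumes inverted dropout (activations scaled by 1/keep_prob during training) applied to inputs with mean μ and standard deviation σ. The numbers and helper names are illustrative and hypothetical. A BN layer placed after this dropout would accumulate the training-time variance in its running statistics, but at inference it would receive activations with the smaller variance σ².

```python
import random
import statistics


def dropout_variance(mu, sigma, keep_prob):
    """Analytic variance of x * mask / keep_prob (inverted dropout) vs. plain x at inference."""
    # Training: mask ~ Bernoulli(keep_prob), so E[y] = mu and E[y^2] = (sigma^2 + mu^2) / keep_prob
    train_var = (sigma ** 2 + mu ** 2) / keep_prob - mu ** 2
    # Inference: dropout is the identity, so the variance is just sigma^2
    test_var = sigma ** 2
    return train_var, test_var


def simulated_train_variance(mu, sigma, keep_prob, n=100_000, seed=0):
    """Monte Carlo check of the training-time variance."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        mask = 1.0 if rng.random() < keep_prob else 0.0
        samples.append(x * mask / keep_prob)
    return statistics.pvariance(samples)


train_var, test_var = dropout_variance(mu=0.0, sigma=1.0, keep_prob=0.5)
print(train_var, test_var, simulated_train_variance(0.0, 1.0, 0.5))
```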

2.8 Reproducibility In Machine Learning

Vary the random seed in your deep learning experiments and analyze the distribution of scores to assess the potential impact of randomness on your results. (Caron et al. 2021)
Consider both algorithmic and implementation-level sources of non-determinism when evaluating deep learning models, as these factors can significantly impact model performance and lead to inconsistent results across identical training runs. (Pham et al. 2020)
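
Varying the seed and summarizing the resulting score distribution can be sketched with a toy, hypothetical experiment. A one-parameter linear model is fitted by SGD on a fixed synthetic dataset, and the seed controls only the weight initialization and the example order. This captures algorithmic randomness only. Implementation-level non-determinism, such as non-deterministic GPU kernels, is not removed by fixing a seed and would need separate checks.

```python
import random
import statistics

# Hypothetical fixed dataset: y = 3x + noise, split once into train/test halves.
_data_rng = random.Random(12345)
DATA = [(x, 3.0 * x + _data_rng.gauss(0.0, 0.5)) for x in (i / 50 for i in range(100))]
TRAIN, TEST = DATA[::2], DATA[1::2]


def train_and_evaluate(seed, epochs=3, lr=0.05):
    """Toy training run: the seed controls only weight init and example order."""
    rng = random.Random(seed)
    w = rng.gauss(0.0, 1.0)                 # random initialization
    order = list(TRAIN)
    for _ in range(epochs):
        rng.shuffle(order)                  # random example order each epoch
        for x, y in order:
            w -= lr * 2 * (w * x - y) * x   # SGD step on squared error
    return statistics.fmean((w * x - y) ** 2 for x, y in TEST)  # test MSE


def summarize(seeds):
    """Report the distribution of scores across seeds rather than a single run."""
    scores = [train_and_evaluate(s) for s in seeds]
    return {
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


print(summarize(range(10)))
```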

2.9 Transfer Learning

Utilize a combination of pre-training and supervised fine-tuning when developing language models, allowing for effective transfer learning and high performance on a wide range of tasks. (Al-Rfou et al. 2018)
Consider leveraging pre-trained deep CNN models on large and diverse datasets for unsupervised classification tasks, as they can outperform more sophisticated and specifically tailored image-set clustering methods. (Guérin et al. 2017)

2.10 Automatic Differentiation

Utilize automatic differentiation variational inference (ADVI) to enable rapid iteration and exploration of complex probabilistic models, allowing for efficient and accurate estimation of model parameters without requiring manual derivation of algorithms. (Abadi et al. 2016)
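
The core idea of ADVI can be sketched in one dimension. A Gaussian q(z) = Normal(m, exp(ω)²) is fitted by stochastic gradient ascent on the ELBO, using reparameterized samples z = m + exp(ω)·ε. This is a simplified sketch under stated assumptions. It omits ADVI's transformation to an unconstrained space, its adaptive step-size sequence, and the multivariate case. The gradient of the log density is also written by hand here, whereas ADVI obtains it by automatic differentiation. The function name `advi_gaussian` and the target density are hypothetical.

```python
import math
import random


def advi_gaussian(grad_log_p, steps=500, n_samples=100, lr=0.05, seed=0):
    """Fit q(z) = Normal(m, exp(omega)^2) to a 1-D density by stochastic ELBO ascent."""
    rng = random.Random(seed)
    m, omega = 0.0, 0.0
    for _ in range(steps):
        grad_m = grad_omega = 0.0
        for _ in range(n_samples):
            eps = rng.gauss(0.0, 1.0)
            z = m + math.exp(omega) * eps     # reparameterization: z is a function of eps
            g = grad_log_p(z)
            grad_m += g                       # d log p(z) / dm
            grad_omega += g * eps * math.exp(omega)  # d log p(z) / d omega
        # Monte Carlo ELBO gradients; "+ 1.0" is the gradient of the Gaussian entropy w.r.t. omega
        m += lr * grad_m / n_samples
        omega += lr * (grad_omega / n_samples + 1.0)
    return m, math.exp(omega)


# Hypothetical target: a Normal(2.0, 0.5^2) "posterior", known only through grad log p
mu, sigma = 2.0, 0.5
m, s = advi_gaussian(lambda z: -(z - mu) / sigma ** 2)
print(m, s)  # should land close to (2.0, 0.5)
```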

2.11 Boosting

Carefully consider the trade-offs between computational resources and model quality when selecting a gradient boosting decision tree (GBDT) algorithm and its hyperparameters, especially when working with large-scale datasets and limited time or hardware constraints. (Anghel et al. 2018)

2.12 Clustering

Consider creating new clustering methods by systematically combining and modifying components of existing methods, guided by a comprehensive taxonomy of clustering methods that utilize deep neural networks. (Aljalbout et al. 2018)

2.13 Comparison Of Classifiers

Aim to compare a large number of classifiers from various families across numerous datasets to increase the likelihood of identifying the best performing model for a particular dataset, rather than relying solely on familiar or commonly used classifiers. (Aha, Kibler, and Albert 1991)

2.14 Computer Vision In Politics

Consider using automated coding techniques, specifically machine learning algorithms, to efficiently and accurately code large volumes of video data, achieving comparable levels of accuracy to traditional human coding methods. (Tarr, Hwang, and Imai 2022)

2.15 Deep Reinforcement Learning

Carefully select and preprocess your training data to improve the signal-to-noise ratio, specifically by focusing on periods of high price activity, which can lead to better performance of reinforcement learning algorithms in high frequency trading applications. (Briola et al. 2021)

2.16 Embeddings

Consider using the epsilon-four-point condition (ε-4PC) as a measure of proximity to tree metrics, as it is a scalable and easily verifiable condition that accurately reflects the hierarchical nature of complex networks like the Internet. (Abraham et al. 2007)
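
The ε-4PC relaxes the classic four-point condition. For every quadruple of points, form the three pairwise distance sums. A metric is a tree metric exactly when the two largest sums are always equal. The sketch below checks only this classic (additive) condition and reports the largest gap, half of which is the four-point form of Gromov's δ. The exact ε-4PC definition of Abraham et al. (2007) is not reproduced here, and the function names are hypothetical.

```python
from itertools import combinations


def four_point_gap(d, w, x, y, z):
    """Gap between the two largest of the three pair-sums for the quadruple (w, x, y, z)."""
    sums = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]], reverse=True)
    return sums[0] - sums[1]


def max_four_point_gap(d):
    """Largest gap over all quadruples; 0 for tree metrics, and half of it is Gromov's delta."""
    return max(four_point_gap(d, *quad) for quad in combinations(range(len(d)), 4))


path = [[abs(i - j) for j in range(4)] for i in range(4)]                       # path graph: a tree
cycle = [[min(abs(i - j), 4 - abs(i - j)) for j in range(4)] for i in range(4)]  # 4-cycle
print(max_four_point_gap(path), max_four_point_gap(cycle))
```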

2.17 Expectation Maximization Algorithm

Consider using the Orthogonalizing Expectation Maximization (OEM) algorithm for penalized regression analysis in situations involving “tall” data sets, where the number of observations far exceeds the number of variables, as it offers significant computational advantages compared to other methods. (Huling and Qian 2018)

2.18 Gaussian Mixture Model

Carefully consider the assumptions of independence and homogeneity in event count models, as violations of these assumptions can lead to inefficient estimates and biased standard errors. (King 1989)

2.19 Generative Models

Utilize a deep generative model-based framework like Credence to validate causal inference methods, as it generates synthetic data anchored at the empirical distribution for the observed sample, enabling users to specify ground truth for causal effects and confounding bias, and evaluate the potential performance of various causal estimation methods on data similar to the observed sample. (Cui and Tchetgen 2019)

2.20 Interaction Model

Carefully evaluate the appropriateness of the Linear Interaction Effect (LIE) assumption in multiplicative interaction models, as it often fails in empirical settings, leading to potentially biased and inconsistent estimates. Additionally, researchers should ensure adequate common support in the data to avoid making inferences based solely on interpolation or extrapolation. (Hainmueller, Mummolo, and Xu 2018)

2.21 Interactive Machine Learning

Consider incorporating dynamic memory of user feedback into your models to enable continual system improvement without the need for model retraining. (Mishra, Tafjord, and Clark 2022)

2.22 Isaac Gym

Consider implementing an end-to-end GPU-accelerated training pipeline for robotics simulations to achieve significant speed-ups in training complex environments, as demonstrated by the development of Isaac Gym. (Makoviychuk et al. 2021)

2.23 Leave Future Out Cross Validation

Use leave-future-out cross-validation (LFO-CV) instead of leave-one-out cross-validation (LOO-CV) for time series analysis to avoid overly optimistic estimates caused by the availability of future information during the prediction of past observations. (Bürkner, Gabry, and Vehtari 2020)
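
The difference from LOO-CV lies in which observations each fit may see. The minimal sketch below generates exact one-step-ahead LFO-CV splits, where each model is trained only on observations that precede the one it predicts. The function name and parameters are hypothetical. Exact LFO-CV refits the model for every split. The contribution of Bürkner, Gabry, and Vehtari (2020) is an approximation that avoids most of these refits, which is not shown here.

```python
def leave_future_out_splits(n_obs, min_train):
    """Yield (train_indices, test_index) pairs for exact 1-step-ahead LFO-CV."""
    for t in range(min_train, n_obs):
        # Only past observations 0..t-1 are available when predicting observation t
        yield list(range(t)), t


for train, test in leave_future_out_splits(n_obs=6, min_train=3):
    print(train, "->", test)
# [0, 1, 2] -> 3
# [0, 1, 2, 3] -> 4
# [0, 1, 2, 3, 4] -> 5
```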

2.24 Machine Learning Pitfalls

Meticulously manage your data, including setting aside independent test sets early on, avoiding data leakage, and ensuring that feature selection and dimensionality reduction are treated as part of the model training process. (Lones 2021)
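
Treating preprocessing as part of training means fitting it on the training data only and then applying it unchanged to the test data. The minimal sketch below shows this for feature standardization, using hypothetical values and helper names. The same principle applies to feature selection and dimensionality reduction.

```python
import statistics


def fit_standardizer(train_values):
    """Learn the scaling parameters from training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def apply_standardizer(values, params):
    mean, std = params
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 6.0, 8.0]   # hypothetical feature values
test = [5.0, 10.0]

params = fit_standardizer(train)                 # test data play no role here
train_scaled = apply_standardizer(train, params)
test_scaled = apply_standardizer(test, params)

# Leaky alternative (avoid): fitting on train + test lets test values shape the preprocessing
leaky_params = fit_standardizer(train + test)
print(params, leaky_params)
```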

2.25 Manifold Learning

Focus on generating samples from the target distribution on the manifold rather than the input space, especially when dealing with high dimensional data and complex models, as this leads to more accurate and representative estimates of the underlying population. (Oh et al. 2013)

2.26 Merlion Machine Learning Library

Consider using Merlion, an open-source machine learning library specifically designed for time series analysis, which offers a unified interface for various models and datasets, standard pre/post-processing layers, visualization tools, anomaly score calibration, AutoML for hyperparameter tuning and model selection, and model ensembling, allowing for rapid development and benchmarking of models for specific time series needs. (Bhatnagar et al. 2021)

2.27 Meta-learning

Consider meta-learning unsupervised update rules for unsupervised representation learning, specifically targeting semi-supervised classification performance, and constraining the update rule to be a biologically-motivated, neuron-local function to improve generalizability across different neural network architectures, datasets, and data modalities. (Metz et al. 2018)

2.28 Multi-label Classification

Consider using the mldr package in R for working with multilabel datasets, which provides functions for loading, analyzing, and manipulating such datasets, as well as applying binary relevance (BR) and label powerset (LP) transformations to enable the use of traditional binary and multiclass classification models. (Charte and Charte 2015)
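
The two transformations can be illustrated outside R. The plain-Python sketch below is not the mldr API, and its function names and data are hypothetical. Binary relevance (BR) turns a multilabel target into one binary target per label. Label powerset (LP) turns each distinct label combination into a single multiclass target.

```python
def binary_relevance(label_sets, all_labels):
    """BR: one binary target vector per label (1 if the instance carries that label)."""
    return {lab: [int(lab in labels) for labels in label_sets] for lab in all_labels}


def label_powerset(label_sets):
    """LP: each distinct combination of labels becomes one multiclass target."""
    return ["+".join(sorted(labels)) if labels else "<none>" for labels in label_sets]


# Hypothetical multilabel targets for four documents
label_sets = [{"sports"}, {"politics", "economy"}, {"economy"}, {"politics", "economy"}]
labels = ["economy", "politics", "sports"]

print(binary_relevance(label_sets, labels))
# {'economy': [0, 1, 1, 1], 'politics': [0, 1, 0, 1], 'sports': [1, 0, 0, 0]}
print(label_powerset(label_sets))
# ['sports', 'economy+politics', 'economy', 'economy+politics']
```

BR ignores correlations between labels but scales linearly with the number of labels. LP preserves label combinations but can produce many rare classes.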

2.30 Nonparametric Autoencoder

Consider combining Bayesian nonparametric methods with variational autoencoders to enable greater modeling flexibility and structured interpretability in unsupervised representation learning tasks. (Bowman et al. 2015)

2.31 One-class Classification

Consider extending traditional Random Forests (RFs) to one-class classification problems by developing a natural methodology to adapt standard splitting criteria to the one-class setting, allowing for structural generalizations of RFs to one-class classification. (Goix et al. 2016)

2.32 Optimizers

Evaluate multiple optimizers with default hyperparameters, as this approach performs approximately as well as tuning the hyperparameters for a fixed optimizer, and can save valuable computational resources. (Schmidt, Schneider, and Hennig 2020)

2.33 Overfitting

Carefully consider the potential impact of adaptive overfitting when using holdout data for model evaluation, especially in situations where the test set is reused frequently or the sample size is small. (Chen and Guestrin 2016)
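
Adaptive overfitting can be demonstrated with a small simulation. The illustrative sketch below uses labels that are pure noise, so every candidate model has a true accuracy of 50%. Each candidate is a hypothetical random guesser that stands in for one round of tuning against the same reused holdout set. The accuracy of the best candidate on that holdout overstates its performance on fresh data.

```python
import random


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def adaptive_holdout_demo(n_holdout=200, n_fresh=10_000, n_candidates=200, seed=0):
    """Select the best of many useless models on one reused holdout set."""
    rng = random.Random(seed)
    # Labels are pure noise, so every candidate's true accuracy is 50%.
    holdout_labels = [rng.randint(0, 1) for _ in range(n_holdout)]
    fresh_labels = [rng.randint(0, 1) for _ in range(n_fresh)]

    best_acc = -1.0
    for _ in range(n_candidates):
        # Each hypothetical candidate is a random guesser (one tuning iteration)
        preds = [rng.randint(0, 1) for _ in range(n_holdout)]
        best_acc = max(best_acc, accuracy(preds, holdout_labels))

    # The selected "model" still guesses at random on data never used for selection
    fresh_preds = [rng.randint(0, 1) for _ in range(n_fresh)]
    return best_acc, accuracy(fresh_preds, fresh_labels)


holdout_acc, fresh_acc = adaptive_holdout_demo()
print(f"reused holdout: {holdout_acc:.3f}, fresh data: {fresh_acc:.3f}")
```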

2.34 PAC-Bayes

Consider using PAC-Bayes bounds, a tool from statistical learning theory, to analyze the generalization ability of aggregated and randomized predictors. These bounds do not require the predictor to be the solution of a minimization problem, and they can handle complex models such as neural networks. (Alquier 2021)

2.35 Predictive Maintenance Framework

Explicitly define the predictimand, or the specific question about treatment effects that your clinical prediction model aims to answer, as this choice determines the appropriate statistical approach and ensures accurate interpretation of results. (Geloven et al. 2020)

2.36 Pretrained Model

Consider utilizing pre-trained models (PTMs) for natural language processing (NLP) tasks, as they offer significant benefits such as learning universal language representations, providing better model initializations, acting as a regularization technique to prevent overfitting, and reducing the reliance on labeled data through leveraging large-scale unlabeled corpora. (Qiu et al. 2020)

2.37 Proximal Causal Learning

Attempt to identify and categorize proxy variables into three buckets (common causes of treatment and outcome, treatment-inducing confounding proxies, and outcome-inducing confounding proxies). This enables proximal causal learning and improves causal inferences in situations where traditional exchangeability assumptions fail. (Tchetgen et al. 2020)

2.38 Pure Prediction Algorithms

Carefully consider whether your primary objective is accurate prediction or understanding the underlying scientific truth when choosing between traditional regression methods and newer pure prediction algorithms. (Efron 2020)

2.39 Quantized LLMs

Consider using the QLoRA approach for efficient fine-tuning of large language models, which combines 4-bit NormalFloat quantization, Double Quantization, and Paged Optimizers to reduce memory usage while maintaining full 16-bit finetuning task performance. (Dettmers et al. 2023)
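
QLoRA stores the frozen base-model weights in 4 bits. The sketch below illustrates only the blockwise storage idea, with several simplifications. Each block keeps one float scale (its absolute maximum) plus a 4-bit index per weight into a 16-level codebook. Here the codebook is built from evenly spaced quantiles of a standard normal, as an assumption standing in for NormalFloat; the exact NF4 levels differ. Double Quantization and Paged Optimizers are not shown, and all function names and weights are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_codebook(bits=4):
    """2**bits levels at evenly spaced quantiles of N(0, 1), rescaled to [-1, 1].

    Simplified stand-in for NormalFloat: the NF4 construction in QLoRA is asymmetric
    and includes an exact zero, which this version does not.
    """
    n = 2 ** bits
    q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    scale = max(abs(v) for v in q)
    return [v / scale for v in q]


def quantize_block(block, codebook):
    """Absmax quantization: one float scale per block plus a small index per weight."""
    absmax = max(abs(w) for w in block) or 1.0
    idx = [min(range(len(codebook)), key=lambda k: abs(w / absmax - codebook[k])) for w in block]
    return idx, absmax


def dequantize_block(idx, absmax, codebook):
    return [codebook[k] * absmax for k in idx]


codebook = normal_quantile_codebook()
weights = [0.31, -0.02, 0.77, -1.20, 0.05, 0.44, -0.58, 0.09]   # hypothetical weight block
idx, absmax = quantize_block(weights, codebook)
restored = dequantize_block(idx, absmax, codebook)
print(idx, max(abs(a - b) for a, b in zip(weights, restored)))
```

Placing more levels near zero, where normally distributed weights concentrate, is the motivation for a normal-quantile codebook over evenly spaced 4-bit levels.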

2.40 Random Forest

Carefully consider the impact of subsampling rate and tree depth when using Breiman's random forests, as properly tuning these parameters can significantly improve the accuracy of the model. (Duroux and Scornet 2016)

2.41 Reinforcement Learning

Carefully consider the limitations of Markov reward functions in expressing complex tasks, as there are certain tasks that cannot be accurately captured by these functions. Therefore, researchers should explore alternative formulations of the problem when encountering such tasks. (Abel et al. 2021)

2.42 Representation Learning

Consider incorporating both word-level and entity-level features when developing models for text classification tasks, as demonstrated by the improved performance of the TextEnt-full model over the TextEnt-word and TextEnt-entity models in the entity typing task. (Yamada, Shindo, and Takefuji 2018)

2.43 SHAP

Consider using a novel \(R^2\) metric based on Shapley decomposition to evaluate feature importance in machine learning models, as it provides a fair allocation of explained variability to each feature, is model-agnostic, and can be computed efficiently using pre-calculated Shapley values. (Redell 2019)

2.44 Self-supervised Learning

Carefully consider the impact of the choice of learning objective on the learned representations, especially in the final layers, when using self-supervised or supervised methods for visual deep learning. (Grigg et al. 2021)

2.45 Sequential Model Based Optimization

Consider using the flexible and comprehensive R toolbox, mlrMBO, for model-based optimization (MBO) when dealing with expensive black-box functions, as it enables approximation of the objective function through a surrogate regression model, supports single- and multi-objective optimization with mixed continuous, categorical, and conditional parameters, and is implemented in a modular fashion allowing for easy replacement or adaptation of components. (Bischl et al. 2017)

2.46 Snorkel Software

Consider utilizing weak supervision, specifically through the use of noisy, programmatically-generated training data, to address the common issue of limited labeled training data in machine learning projects. (Dehghani et al. 2017)
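
In Snorkel-style weak supervision, noisy training labels come from programmatic labeling functions that either vote or abstain. The minimal sketch below uses hypothetical labeling functions for a toy review-sentiment task and combines them by simple majority vote. Snorkel itself learns a label model that weights the functions by their estimated accuracies, which this sketch does not do.

```python
from collections import Counter

ABSTAIN, NEG, POS = None, 0, 1


# Hypothetical labeling functions for a toy "is this review positive?" task
def lf_contains_great(text):
    return POS if "great" in text else ABSTAIN


def lf_contains_refund(text):
    return NEG if "refund" in text else ABSTAIN


def lf_exclamation(text):
    return POS if text.endswith("!") else ABSTAIN


LFS = [lf_contains_great, lf_contains_refund, lf_exclamation]


def majority_vote(text, lfs=LFS):
    """Combine noisy votes; return None when every function abstains or the vote ties."""
    votes = Counter(v for v in (lf(text) for lf in lfs) if v is not ABSTAIN)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


docs = ["great product!", "want a refund", "great but I want a refund", "arrived on time"]
print([majority_vote(d) for d in docs])  # [1, 0, None, None]
```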

2.47 Statistical Comparison Of Classifiers

Utilize non-parametric tests like the Wilcoxon signed ranks test for comparing two classifiers and the Friedman test with corresponding post-hoc tests for comparing multiple classifiers over multiple data sets, as these tests are simple, safe, and robust alternatives to parametric tests that rely on strong assumptions. (Purg et al. 2023)
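
The Friedman test ranks the classifiers separately on each data set, giving tied scores their average rank. It then compares the mean ranks through χ²_F = 12N / (k(k+1)) · (Σ_j R_j² − k(k+1)²/4), where N is the number of data sets, k the number of classifiers, and R_j the mean rank of classifier j. The sketch below stops at the statistic, which is compared against a χ² distribution with k − 1 degrees of freedom, and omits the post-hoc tests. Function names and scores are hypothetical.

```python
def average_ranks(scores):
    """Rank classifiers on one data set (1 = best/highest score); ties get the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1          # average of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def friedman_statistic(results):
    """results[i][j] = score of classifier j on data set i; returns (chi2_F, mean ranks)."""
    n, k = len(results), len(results[0])
    rank_rows = [average_ranks(row) for row in results]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(R ** 2 for R in mean_ranks) - k * (k + 1) ** 2 / 4)
    return chi2, mean_ranks


# Hypothetical accuracies of 3 classifiers on 4 data sets (classifier 0 always best)
results = [[0.90, 0.85, 0.80], [0.88, 0.84, 0.70], [0.75, 0.72, 0.71], [0.93, 0.90, 0.86]]
print(friedman_statistic(results))  # (8.0, [1.0, 2.0, 3.0])
```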

2.48 Statistical Learning Theory

Carefully distinguish between descriptive and causal inferences, utilizing counterfactual frameworks and potential outcome models to accurately estimate causal effects while controlling for confounding factors. (Gelman and Vehtari 2020)

2.49 Survival Analysis

Consider using the ggRandomForests package to visualize and explore the structure of your random forest models, as it provides separation of data and figures, modularity of data objects/figures, and flexibility in modifying the output using ggplot2 functions. (Ehrlinger 2016)

2.50 Synthetic Data Generation

Consider using synthetic text generated by large language models (LLMs) to overcome obstacles in supervised text analysis, such as the high cost of labeling and retrieval, and copyright restrictions, while still maintaining transparency, reproducibility, and interpretability. (Jankowski and Huber 2023)

2.51 TensorFlow Distributions

Carefully consider the shape semantics of your data when working with probability distributions, particularly distinguishing between sample, batch, and event shapes, to ensure efficient and accurate analysis. (Dillon et al. 2017)
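
The shape bookkeeping can be written out with plain tuples. The sketch below makes no TensorFlow Probability calls, and the helper name is hypothetical. A draw has shape sample_shape + batch_shape + event_shape. The log-probability keeps one value per sample and per batch member, because the event dimensions are reduced.

```python
def tfd_shapes(sample_shape, batch_shape, event_shape):
    """Shapes produced by sampling and by log_prob, expressed as plain tuples."""
    return {
        "sample": tuple(sample_shape) + tuple(batch_shape) + tuple(event_shape),
        "log_prob": tuple(sample_shape) + tuple(batch_shape),
    }


# e.g. 5 draws from a batch of 3 independent 2-dimensional multivariate normals
print(tfd_shapes(sample_shape=(5,), batch_shape=(3,), event_shape=(2,)))
# {'sample': (5, 3, 2), 'log_prob': (5, 3)}
```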

2.52 TensorFlow Probability

Utilize the flexibility of TensorFlow Probability JointDistributions to specify complex probabilistic models using either imperative or declarative styles, leveraging the shared interface for inference algorithms and the ability to easily switch between different model specifications. (Piponi, Moore, and Dillon 2020)

2.53 Understanding Machine Learning

Consider the trade-offs between bias and variance when selecting a machine learning model, as well as the importance of evaluating model performance using methods such as cross-validation and train-test splits. (Chang, Weiss, and Freeman 2009)

2.54 Unsupervised Feature Learning

Distinguish the contributions of architectures from those of learning systems by reporting random weight performance, as a sizeable component of a system's performance can come from the intrinsic properties of the architecture, and not from the learning system. (Gray 2005)

2.55 Variational Inference

Utilize the Pareto Smoothed Importance Sampling (PSIS) diagnostic tool to assess the quality of your variational inference (VI) approximations, as it provides a continuous estimate of the Renyi divergence between the true and approximated posteriors, allowing for early detection of potentially disastrous VI approximations. (Yao et al. 2018)

2.56 Weak Supervision

Consider utilizing a robust PCA-based algorithm for learning dependency structures in weak supervision models, as it can lead to improved theoretical recovery rates and superior performance on real-world tasks compared to existing methods that ignore sparsity patterns or make assumptions about conditional independence. (Varma et al. 2019)

2.57 Wikipedia Infobox Completion

Consider utilizing both word and network embeddings when attempting to predict Wikipedia infobox types, particularly when working with limited information such as tables of contents and named entities in article abstracts. (Biswas et al. 2023)

3 Artificial intelligence

3.1 NA

Consider incorporating knowledge-guided linguistic rewrites as a secondary source of evidence when generating inference rule corpora, as it can significantly improve the precision of the rules without sacrificing substantial recall. (Jain, Rathi, and Chakrabarti 2020)
Consider both the degree of constrainedness and the availability of positive examples when studying the learnability of boolean formulas using deep neural networks, as these factors can significantly affect the performance of the models. (Nicolau et al. 2020)
Consider using automated methods, such as mining-based and paraphrasing-based approaches, to generate diverse and high-quality prompts for querying language models, rather than relying solely on manually created prompts, in order to more accurately estimate the knowledge contained in the models. (McCann et al. 2018)

3.2 Explainable Artificial Intelligence

Pay close attention to the issue of disagreement between explanations generated by different post hoc explanation methods, as it frequently arises in practice and can have significant implications for model interpretation and decision-making. Moreover, there is a lack of principled, well-established approaches for resolving such disagreements, suggesting a need for further research in this area. (Krishna et al. 2022)
Consider generating explanations for AI systems in the form of entailment trees, which are hierarchical structures that capture the logical relationships between premises and conclusions, and can help improve the interpretability and debuggability of AI systems. (Dalvi et al. 2021)
Carefully select appropriate explainable artificial intelligence (XAI) methods based on the model structure and the purpose of the explanation, recognizing that model-agnostic methods offer greater flexibility across different types of models but may sacrifice accuracy compared to model-specific or inherently interpretable methods. (Maksymiuk, Gosiewska, and Biecek 2020)
Aim to develop automatic concept-based explanation methods that prioritize meaningfulness, coherence, and importance in identifying higher-level human-understandable concepts applicable across datasets, as opposed to focusing solely on feature importance scores for individual inputs. (Ghorbani et al. 2019)

3.3 Atomic Commonsense Reasoning Dataset

Consider organizing commonsense knowledge into typed if-then relations with variables, and distinguishing between causes vs. effects, agents vs. themes, voluntary vs. involuntary events, and actions vs. mental states, as this approach leads to improved accuracy in commonsense reasoning tasks. (Sap et al. 2018)

3.4 Language Models And Legal Reasoning

Actively involve domain experts in the creation of evaluation tasks for large language models (LLMs) to ensure that the tasks accurately reflect real-world scenarios and enable meaningful engagement in discussions of LLM performance using familiar terminology and conceptual frameworks. (Guha et al. 2023)

3.5 Neural-symbolic Integration

Consider combining symbolic and subsymbolic knowledge representations in your language models, allowing for improved interpretability, adaptability, and control over the model's factual information. (Verga et al. 2020)

4 Artificial neural networks

4.1 Attention Mechanism

Consider analyzing the relationship between in-context learning in Transformers and gradient-based optimization techniques, particularly in the context of auto-regressive tasks, as the authors propose that in-context learning in the Transformer forward pass is implemented via gradient-based optimization of an implicit auto-regressive inner loss constructed from its in-context data. (Oswald et al. 2022)

4.2 Convolutional Neural Network

Carefully consider the choice of hyperparameters in your convolutional neural network models, particularly the number of filters, filter size, activation function, and stride, as these decisions can significantly impact the performance of the model. (Thoma 2017)
Employ a multi-granularity and multi-perspective approach to modeling sentence similarity using convolutional neural networks, combining both holistic and per-dimension filters, along with various pooling methods, to effectively capture diverse linguistic patterns and enhance overall performance. (He, Gimpel, and Lin 2015)

4.3 Deep Residual Network

Consider optimizing your deep learning models for super-resolution tasks by simplifying the network architecture, modifying the loss function, and transferring knowledge from pre-trained models at other scales. (Lim et al. 2017)

4.4 Neural Network

Employ statistical models capable of handling complex, nonlinear, and contingent relationships, such as neural network models, especially when studying rare events like international conflicts, where traditional linear-normal models may miss important nuances due to the rarity and heterogeneity of the phenomenon. (Beck, King, and Zeng 2000)

4.5 Recurrent Neural Networks

Consider decomposing the output of an LSTM into a product of factors, where each factor represents the contribution of a particular word, in order to gain insights into the underlying learned patterns of the model. (Murdoch and Szlam 2017)

4.6 Transformers

Consider utilizing a knowledge attribution method to identify knowledge neurons in pretrained transformers, as these neurons have been found to be positively correlated with the expression of their corresponding facts, allowing for targeted editing of specific factual knowledge without fine-tuning. (Dai et al. 2021)

References

Abadi, Martín, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. “TensorFlow: A System for Large-Scale Machine Learning.” arXiv. https://doi.org/10.48550/ARXIV.1605.08695.

Abdar, Moloud, U. Rajendra Acharya, Nizal Sarrafzadegan, and Vladimir Makarenkov. 2019. “NE-Nu-SVC: A New Nested Ensemble Clinical Decision Support System for Effective Diagnosis of Coronary Artery Disease.” IEEE Access 7. https://doi.org/10.1109/access.2019.2953920.

Abel, David, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh. 2021. “On the Expressivity of Markov Reward.” arXiv. https://doi.org/10.48550/ARXIV.2111.00876.

Abraham, Ittai, Mahesh Balakrishnan, Fabian Kuhn, Dahlia Malkhi, Venugopalan Ramasubramanian, and Kunal Talwar. 2007. “Reconstructing Approximate Tree Metrics.” Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, August. https://doi.org/10.1145/1281100.1281110.

Aha, David W., Dennis Kibler, and Marc K. Albert. 1991. Machine Learning 6. https://doi.org/10.1023/a:1022689900470.

Alayrac, Jean-Baptiste, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, et al. 2022. “Flamingo: A Visual Language Model for Few-Shot Learning.” arXiv. https://doi.org/10.48550/ARXIV.2204.14198.

Aljalbout, Elie, Vladimir Golkov, Yawar Siddiqui, Maximilian Strobel, and Daniel Cremers. 2018. “Clustering with Deep Learning: Taxonomy and New Methods.” arXiv. https://doi.org/10.48550/ARXIV.1801.07648.

Alquier, Pierre. 2021. “User-Friendly Introduction to PAC-Bayes Bounds.” arXiv. https://doi.org/10.48550/ARXIV.2110.11216.

Al-Rfou, Rami, Dokook Choe, Noah Constant, Mandy Guo, and Llion Jones. 2018. “Character-Level Language Modeling with Deeper Self-Attention.” arXiv. https://doi.org/10.48550/ARXIV.1808.04444.

Alt, Christoph, Marc Hübner, and Leonhard Hennig. 2019. “Improving Relation Extraction by Pre-Trained Language Representations.” arXiv. https://doi.org/10.48550/ARXIV.1906.03088.

Anghel, Andreea, Nikolaos Papandreou, Thomas Parnell, Alessandro De Palma, and Haralampos Pozidis. 2018. “Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms.” arXiv. https://doi.org/10.48550/ARXIV.1809.04559.

Beck, Nathaniel, Gary King, and Langche Zeng. 2000. “Improving Quantitative Studies of International Conflict: A Conjecture.” American Political Science Review 94 (March). https://doi.org/10.2307/2586378.

Bhatnagar, Aadyot, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, et al. 2021. “Merlion: A Machine Learning Library for Time Series.” arXiv. https://doi.org/10.48550/ARXIV.2109.09265.

Bischl, Bernd, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas, and Michel Lang. 2017. “mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions.” arXiv. https://doi.org/10.48550/ARXIV.1703.03373.

Biswas, Russa, Lucie-Aimée Kaffee, Michael Cochez, Stefania Dumbrava, Theis E. Jendal, Matteo Lissandrini, Vanessa Lopez, et al. 2023. “Knowledge Graph Embeddings: Open Challenges and Opportunities.” Schloss Dagstuhl – Leibniz-Zentrum Für Informatik. https://doi.org/10.4230/TGDK.1.1.4.

Bowman, Samuel R., Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. 2015. “Generating Sentences from a Continuous Space.” arXiv. https://doi.org/10.48550/ARXIV.1511.06349.

Branco, Paula, Luis Torgo, and Rita Ribeiro. 2015. “A Survey of Predictive Modelling Under Imbalanced Distributions.” arXiv. https://doi.org/10.48550/ARXIV.1505.01658.

Briola, Antonio, Jeremy Turiel, Riccardo Marcaccioli, Alvaro Cauderan, and Tomaso Aste. 2021. “Deep Reinforcement Learning for Active High Frequency Trading.” arXiv. https://doi.org/10.48550/ARXIV.2101.07107.

Brunner, Gino, Yang Liu, Damián Pascual, Oliver Richter, Massimiliano Ciaramita, and Roger Wattenhofer. 2019. “On Identifiability in Transformers.” arXiv. https://doi.org/10.48550/ARXIV.1908.04211.

Bürkner, Paul-Christian, Jonah Gabry, and Aki Vehtari. 2020. “Approximate Leave-Future-Out Cross-Validation for Bayesian Time Series Models.” Journal of Statistical Computation and Simulation 90 (June). https://doi.org/10.1080/00949655.2020.1783262.

Caron, Mathilde, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. 2021. “Emerging Properties in Self-Supervised Vision Transformers.” arXiv. https://doi.org/10.48550/ARXIV.2104.14294.

Chang, Hyun Sung, Yair Weiss, and William T. Freeman. 2009. “Informative Sensing.” arXiv. https://doi.org/10.48550/ARXIV.0901.4275.

Charte, Francisco, and David Charte. 2015. "Working with Multilabel Datasets in R: The mldr Package." The R Journal 7. https://doi.org/10.32614/rj-2015-027.

Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August. https://doi.org/10.1145/2939672.2939785.

Chepoi, Victor, Feodor F. Dragan, Bertrand Estellon, Michel Habib, Yann Vaxès, and Yang Xiang. 2010. “Additive Spanners and Distance and Routing Labeling Schemes for Hyperbolic Graphs.” Algorithmica 62 (December). https://doi.org/10.1007/s00453-010-9478-x.

Cui, Yifan, and Eric Tchetgen Tchetgen. 2019. “Selective Machine Learning of Doubly Robust Functionals.” arXiv. https://doi.org/10.48550/ARXIV.1911.02029.

Dai, Damai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, and Furu Wei. 2021. “Knowledge Neurons in Pretrained Transformers.” arXiv. https://doi.org/10.48550/ARXIV.2104.08696.

Dalvi, Bhavana, Peter Jansen, Oyvind Tafjord, Zhengnan Xie, Hannah Smith, Leighanna Pipatanangkura, and Peter Clark. 2021. “Explaining Answers with Entailment Trees.” arXiv. https://doi.org/10.48550/ARXIV.2104.08661.

Dehghani, Mostafa, Aliaksei Severyn, Sascha Rothe, and Jaap Kamps. 2017. “Learning to Learn from Weak Supervision by Full Supervision.” arXiv. https://doi.org/10.48550/ARXIV.1711.11383.

Deng, Songgaojun, and Yue Ning. 2021. “A Survey on Societal Event Forecasting with Deep Learning.” arXiv. https://doi.org/10.48550/ARXIV.2112.06345.

Dettmers, Tim, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. “QLoRA: Efficient Finetuning of Quantized LLMs.” arXiv. https://doi.org/10.48550/ARXIV.2305.14314.

Dillon, Joshua V., Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, and Rif A. Saurous. 2017. “TensorFlow Distributions.” arXiv. https://doi.org/10.48550/ARXIV.1711.10604.

Duroux, Roxane, and Erwan Scornet. 2016. “Impact of Subsampling and Pruning on Random Forests.” arXiv. https://doi.org/10.48550/ARXIV.1603.04261.

Efron, Bradley. 2020. “Prediction, Estimation, and Attribution.” International Statistical Review 88 (December). https://doi.org/10.1111/insr.12409.

Ehrlinger, John. 2016. “ggRandomForests: Exploring Random Forest Survival.” arXiv. https://doi.org/10.48550/ARXIV.1612.08974.

Ganea, Octavian-Eugen, Gary Bécigneul, and Thomas Hofmann. 2018. “Hyperbolic Neural Networks.” arXiv. https://doi.org/10.48550/ARXIV.1805.09112.

Gelman, Andrew, and Aki Vehtari. 2020. “What Are the Most Important Statistical Ideas of the Past 50 Years?” arXiv. https://doi.org/10.48550/ARXIV.2012.00174.

Geloven, Nan van, Sonja A. Swanson, Chava L. Ramspek, Kim Luijken, Merel van Diepen, Tim P. Morris, Rolf H. H. Groenwold, Hans C. van Houwelingen, Hein Putter, and Saskia le Cessie. 2020. “Prediction Meets Causal Inference: The Role of Treatment in Clinical Prediction Models.” European Journal of Epidemiology 35 (May). https://doi.org/10.1007/s10654-020-00636-1.

Ghorbani, Amirata, James Wexler, James Zou, and Been Kim. 2019. “Towards Automatic Concept-Based Explanations.” arXiv. https://doi.org/10.48550/ARXIV.1902.03129.

Ghosh, Partha, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, and Bernhard Schölkopf. 2019. “From Variational to Deterministic Autoencoders.” arXiv. https://doi.org/10.48550/ARXIV.1903.12436.

Goix, Nicolas, Nicolas Drougard, Romain Brault, and Maël Chiapino. 2016. “One Class Splitting Criteria for Random Forests.” arXiv. https://doi.org/10.48550/ARXIV.1611.01971.

Gray, Robert M. 2005. “Toeplitz and Circulant Matrices: A Review.” Foundations and Trends® in Communications and Information Theory 2. https://doi.org/10.1561/0100000006.

Grigg, Tom George, Dan Busbridge, Jason Ramapuram, and Russ Webb. 2021. “Do Self-Supervised and Supervised Methods Learn Similar Visual Representations?” arXiv. https://doi.org/10.48550/ARXIV.2110.00528.

Guérin, Joris, Olivier Gibaru, Stéphane Thiery, and Eric Nyiri. 2017. “CNN Features Are Also Great at Unsupervised Classification.” arXiv. https://doi.org/10.48550/ARXIV.1707.01700.

Guha, Neel, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, et al. 2023. “LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models.” arXiv. https://doi.org/10.48550/ARXIV.2308.11462.

Hainmueller, Jens, Jonathan Mummolo, and Yiqing Xu. 2018. “How Much Should We Trust Estimates from Multiplicative Interaction Models? Simple Tools to Improve Empirical Practice.” Political Analysis 27 (December). https://doi.org/10.1017/pan.2018.46.

He, Hua, Kevin Gimpel, and Jimmy Lin. 2015. “Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks.” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. https://doi.org/10.18653/v1/d15-1181.

Huling, Jared D., and Peter Z. G. Qian. 2018. “Fast Penalized Regression and Cross Validation for Tall Data with the oem Package.” arXiv. https://doi.org/10.48550/ARXIV.1801.09661.

Jain, Prachi, Sushant Rathi, and Soumen Chakrabarti. 2020. “Temporal Knowledge Base Completion: New Algorithms and Evaluation Protocols.” arXiv. https://doi.org/10.48550/ARXIV.2005.05035.

Jankowski, Michael, and Robert A. Huber. 2023. “When Correlation Is Not Enough: Validating Populism Scores from Supervised Machine-Learning Models.” Political Analysis 31 (January). https://doi.org/10.1017/pan.2022.32.

Jeni, Laszlo A., Jeffrey F. Cohn, and Fernando De La Torre. 2013. “Facing Imbalanced Data–Recommendations for the Use of Performance Metrics.” 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, September. https://doi.org/10.1109/acii.2013.47.

King, Gary. 1989. “Variance Specification in Event Count Models: From Restrictive Assumptions to a Generalized Estimator.” American Journal of Political Science 33 (August). https://doi.org/10.2307/2111071.

Kipf, Thomas N., and Max Welling. 2016. “Semi-Supervised Classification with Graph Convolutional Networks.” arXiv. https://doi.org/10.48550/ARXIV.1609.02907.

Krishna, Satyapriya, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, and Himabindu Lakkaraju. 2022. “The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective.” arXiv. https://doi.org/10.48550/ARXIV.2202.01602.

Le, Matt, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, and Maximilian Nickel. 2019. “Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings.” arXiv. https://doi.org/10.48550/ARXIV.1902.00913.

Li, Xiang, Shuo Chen, Xiaolin Hu, and Jian Yang. 2018. “Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift.” arXiv. https://doi.org/10.48550/ARXIV.1801.05134.

Lim, Bee, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. 2017. “Enhanced Deep Residual Networks for Single Image Super-Resolution.” arXiv. https://doi.org/10.48550/ARXIV.1707.02921.

Lones, Michael A. 2021. “How to Avoid Machine Learning Pitfalls: A Guide for Academic Researchers.” arXiv. https://doi.org/10.48550/ARXIV.2108.02497.

Makoviychuk, Viktor, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, et al. 2021. “Isaac Gym: High Performance GPU-Based Physics Simulation for Robot Learning.” arXiv. https://doi.org/10.48550/ARXIV.2108.10470.

Maksymiuk, Szymon, Alicja Gosiewska, and Przemyslaw Biecek. 2020. “Landscape of R Packages for eXplainable Artificial Intelligence.” arXiv. https://doi.org/10.48550/ARXIV.2009.13248.

Marsden, Mark, Kevin McGuinness, Suzanne Little, and Noel E. O’Connor. 2017. “ResnetCrowd: A Residual Deep Learning Architecture for Crowd Counting, Violent Behaviour Detection and Crowd Density Level Classification.” arXiv. https://doi.org/10.48550/ARXIV.1705.10698.

McCann, Bryan, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. “The Natural Language Decathlon: Multitask Learning as Question Answering.” arXiv. https://doi.org/10.48550/ARXIV.1806.08730.

Metz, Luke, Niru Maheswaranathan, Brian Cheung, and Jascha Sohl-Dickstein. 2018. “Meta-Learning Update Rules for Unsupervised Representation Learning.” arXiv. https://doi.org/10.48550/ARXIV.1804.00222.

Mishra, Bhavana Dalvi, Oyvind Tafjord, and Peter Clark. 2022. “Towards Teachable Reasoning Systems: Using a Dynamic Memory of User Feedback for Continual System Improvement.” arXiv. https://doi.org/10.48550/ARXIV.2204.13074.

Murdoch, W. James, and Arthur Szlam. 2017. “Automatic Rule Extraction from Long Short Term Memory Networks.” arXiv. https://doi.org/10.48550/ARXIV.1702.02540.

Nicolau, Marcio, Anderson R. Tavares, Zhiwei Zhang, Pedro Avelar, João M. Flach, Luis C. Lamb, and Moshe Y. Vardi. 2020. “Understanding Boolean Function Learnability on Deep Neural Networks: PAC Learning Meets Neurosymbolic Models.” arXiv. https://doi.org/10.48550/ARXIV.2009.05908.

Oh, Hyun-Jung, Ana Muriel, Hari Balasubramanian, Katherine Atkinson, and Thomas Ptaszkiewicz. 2013. “Guidelines for Scheduling in Primary Care Under Different Patient Types and Stochastic Nurse and Provider Service Times.” IIE Transactions on Healthcare Systems Engineering 3 (October). https://doi.org/10.1080/19488300.2013.858379.

Oswald, Johannes von, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. 2022. “Transformers Learn In-Context by Gradient Descent.” arXiv. https://doi.org/10.48550/ARXIV.2212.07677.

Pham, Hung Viet, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan Rosenthal, Lin Tan, Yaoliang Yu, and Nachiappan Nagappan. 2020. “Problems and Opportunities in Training Deep Learning Software Systems.” Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, December. https://doi.org/10.1145/3324884.3416545.

Piponi, Dan, Dave Moore, and Joshua V. Dillon. 2020. “Joint Distributions for TensorFlow Probability.” arXiv. https://doi.org/10.48550/ARXIV.2001.11819.

Purg, Nina, Jure Demšar, Alan Anticevic, and Grega Repovš. 2023. “Corrigendum: Autohrf – an R Package for Generating Data-Informed Event Models for General Linear Modeling of Task-Based fMRI Data.” Frontiers in Neuroimaging 2 (February). https://doi.org/10.3389/fnimg.2023.1158159.

Qiu, XiPeng, TianXiang Sun, YiGe Xu, YunFan Shao, Ning Dai, and XuanJing Huang. 2020. “Pre-Trained Models for Natural Language Processing: A Survey.” Science China Technological Sciences 63 (September). https://doi.org/10.1007/s11431-020-1647-3.

Razavi, Ali, Aaron van den Oord, and Oriol Vinyals. 2019. “Generating Diverse High-Fidelity Images with VQ-VAE-2.” arXiv. https://doi.org/10.48550/ARXIV.1906.00446.

Redell, Nickalus. 2019. “Shapley Decomposition of R-Squared in Machine Learning Models.” arXiv. https://doi.org/10.48550/ARXIV.1908.09718.

Sap, Maarten, Ronan LeBras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, and Yejin Choi. 2018. “ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning.” arXiv. https://doi.org/10.48550/ARXIV.1811.00146.

Schmidt, Robin M., Frank Schneider, and Philipp Hennig. 2020. “Descending Through a Crowded Valley - Benchmarking Deep Learning Optimizers.” arXiv. https://doi.org/10.48550/ARXIV.2007.01547.

Tarr, Alexander, June Hwang, and Kosuke Imai. 2022. “Automated Coding of Political Campaign Advertisement Videos: An Empirical Validation Study.” Political Analysis 31 (November). https://doi.org/10.1017/pan.2022.26.

Tchetgen Tchetgen, Eric J., Andrew Ying, Yifan Cui, Xu Shi, and Wang Miao. 2020. “An Introduction to Proximal Causal Learning.” arXiv. https://doi.org/10.48550/ARXIV.2009.10982.

Thoma, Martin. 2017. “Analysis and Optimization of Convolutional Neural Network Architectures.” arXiv. https://doi.org/10.48550/ARXIV.1707.09725.

Varma, Paroma, Frederic Sala, Ann He, Alexander Ratner, and Christopher Ré. 2019. “Learning Dependency Structures for Weak Supervision Models.” arXiv. https://doi.org/10.48550/ARXIV.1903.05844.

Verga, Pat, Haitian Sun, Livio Baldini Soares, and William W. Cohen. 2020. “Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge.” arXiv. https://doi.org/10.48550/ARXIV.2007.00849.

Vig, Jesse. 2019. “A Multiscale Visualization of Attention in the Transformer Model.” arXiv. https://doi.org/10.48550/ARXIV.1906.05714.

Wolf, Thomas, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, et al. 2019. “HuggingFace’s Transformers: State-of-the-Art Natural Language Processing.” arXiv. https://doi.org/10.48550/ARXIV.1910.03771.

Wu, Yuxin, and Justin Johnson. 2021. “Rethinking ‘Batch’ in BatchNorm.” arXiv. https://doi.org/10.48550/ARXIV.2105.07576.

Yamada, Ikuya, Hiroyuki Shindo, and Yoshiyasu Takefuji. 2018. “Representation Learning of Entities and Documents from Knowledge Base Descriptions.” arXiv. https://doi.org/10.48550/ARXIV.1806.02960.

Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Yes, but Did It Work?: Evaluating Variational Inference.” arXiv. https://doi.org/10.48550/ARXIV.1802.02538.

Zhang, Yi-Fan, Hanlin Zhang, Zachary C. Lipton, Li Erran Li, and Eric P. Xing. 2022. “Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation.” arXiv. https://doi.org/10.48550/ARXIV.2202.01336.