Machine Learning (ML)

These terms encapsulate the core concepts, methodologies, and challenges within the field of machine learning, from the basics of model training and evaluation to the complexities of different learning paradigms.

Accuracy - A metric used to evaluate the performance of a machine learning model, equal to the proportion (often expressed as a percentage) of the model's predictions that match the true labels.
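
A minimal sketch of the calculation follows. The function name `accuracy` and the sample labels are illustrative assumptions; libraries such as scikit-learn provide their own versions. Multiply the result by 100 to express it as a percentage.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8 (4 of 5 correct)
```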

Activation Function - A function in a neural network that determines the output of a node given an input or set of inputs. It introduces non-linearity; without it, a stack of layers would reduce to a single linear transformation.
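
The sketch below shows two common activation functions, ReLU and the logistic sigmoid, applied to a single node's weighted sum. The helper `neuron` and its weights are hypothetical and chosen only for illustration.

```python
import math

def relu(x):
    """Rectified linear unit: returns x for positive inputs, 0 otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic sigmoid, mapping any real input into (0, 1). The two branches
    ensure math.exp never receives a positive argument, avoiding overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def neuron(inputs, weights, bias, activation):
    """One node: weighted sum of inputs plus bias, passed through `activation`."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

print(neuron([1.0, 2.0], [0.5, -1.0], 0.25, relu))     # 0.0 (z = -1.25)
print(neuron([1.0, 2.0], [0.5, -1.0], 0.25, sigmoid))  # about 0.223
```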

AlphaFold - DeepMind's deep learning system for predicting the 3D structure of a protein from its amino acid sequence.

Autoencoder - A type of neural network used in unsupervised learning tasks, such as feature learning and dimensionality reduction. It learns to encode input data into a (typically lower-dimensional) representation and then decode that representation to reconstruct the original input.

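Automated Machine Learning (AutoML) - The automation of the end-to-end process of applying machine learning to real-world problems, covering steps such as data preprocessing, feature selection, model selection, and hyperparameter tuning.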

Classification - A type of supervised machine learning task where the model is trained to categorize input data into predefined labels or classes.

Clustering - An unsupervised learning technique where the algorithm groups a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
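
One common clustering algorithm is k-means, though many others exist. A minimal pure-Python sketch of Lloyd's k-means on 2D points follows. The function name, the seeded initialization, and the sample points are assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of 2D point tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```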

Contrastive Learning - A technique, widely used in self-supervised learning, that learns representations by pulling the representations of positive (similar) pairs together and pushing those of negative (dissimilar) pairs apart.
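
The sketch below implements an InfoNCE-style loss for a single anchor, one widely used contrastive objective. The embeddings are small hand-written vectors standing in for an encoder's outputs, and the `temperature` value is an assumption. The loss is near zero when the anchor is much more similar to its positive than to every negative, and large otherwise.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor: cross-entropy of picking the positive
    out of {positive} + negatives, using scaled cosine similarities as logits."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]

anchor = [1.0, 0.0]
good = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])  # positive is close
bad = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])   # a negative is closer
print(good, bad)
```

Training would adjust the encoder so that real positive pairs look like the `good` case.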

Convolutional Neural Network (CNN) - A deep learning architecture particularly effective for image and video recognition, which uses convolutional layers to learn spatial hierarchies of features.
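
The core operation of a convolutional layer is sliding a small kernel over the input. The sketch below shows this in plain Python with stride 1 and no padding. Strictly speaking it computes cross-correlation, as most CNN libraries do. In a real CNN the kernel values are learned during training; the edge-detecting kernel here is hand-set for illustration.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list `image` with a 2D list `kernel`
    (stride 1, no padding), the operation a convolutional layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-set vertical-edge detector.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column marks where the edge is.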

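Cross-Validation - A technique in machine learning for assessing how well a model generalizes, in which the data is partitioned into subsets (folds). In k-fold cross-validation, for example, the model is trained on k-1 folds and validated on the remaining fold, rotating until each fold has served once as the validation set.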
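
The sketch below generates k-fold train/validation index splits. It uses contiguous folds for clarity; in practice the data is usually shuffled first. Libraries such as scikit-learn offer this as `sklearn.model_selection.KFold`, and the function below is a hypothetical stand-in, not that API.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds.
    Fold sizes differ by at most one when k does not divide n_samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

for train, val in k_fold_indices(5, 3):
    print(train, val)
# [2, 3, 4] [0, 1]
# [0, 1, 4] [2, 3]
# [0, 1, 2, 3] [4]
```

Averaging the validation scores across the folds gives the cross-validated estimate.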

Data Cleaning - The process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
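
The sketch below illustrates the cleaning steps listed above on a hypothetical list of person records. It drops incomplete rows, normalizes formatting, drops rows whose age cannot be parsed, and removes duplicates. The field names and rules are assumptions for illustration; real pipelines choose rules to fit their data.

```python
def clean_records(records, required=("name", "age")):
    """Drop incomplete, malformed, and duplicate records; normalize names."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(rec.get(field) in (None, "") for field in required):
            continue  # incomplete record
        name = " ".join(str(rec["name"]).split()).title()  # fix stray spaces and casing
        try:
            age = int(rec["age"])
        except (TypeError, ValueError):
            continue  # incorrectly formatted age
        key = (name, age)
        if key in seen:
            continue  # duplicate (after normalization)
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

raw = [
    {"name": "  ada lovelace ", "age": "36"},
    {"name": "Ada Lovelace", "age": 36},          # duplicate once normalized
    {"name": "Alan Turing", "age": "forty-one"},  # malformed age
    {"name": "", "age": 50},                      # missing name
    {"name": "grace hopper", "age": 85},
]
print(clean_records(raw))
# [{'name': 'Ada Lovelace', 'age': 36}, {'name': 'Grace Hopper', 'age': 85}]
```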

Data Exploration - The initial phase of data analysis, in which analysts examine a dataset, often a large one, to uncover initial insights, patterns, and trends.

Deep Learning (DL) - A subset of machine learning involving neural networks with multiple layers that learn representations of data with multiple levels of abstraction.

Dimensionality Reduction - The process of reducing the number of random variables under consideration by obtaining a set of principal variables.
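
Principal component analysis (PCA) is one common technique for this; it is not the only one. The sketch below estimates the first principal component by power iteration on the sample covariance matrix. It then projects 2D points onto that component, reducing each point to one coordinate. The data and the seeded random start vector are assumptions; production code would typically use a linear-algebra library instead.

```python
import math
import random

def first_principal_component(data, iterations=200, seed=0):
    """Estimate the column means and top principal component of `data`
    (equal-length numeric rows) by power iteration on the sample covariance."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Random start; power iteration converges to the dominant eigenvector unless
    # the start is exactly orthogonal to it (vanishingly unlikely).
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v; nothing to refine
        v = [x / norm for x in w]
    return means, v

def project(data, means, component):
    """Reduce each row to its single coordinate along `component`."""
    return [sum((row[j] - means[j]) * component[j] for j in range(len(row))) for row in data]

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]  # roughly y = 2x
means, pc = first_principal_component(data)
print(pc)  # close to (0.447, 0.894) up to sign, i.e. the direction of y = 2x
print(project(data, means, pc))
```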

Feature Engineering - The process of selecting, modifying, or creating new input variables to improve the performance of machine learning models.
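
Two small feature-engineering steps are sketched below: one-hot encoding a categorical column, and deriving a new ratio feature from two existing ones. The column names (`price`, `area`, `price_per_area`) and sample values are hypothetical.

```python
def one_hot(values):
    """Turn a categorical column into binary indicator columns, one per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def add_ratio_feature(rows, numerator, denominator, name):
    """Return copies of `rows` with a new feature numerator / denominator
    (None where the denominator is 0)."""
    out = []
    for row in rows:
        new_row = dict(row)
        d = row[denominator]
        new_row[name] = row[numerator] / d if d else None
        out.append(new_row)
    return out

print(one_hot(["red", "green", "red"]))
# (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
houses = [{"price": 300000, "area": 100}, {"price": 150000, "area": 0}]
print(add_ratio_feature(houses, "price", "area", "price_per_area"))
```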

Label - In supervised learning, a label is the answer or outcome that the model is trained to predict, based on the input data.

Labeled Data - Data that has been tagged with one or more labels, identifying certain properties or categories, essential for supervised learning.

Loss Function - A function that measures the difference between the model's predicted output and the expected (true) output, used to guide the optimization of the model parameters.
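
Two widely used loss functions are sketched below: mean squared error for regression and binary cross-entropy for binary classification. The clipping constant `eps` is an illustrative choice. Training adjusts the model parameters to make these values smaller.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels given predicted
    probabilities; clipping by `eps` avoids taking log(0)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))  # 0.25
print(binary_cross_entropy([1, 0], [0.9, 0.1]))    # about 0.105
```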

Machine Learning (ML) - The field of study that gives computers the ability to learn from data without being explicitly programmed, focusing on the development of algorithms that can learn from and make predictions on data.

Markov Decision Process (MDP) - A mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision-maker, used in reinforcement learning.
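
A finite MDP can be solved by value iteration, which repeatedly applies the Bellman optimality update until the state values stop changing. The sketch below does this for a hypothetical two-state MDP. It represents transitions as nested dicts of `(probability, next_state, reward)` tuples, a representation chosen for illustration. The discount factor `gamma` is also an assumption.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Optimal state values for a finite MDP via repeated Bellman optimality updates.

    transitions[s][a] is a list of (probability, next_state, reward) tuples;
    a state with no actions is terminal and keeps value 0.
    """
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # updated in place, so later states see the new value
        if delta < tol:
            return V

def greedy_policy(transitions, V, gamma=0.9):
    """In each non-terminal state, pick the action with the highest expected return."""
    return {
        s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in actions[a]))
        for s, actions in transitions.items()
        if actions
    }

# Hypothetical MDP: "safe" always ends with reward 1; "risky" pays 3 half the
# time and otherwise returns to "start" with no reward (partly random outcomes).
transitions = {
    "start": {
        "safe": [(1.0, "end", 1.0)],
        "risky": [(0.5, "end", 3.0), (0.5, "start", 0.0)],
    },
    "end": {},
}
V = value_iteration(transitions)
print(V["start"], greedy_policy(transitions, V))  # about 2.727 {'start': 'risky'}
```

With `gamma=0.9`, the value of always choosing "risky" satisfies V = 1.5 + 0.45V, so V = 1.5 / 0.55 ≈ 2.727. This beats the safe reward of 1.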

Masked Language Modeling (MLM) - A fill-in-the-blank pretraining task in which some tokens in a sentence are hidden (masked) and the model uses the surrounding context to predict them.
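
The sketch below shows how MLM training examples can be built: a random subset of tokens is replaced by a mask token, and the originals are kept as prediction targets. The 15% default and the `[MASK]` string follow common BERT-style practice. BERT additionally replaces some selected tokens with random tokens or leaves them unchanged; this simplified version omits that. Whitespace splitting stands in for a real tokenizer.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Hide a random subset of tokens behind `mask_token`.

    Returns (masked, targets), where targets maps each masked position to the
    original token the model would be trained to predict.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split(), mask_prob=0.3, seed=7)
print(masked, targets)
```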

Model - In machine learning, a model is the representation learned from data; a mathematical structure that makes predictions based on input data.

Overfitting - A modeling error that occurs when a machine learning model learns the detail and noise in the training data to the extent that it negatively impacts the model's performance on new data.

Regression - A type of predictive modeling technique in machine learning that involves predicting a continuous outcome variable based on one or more predictor variables.
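
The simplest case is linear regression with one predictor, fitted by ordinary least squares. It is sketched below using the closed-form slope and intercept. The sample data are hypothetical and lie exactly on y = 2x + 1, so the fit recovers those coefficients.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The fitted line then predicts a continuous outcome for new inputs, e.g. `slope * 10 + intercept`.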

Reinforcement Learning (RL) - A type of machine learning where an agent learns to make decisions by performing actions and receiving feedback in the form of rewards or penalties.
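
The smallest RL setting is a multi-armed bandit: a single state, several actions, and a noisy reward after each action. The sketch below uses an epsilon-greedy agent, which mostly exploits its current best estimate and occasionally explores at random. It updates each estimate as a running average of the rewards received. The arm means, Gaussian reward noise, and `epsilon` are assumptions chosen for illustration.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Gaussian multi-armed bandit.
    Returns (estimated value per arm, number of pulls per arm)."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k
    counts = [0] * k
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)             # feedback from the environment
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.5])
print(estimates, counts)  # the agent should learn to pull arm 2 most often
```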

Self-Supervised Learning - A form of unsupervised learning in which the supervisory signal is derived from the data itself, for example by predicting masked or held-out parts of the input.

Semi-Supervised Learning - A machine learning approach that involves a small amount of labeled data and a large amount of unlabeled data during training.

Supervised Learning - A machine learning task where the model is trained on a labeled dataset, which includes both the input data and the correct output, and the model learns to predict the output from the input data.

Train vs. Test - The practice in machine learning of dividing a dataset into a training set used to train the model, and a test set used to evaluate its performance.
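
A minimal random train/test split is sketched below. Shuffling before splitting avoids accidental ordering effects, and the seed makes the split reproducible. The function `split_train_test` is hypothetical; it is similar in spirit to scikit-learn's `train_test_split` but is not that function.

```python
import random

def split_train_test(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = max(1, round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

train, test = split_train_test(range(10), test_fraction=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

The model is fitted only on `train`; `test` is held back to estimate performance on unseen data.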

Unlabeled Data - Data that does not have explicit labels, making it suitable for unsupervised learning tasks such as clustering or dimensionality reduction.

Unsupervised Learning - A type of machine learning where models learn patterns from unlabeled data without any explicit instructions on what to predict.