Questions tagged [optimization]
In statistics, this refers to selecting an estimator of a parameter by maximizing or minimizing some function of the data. One very common example is choosing the estimator that maximizes the joint density (or mass function) of the observed data, which is known as Maximum Likelihood Estimation (MLE).
513 questions
3
votes
4
answers
160
views
Rounding Float Values in ML Models
Let's assume I have a column with float values (e.g., 3.12334354454, 5.75434331354, and so on). If I round these values to two decimal places (e.g., 3.12, 5.75),
I think the advantages and ...
3
votes
1
answer
59
views
Why is MAE hard to optimize?
In numerous sources it is said that MAE has the disadvantage of not being differentiable at zero, hence it has problems with gradient-based optimization methods. However, I've never seen an explanation why ...
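A minimal sketch of one commonly cited difficulty, offered as an illustration rather than as the answer to this question: away from zero, the subgradient of the absolute error has constant magnitude, so a fixed-step method keeps taking steps of the same size near the minimum, while the squared-error gradient shrinks as the minimum is approached. The data `ys`, the step size `lr`, and all function names are hypothetical.

```python
def mae_subgradient(c, ys):
    """Subgradient of mean(|c - y|) with respect to c (0 is chosen at ties)."""
    sign = lambda r: (r > 0) - (r < 0)
    return sum(sign(c - y) for y in ys) / len(ys)


def mse_gradient(c, ys):
    """Gradient of mean((c - y)^2) with respect to c."""
    return sum(2 * (c - y) for y in ys) / len(ys)


def descend(grad, ys, c0=0.0, lr=0.3, steps=200):
    """Plain fixed-step (sub)gradient descent on a constant predictor c."""
    c = c0
    for _ in range(steps):
        c -= lr * grad(c, ys)
    return c


ys = [1.0, 2.0, 10.0]                  # toy targets (assumed)
c_mae = descend(mae_subgradient, ys)   # step size never shrinks near the optimum
c_mse = descend(mse_gradient, ys)      # steps shrink as the gradient vanishes
```

With these toy values the MAE iterate ends up hovering within about one step of the median (2.0), while the MSE iterate settles at the mean (13/3).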
9
votes
4
answers
2k
views
Does training a neural network on a combined dataset outperform sequential training on individual datasets?
I have a neural network with a fixed architecture (let's call it Architecture A). I also have two datasets, Dataset 1 and Dataset 2, both of which are independently and identically distributed (i.i.d.)...
0
votes
0
answers
16
views
Stale weights and gradients given Adam with an optimal learning rate
I'm fitting a network to predict a delta between eight corresponding 3D points at two timesteps.
The model consists of two MLPs with two layers each, with LeakyRELU in between the layers. It takes in ...
1
vote
0
answers
20
views
Second Moment (Uncentered Variance) Estimate of Gradient
I am reading Kingma and Lei Ba's paper introducing the Adam optimizer. I was looking over some derivations for the second moment estimate:
I noticed that they find the sum of a finite geometric ...
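The finite geometric sum can be checked numerically. The sketch below assumes the standard Adam update $v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$ with $v_0 = 0$; the constant gradient value and the variable names are hypothetical. It shows that the weights on past squared gradients sum to $1 - \beta_2^t$, which is the factor the bias correction divides out.

```python
def second_moment(grads, beta2=0.999):
    """Exponential moving average of squared gradients, starting from v_0 = 0."""
    v = 0.0
    for g in grads:
        v = beta2 * v + (1 - beta2) * g * g
    return v


beta2, t = 0.999, 10

# Unrolled form: v_t = (1 - beta2) * sum_{i=1..t} beta2^(t-i) * g_i^2
weights = [(1 - beta2) * beta2 ** (t - i) for i in range(1, t + 1)]
geometric_sum = sum(weights)        # finite geometric series
closed_form = 1 - beta2 ** t        # its closed form

# With a constant gradient g = 3, dividing by (1 - beta2^t) recovers g^2.
v_t = second_moment([3.0] * t, beta2)
v_hat = v_t / closed_form
```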
0
votes
0
answers
9
views
Nesterov Accelerated Gradient Descent Stalling with High Regularization in Extreme Learning Machine
I'm implementing Nesterov Accelerated Gradient Descent (NAG) on an Extreme Learning Machine (ELM) with one hidden layer. My loss function is the Mean Squared Error (MSE) with L2 regularization.
The ...
0
votes
0
answers
23
views
Question on Optimized Threshold in Predictive Modeling
I'm trying to build a predictive model, but I haven't found a method that consistently delivers high performance.
Is it acceptable to use an optimized classification threshold of 0.996?
0
votes
0
answers
15
views
Optimizing LLM-Based Field Extraction Across 3500+ Document Templates
We are using Azure Document Intelligence to extract all content from PDFs containing 1-7 pages. After extraction, we pass the content to an LLM (OpenAI) to extract only the required 35-40 fields.
The ...
0
votes
0
answers
15
views
Error in plotting Gaussian Process for 3 models that use Bayesian Optimization
I'm writing a Python script for Orange Data Mining to plot the Gaussian processes in order to find the best hyperparameters for the 5-fold cross-validation accuracy metric. The three models are SVC, ...
0
votes
0
answers
31
views
Objective function in reward model in Vanilla RLHF is ambiguous for me
I am trying to learn the background of Vanilla RLHF. I am struggling to understand the objective function in the reward model. It is defined
If the difference of the log of the sigmoid of the difference ...
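The equation itself is missing from this excerpt. As a hedged sketch, assuming the objective is the usual pairwise loss $-\log \sigma(r_w - r_l)$, where $r_w$ and $r_l$ are the reward model's scores for the preferred and rejected responses, the loss can be computed as follows. The function names are hypothetical.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_reward_loss(r_chosen, r_rejected):
    """Assumed pairwise loss: -log(sigmoid(r_chosen - r_rejected))."""
    return -log_sigmoid(r_chosen - r_rejected)


loss_tied = pairwise_reward_loss(0.0, 0.0)    # equals log(2)
loss_right = pairwise_reward_loss(2.0, 0.0)   # preferred response scored higher
loss_wrong = pairwise_reward_loss(0.0, 2.0)   # preferred response scored lower
```

Under this assumption, the loss falls as the preferred response's score rises above the rejected one's, and grows when the ordering is reversed.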
5
votes
2
answers
644
views
Is there any advantage of a lower value of a loss function?
I have two loss functions $\mathcal{L}_1$ and $\mathcal{L}_2$ to train my model. The model is predominantly a classification model. Both $\mathcal{L}_1$ and $\mathcal{L}_2$ are two variants of ...
2
votes
0
answers
33
views
Effect of objective function's Hessian's condition number on learning rate in Gradient Descent
I'm following the book Deep Learning by Ian Goodfellow et al., and in Chapter 4 (Numerical Computation), page 87, the authors mention that by utilising a second-order Taylor approximation of the objective ...
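For reference, the second-order Taylor argument in that chapter gives the step size $\epsilon^* = g^\top g / (g^\top H g)$ when $g^\top H g > 0$. The sketch below evaluates it on a hypothetical ill-conditioned Hessian (the matrix, gradients, and function name are assumptions for illustration) to show how the step the approximation recommends depends on which eigen-direction the gradient points along.

```python
def optimal_step(g, H):
    """Step size g^T g / (g^T H g) from the second-order Taylor expansion.

    Assumes g^T H g > 0 (positive curvature along g).
    """
    n = len(g)
    gg = sum(gi * gi for gi in g)
    Hg = [sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
    gHg = sum(gi * hgi for gi, hgi in zip(g, Hg))
    return gg / gHg


# Hypothetical Hessian with eigenvalues 1 and 100 (condition number 100).
H = [[1.0, 0.0],
     [0.0, 100.0]]

eps_steep = optimal_step([0.0, 1.0], H)  # gradient along the high-curvature axis
eps_flat = optimal_step([1.0, 0.0], H)   # gradient along the low-curvature axis
```

A single learning rate has to be small enough for the high-curvature direction (about 1/100 here), which makes progress along the low-curvature direction slow; this is the sense in which a large condition number hurts gradient descent.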
0
votes
0
answers
17
views
How to improve LSTM model performance for weather prediction?
I predict rainfall using observational data. There are a total of 87,070 data samples, but only 1,885 samples have rainfall.
And here is the LSTM model I am using:
...
0
votes
0
answers
17
views
Given the total cost of a graph walk, how to estimate the cost of each edge?
I have a real-world problem in which I have a collection of nodes and their edges. This collection is composed of hundreds of nodes and thousands of connections. Then I have about 10 K datapoints each ...
4
votes
3
answers
247
views
What is the best package for convex optimization?
I have a set of problems of the form $\text{min} \|Ax-y\|_1$ with some constraints on the $x_i$. A quick search turns up the cvxpy, ...
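A standard reformulation, sketched under the assumption that the constraints on the $x_i$ are linear (the excerpt does not say): introducing one auxiliary variable $t_i$ per residual turns the $\ell_1$ objective into a linear program, so any LP-capable solver applies.

```latex
\begin{aligned}
\min_{x,\,t} \quad & \sum_{i} t_i \\
\text{subject to} \quad & -t \le Ax - y \le t, \\
& \text{(the original constraints on } x\text{)}
\end{aligned}
```

At the optimum each $t_i$ equals $|(Ax - y)_i|$, so the two problems have the same minimizers in $x$.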