
Questions tagged [optimization]

In statistics, this refers to selecting an estimator of a parameter by maximizing or minimizing some function of the data. One very common example is choosing the estimator that maximizes the joint density (or mass function) of the observed data, referred to as Maximum Likelihood Estimation (MLE).

3 votes
4 answers
160 views

Rounding Float Values in ML Models

Let's assume I have a column with float values (e.g., 3.12334354454, 5.75434331354, and so on). If I round these values to two decimal places (e.g., 3.12, 5.75), I think the advantages and ...
Guna • 390
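To see what two-decimal rounding actually costs, here is a minimal pandas sketch using the values from the question; the per-value error is bounded by 0.005, and whether that matters depends on the feature's scale and signal:

```python
import pandas as pd

# Column of high-precision floats from the question
df = pd.DataFrame({"x": [3.12334354454, 5.75434331354]})

# Rounding to two decimals bounds the per-value error by 0.005
df["x_rounded"] = df["x"].round(2)
print((df["x"] - df["x_rounded"]).abs().max())
```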
3 votes
1 answer
59 views

Why is MAE hard to optimize?

In numerous sources it is said that MAE has the disadvantage of not being differentiable at zero, hence it has problems with gradient-based optimization methods. However, I've never seen an explanation of why ...
Nourless • 163
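For context, the kink sits exactly where the prediction equals the target; the per-sample derivative of $|y - \hat{y}|$ with respect to $\hat{y}$ is

$$\frac{\partial}{\partial \hat{y}}\,|y - \hat{y}| = \begin{cases} -1, & \hat{y} < y \\ +1, & \hat{y} > y \\ \text{undefined}, & \hat{y} = y \end{cases}$$

In practice, frameworks return a subgradient (typically 0) at the kink, so the more practical issue is that the gradient magnitude is constant everywhere else and carries no information about how close the residual already is to zero.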
9 votes
4 answers
2k views

Does training a neural network on a combined dataset outperform sequential training on individual datasets?

I have a neural network with a fixed architecture (let's call it Architecture A). I also have two datasets, Dataset 1 and Dataset 2, both of which are independently and identically distributed (i.i.d.)...
Arvind Kumar Sharma
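As a point of comparison, here is a minimal PyTorch sketch of the two regimes; "Architecture A" and the two datasets below are invented stand-ins, not from the question:

```python
import torch
from torch import nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Invented stand-ins for "Architecture A" and the two i.i.d. datasets
def model_a():
    return nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

def make_data(seed):
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(256, 4, generator=g)
    return TensorDataset(x, x.sum(dim=1, keepdim=True))

d1, d2 = make_data(0), make_data(1)

def fit(model, dataset, epochs=5, lr=1e-3):
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            nn.functional.mse_loss(model(xb), yb).backward()
            opt.step()
    return model

# Combined: one shuffled stream over both datasets
combined = fit(model_a(), ConcatDataset([d1, d2]))

# Sequential: fit on d1, then continue on d2 (risks forgetting d1)
sequential = fit(fit(model_a(), d1), d2)
```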
0 votes
0 answers
16 views

Stale weights and gradients given Adam with an optimal learning rate

I'm fitting a network to predict a delta between eight corresponding 3D points at two timesteps. The model consists of two MLPs with two layers each, with LeakyReLU between the layers. It takes in ...
zak • 31
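For concreteness, a minimal sketch of the described setup; the hidden widths and exact input/output shapes are guesses, since the question text is truncated:

```python
import torch
from torch import nn

# Two two-layer MLPs with LeakyReLU between the layers;
# widths and I/O shapes are assumptions
def mlp(d_in, d_hidden, d_out):
    return nn.Sequential(
        nn.Linear(d_in, d_hidden),
        nn.LeakyReLU(),
        nn.Linear(d_hidden, d_out),
    )

# Eight corresponding 3D points at two timesteps in, per-point delta out
net = nn.Sequential(mlp(8 * 3 * 2, 128, 128), mlp(128, 128, 8 * 3))
x = torch.randn(16, 8 * 3 * 2)
print(net(x).shape)  # torch.Size([16, 24])
```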
1 vote
0 answers
20 views

Second Moment (Uncentered Variance) Estimate of Gradient

I am reading Kingma and Ba's paper introducing the Adam optimizer. I was looking over some derivations for the second-moment estimate: I noticed that they find the sum of a finite geometric ...
Mateo del Rio Lanse
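For readers hitting the same step: the paper writes the second-moment EMA in closed form, $v_t = (1-\beta_2)\sum_{i=1}^{t}\beta_2^{t-i} g_i^2$. Assuming the true second moment $\mathbb{E}[g_i^2]$ is stationary, taking expectations and summing the finite geometric series gives

$$\mathbb{E}[v_t] = \mathbb{E}[g^2]\,(1-\beta_2)\sum_{i=1}^{t}\beta_2^{t-i} = \mathbb{E}[g^2]\,(1-\beta_2)\cdot\frac{1-\beta_2^{t}}{1-\beta_2} = \mathbb{E}[g^2]\,(1-\beta_2^{t}),$$

which is exactly the factor the bias correction $\hat{v}_t = v_t/(1-\beta_2^{t})$ divides out.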
0 votes
0 answers
9 views

Nesterov Accelerated Gradient Descent Stalling with High Regularization in Extreme Learning Machine

I'm implementing Nesterov Accelerated Gradient Descent (NAG) on an Extreme Learning Machine (ELM) with one hidden layer. My loss function is the Mean Squared Error (MSE) with L2 regularization. The ...
Paolo Pedinotti
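A minimal NumPy sketch of this setup with made-up data and hyperparameters; only the structure (fixed random hidden layer, NAG on the output weights, MSE plus L2) comes from the question:

```python
import numpy as np

# Made-up data; hyperparameter values are placeholders
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=(200, 1))
W_in, b_in = rng.normal(size=(5, 50)), rng.normal(size=50)
H = np.tanh(X @ W_in + b_in)        # fixed ELM hidden layer

lam, lr, mu = 1e-2, 1e-3, 0.9       # L2 weight, step size, momentum
beta = np.zeros((50, 1))
v = np.zeros_like(beta)
for _ in range(1000):
    look = beta + mu * v            # Nesterov lookahead point
    grad = 2 * H.T @ (H @ look - y) / len(y) + 2 * lam * look
    v = mu * v - lr * grad
    beta += v
print(float(np.mean((H @ beta - y) ** 2)))
```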
0 votes
0 answers
23 views

Question on Optimized Threshold in Predictive Modeling

I'm trying to build a predictive model, but I haven't found a method that consistently delivers high performance. Is it acceptable to use an optimized classification threshold (e.g., 0.996)?
waleed almutairi
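Threshold tuning is generally considered acceptable when the threshold is chosen on validation data, never on the test set. A minimal sketch; the labels and scores below are invented placeholders:

```python
import numpy as np
from sklearn.metrics import f1_score

# Placeholder validation labels and predicted probabilities
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=500)
p_val = np.clip(y_val * 0.6 + rng.uniform(size=500) * 0.5, 0, 1)

# Sweep candidate thresholds; keep the one with the best validation F1
thresholds = np.linspace(0.01, 0.99, 99)
best = max(thresholds, key=lambda t: f1_score(y_val, (p_val >= t).astype(int)))
print(best)
```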
0 votes
0 answers
15 views

Optimizing LLM-Based Field Extraction Across 3500+ Document Templates

We are using Azure Document Intelligence to extract all content from PDFs containing 1-7 pages. After extraction, we pass the content to an LLM (OpenAI) to extract only the required 35-40 fields. The ...
Johnimmanuel
0 votes
0 answers
15 views

Error in plotting Gaussian Process for 3 models that use Bayesian Optimization

I'm writing a Python script for Orange Data Mining to plot the Gaussian processes in order to find the best hyperparameters for the 5-fold cross-validation accuracy metric. The three models are SVC, ...
Mattma • 1
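For the plotting part alone, a minimal scikit-learn sketch of a 1-D Gaussian-process posterior over a hyperparameter; the trial points and accuracies are invented, and the Orange-specific wiring is omitted:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor

# Invented trial history: hyperparameter values and their CV accuracies
xs = np.log10([[0.1], [1.0], [10.0], [100.0]])
accs = np.array([0.81, 0.90, 0.86, 0.79])
gp = GaussianProcessRegressor(normalize_y=True).fit(xs, accs)

# Plot posterior mean with a one-standard-deviation band
grid = np.linspace(-1.5, 2.5, 200).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)
plt.plot(grid, mean)
plt.fill_between(grid.ravel(), mean - std, mean + std, alpha=0.3)
plt.xlabel("log10(C)")
plt.ylabel("5-fold CV accuracy")
plt.show()
```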
0 votes
0 answers
31 views

The objective function of the reward model in vanilla RLHF is ambiguous to me

I am trying to learn the background of vanilla RLHF. I am struggling to understand the objective function of the reward model. It is defined as the log of the sigmoid of the difference ...
Baghban • 101
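For reference, the objective being described is the standard Bradley-Terry pairwise loss over preference pairs, with $y_w$ the preferred and $y_l$ the rejected response:

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim D}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big]$$

Minimizing it pushes the reward of the preferred response above that of the rejected one; the log-sigmoid is just the log-likelihood of the observed preference under a Bradley-Terry model.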
5 votes
2 answers
644 views

Is there any advantage to a lower value of a loss function?

I have two loss functions $\mathcal{L}_1$ and $\mathcal{L}_2$ to train my model. The model is predominantly a classification model. Both $\mathcal{L}_1$ and $\mathcal{L}_2$ are two variants of ...
Aleph • 185
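One observation that often settles such comparisons: the raw value of a loss carries no advantage by itself, since scaling by any constant $c > 0$ changes neither the minimizer nor the gradient direction,

$$\nabla_\theta\big(c\,\mathcal{L}(\theta)\big) = c\,\nabla_\theta \mathcal{L}(\theta), \qquad \arg\min_\theta c\,\mathcal{L}(\theta) = \arg\min_\theta \mathcal{L}(\theta),$$

so only the induced gradients and the ranking of models matter, not the absolute numbers.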
2 votes
0 answers
33 views

Effect of the condition number of the objective function's Hessian on the learning rate in gradient descent

I'm following the book Deep Learning by Ian Goodfellow et al., and in Chapter 4 (Numerical Computation), page 87, it is mentioned that by utilising a second-order Taylor approximation of the objective ...
Aditya • 121
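For context, the step in the book expands the objective to second order around $x$ along the gradient direction $g$:

$$f(x - \varepsilon g) \approx f(x) - \varepsilon\, g^{\top} g + \tfrac{1}{2}\varepsilon^{2}\, g^{\top} H g,$$

so when $g^{\top} H g > 0$ the optimal step is $\varepsilon^{*} = g^{\top}g / (g^{\top} H g)$. In the worst case, $g$ aligns with the eigenvector of the largest eigenvalue $\lambda_{\max}$ and $\varepsilon^{*} = 1/\lambda_{\max}$; with a large condition number $\lambda_{\max}/\lambda_{\min}$, the step size that is safe in the most curved direction is far too small to make progress along the flattest one.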
0 votes
0 answers
17 views

How to improve LSTM model performance for weather prediction?

I predict rainfall using observational data. There are a total of 87,070 data samples, but only 1,885 samples have rainfall. And here is the LSTM model I am using: ...
Vinh Nguyen
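With only 1,885 rainfall samples out of 87,070, class imbalance is likely the first thing to address. A minimal sketch, assuming the task is framed as binary rain/no-rain classification (the question's actual model code is truncated):

```python
import torch

# Upweight the rare positive class by the negative/positive ratio
pos_weight = torch.tensor([(87_070 - 1_885) / 1_885])  # ~45.2
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```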
0 votes
0 answers
17 views

Given the total cost of a graph walk, how to estimate the cost of each edge?

I have a real-world problem involving a collection of nodes and their edges. This collection comprises hundreds of nodes and thousands of connections. Then I have about 10k datapoints, each ...
Althis • 123
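One standard framing: each walk becomes a row counting how many times it traverses each edge, and per-edge costs solve the resulting linear system in the least-squares sense. A scaled-down sketch with invented data:

```python
import numpy as np

# counts[i, j] = number of times walk i traverses edge j
rng = np.random.default_rng(0)
n_walks, n_edges = 1_000, 200
counts = rng.integers(0, 3, size=(n_walks, n_edges)).astype(float)
true_cost = rng.uniform(1.0, 5.0, size=n_edges)
totals = counts @ true_cost + rng.normal(0, 0.1, size=n_walks)

# Solve counts @ cost ~= totals by least squares
est, *_ = np.linalg.lstsq(counts, totals, rcond=None)
print(np.abs(est - true_cost).max())
```

If the costs must be non-negative, `scipy.optimize.nnls` or a small ridge penalty are the usual refinements.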
4 votes
3 answers
247 views

What is the best package for convex optimization?

I have a set of problems of the form $\min \|Ax-y\|_1$ with some constraints on the $x_i$. A quick search turns up cvxpy, ...
Edmund • 757
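For what the question describes, the cvxpy formulation is a few lines; the box constraint below is an invented stand-in for "some constraints on the $x_i$":

```python
import numpy as np
import cvxpy as cp

# Invented problem data
rng = np.random.default_rng(0)
A, y = rng.normal(size=(30, 10)), rng.normal(size=30)

# min ||Ax - y||_1 subject to placeholder box constraints
x = cp.Variable(10)
objective = cp.Minimize(cp.norm1(A @ x - y))
cp.Problem(objective, [x >= 0, x <= 1]).solve()
print(x.value)
```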
