13 votes
Accepted

Why does everyone use BERT in research instead of LLAMA or GPT or PaLM, etc?

There are many contributing factors to the abundance of research based on BERT vs the research based on Llama: Age: BERT has been around for far longer than Llama (2018 vs 2023), so it has more ...
noe
  • 28k
6 votes
Accepted

Need an advice on research topic

You're probably aiming too high: a research topic doesn't have to lead to a major breakthrough, and very often it's impossible to know what it leads to before doing it. A master's thesis is not very ...
Erwan
  • 26.2k
4 votes

Why does everyone use BERT in research instead of LLAMA or GPT or PaLM, etc?

Adding to the other answers: BERT gives you access to the embeddings of the input you feed it (which wasn't, and still isn't, the case for some other models). The embeddings are ...
ahmed_khan_89
3 votes

Is machine learning successful in solving NP-hard combinatorial optimisation problems? Discuss the problem of scheduling using machine learning

You don't have to get to EXP-complete in order to get a hard problem. NP-Complete is bad enough... Cryptographic assumptions (e.g., the existence of one way functions) are also a good way to create ...
DaL
  • 2,663
3 votes
Accepted

How do researchers actually code novel architectures and layers?

In this particular case, I don't know how they are implementing these complex layers, but in Keras/TensorFlow you can define your own layers by inheriting from ...
alexmolas
  • 566
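A minimal sketch of what subclassing looks like, assuming TensorFlow 2.x with its bundled Keras. `ScaledDense` is a hypothetical layer made up for illustration (a dense projection multiplied by a learnable scalar), not the layer from the question:

```python
import tensorflow as tf


class ScaledDense(tf.keras.layers.Layer):
    """Hypothetical custom layer: y = scale * (x @ w + b), with all three weights trainable."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily, once the size of the last input axis is known.
        in_features = int(input_shape[-1])
        self.w = self.add_weight(name="w", shape=(in_features, self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(name="b", shape=(self.units,),
                                 initializer="zeros", trainable=True)
        self.scale = self.add_weight(name="scale", shape=(),
                                     initializer="ones", trainable=True)

    def call(self, inputs):
        # Forward pass: any differentiable TensorFlow ops can be used here.
        return self.scale * (tf.matmul(inputs, self.w) + self.b)
```

Once defined, the layer can be used inside a `tf.keras.Sequential` model like any built-in layer; gradients for `w`, `b` and `scale` come from automatic differentiation, so no backward pass has to be written by hand.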
3 votes

Why does everyone use BERT in research instead of LLAMA or GPT or PaLM, etc?

Although LLMs like GPT-3 and LLaMA have gained public attention due to marketing, BERT is the foundation of all large language models, being open-source and the first one to be based on the transformer ...
Hithesh Jay
2 votes

What is the loss function defined by Mnih and Hinton in their paper “Learning to Label Aerial Images from Noisy Data”?

Section 3.3 simply gives the equation for the negative log-likelihood. They say that it takes the form of cross-entropy (because it just looks like a cross-entropy equation, perhaps?), but ...
n1k31t4
  • 15.4k
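For reference, the generic per-pixel form that such a negative log-likelihood usually takes, when each output pixel is treated as an independent Bernoulli variable, is shown below. This is a sketch under that independence assumption, not a transcription of the paper's Section 3.3; the symbols ($m_i \in \{0, 1\}$ for the label of pixel $i$, $\hat{m}_i$ for the predicted probability that pixel $i$ belongs to the target class, $s$ for the input image patch) are chosen here for illustration:

$$
-\log p(\mathbf{m} \mid s) = -\sum_{i} \Big[ m_i \log \hat{m}_i + (1 - m_i) \log\big(1 - \hat{m}_i\big) \Big]
$$

Because the labels are 0/1, each bracketed term is exactly the binary cross-entropy between the label and the prediction, which is why the two names are used interchangeably.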
2 votes
Accepted

Classification training using probabilities and not raw classes (factors)

Beta regression: You could use beta regression. I have no practical experience with this type of regression. However, it might be the right method for your task. As far as I understand, the link ...
Peter
  • 7,866
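To make the suggestion concrete: in the commonly used mean/precision parametrization, each target $y_i \in (0, 1)$ is modeled as Beta-distributed with mean $\mu_i$ and precision $\phi$, and a link function (logit is a common choice, assumed here) connects $\mu_i$ to the features $x_i$. The parameters $\beta$ and $\phi$ are then estimated by maximizing the log-likelihood:

$$
\operatorname{logit}(\mu_i) = x_i^\top \beta, \qquad y_i \sim \operatorname{Beta}\big(\mu_i \phi,\ (1 - \mu_i)\phi\big)
$$

$$
\ell(\beta, \phi) = \sum_{i=1}^{n} \Big[ \log\Gamma(\phi) - \log\Gamma(\mu_i \phi) - \log\Gamma\big((1 - \mu_i)\phi\big) + (\mu_i \phi - 1)\log y_i + \big((1 - \mu_i)\phi - 1\big)\log(1 - y_i) \Big]
$$

Targets of exactly 0 or 1 make $\log y_i$ or $\log(1 - y_i)$ undefined, so such values usually have to be moved slightly into the open interval first.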
2 votes

Will data science jobs be at risk?

I highly doubt that the job itself is going to be at risk. It is rather the other way around: data science and machine learning will replace a lot of other jobs. In the end there, at least, always ...
iShazook
2 votes

Actual problems in Data Science/Machine Learning connected with music

You could begin your journey by picking up one of the topics in the Magenta project started by Google. It's open source, so you could pick up some great ideas there. Link to Magenta: https://...
Academic
  • 482
2 votes

For a student who is a beginner in quantitative research and statistics, which is the better statistical tool to start: R or IBM SPSS? Why?

I suggest using R, since it is open source and very powerful, and thus is used by many companies and researchers. R not only allows you to deal with large amounts of data, it also allows you to do state-of-...
Peter
  • 7,866
2 votes

How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

If you want to focus on the outliers with respect to the class, you can do as follows: Using Isolation Forest ...
Multivac
  • 3,139
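One possible reading of the suggestion, sketched with scikit-learn's `IsolationForest`: fit one forest per class, so that each point is judged against the other points of its own class rather than against the whole data set. The function name `outliers_within_class` and the per-class split are assumptions for illustration; the answer's full procedure is cut off above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def outliers_within_class(X, y, random_state=0):
    """Return a boolean mask marking rows that look anomalous relative to their own class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_outlier = np.zeros(len(X), dtype=bool)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        # Each forest only sees the rows of one class.
        forest = IsolationForest(random_state=random_state).fit(X[idx])
        # predict() returns -1 for anomalies and 1 for normal points.
        is_outlier[idx] = forest.predict(X[idx]) == -1
    return is_outlier
```

Because each forest only sees one class, a point that is ordinary for the data set as a whole can still be flagged if it is unusual for its own label. The `contamination` parameter of `IsolationForest` (left at its default here) controls how aggressively points are flagged.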
2 votes
Accepted

Where can I find the applied data science research papers?

If you're looking for conferences that focus on applied data science and have a high ranking, there are several options you can consider. While it's true that some conferences may have a more ...
Mahmood Mohajer
2 votes

Where can I find the applied data science research papers?

I think the Open Data Science Conference (ODSC) is what you are looking for: industry leaders present some tools that they use in their work. The ones you've listed are research-oriented, the material ...
Stefan Popov
2 votes
Accepted

Would adding Elastic Net as an additional Benchmark add any value when LASSO is already an included benchmark?

No. Because Elastic Net is a cross between the L1 and L2 norms, it would only ever select a subset of the variables that LASSO would select, or if the penalty is extremely close to having ...
Marlen
  • 167
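For comparison, the penalized least-squares objective in the parametrization used by R's glmnet is shown below, with mixing parameter $\alpha \in [0, 1]$ and overall penalty strength $\lambda$. Setting $\alpha = 1$ gives the LASSO and $\alpha = 0$ gives ridge regression, so the LASSO benchmark is one end of the Elastic Net family:

$$
\min_{\beta_0, \beta} \; \frac{1}{2N} \sum_{i=1}^{N} \big(y_i - \beta_0 - x_i^\top \beta\big)^2 + \lambda \Big[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \Big]
$$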
1 vote
Accepted

Data Mining of unresearched data for a master's degree final project

If it's business oriented, there are many "business Wikipedia" type websites that have lots of data presented in the same format on each page, which will make them a lot simpler to scrape. ...
lexan55
  • 36
1 vote

The ideal function in R for fitting n LASSO regressions on n data sets

A common choice would be the glmnet package which has "extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression." Within the ...
Brian Spiering
1 vote

Resources for Promotion/Demotion Strategies for ML Item Recommendation Systems?

One option is what I did on a project for an e-commerce website. Items were recommended based on similarity scores to other items. The similarity scores were based on item embeddings. People could pay ...
Brian Spiering
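A minimal sketch of the similarity step described in the answer, assuming item embeddings are already available as rows of a NumPy array and that cosine similarity is the score (the answer doesn't say which similarity measure was used). The paid-promotion part is truncated in the excerpt, so it is not modeled here; `most_similar_items` is a hypothetical helper:

```python
import numpy as np


def most_similar_items(item_embeddings, query_index, k=3):
    """Return the indices and cosine similarities of the k items closest to item `query_index`."""
    emb = np.asarray(item_embeddings, dtype=float)
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    scores = unit @ unit[query_index]
    scores[query_index] = -np.inf  # never recommend the query item to itself
    top = np.argsort(-scores)[:k]
    return top.tolist(), scores[top].tolist()
```

Normalizing the rows up front means each query costs a single matrix-vector product over the catalogue.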
1 vote
Accepted

Which specific AWS service to use for running Benchmark Regressions on datasets far too large to run locally on my laptop

One option is creating an EC2 instance. Choose a prebuilt AMI (Amazon Machine Image) with RStudio Server installed.
Brian Spiering
1 vote

Zero-shot learning for tabular data?

Zero-shot learning is a type of machine learning that allows a model to make predictions on previously unseen data. It is typically used in natural language processing and computer vision tasks, where ...
Pluviophile
  • 4,143
1 vote
Accepted

How to conclude the generality of any classification methods?

You're experiencing an unfortunately common issue with the current state of system/model evaluation. In addition to evaluating on different datasets, authors often leave out important details, such as ...
primussucks
1 vote
Accepted

How do we know a neural network test accuracy is good enough when results vary with different runs?

So, the question is about how to report test accuracy, etc., when you see variation across executions. As @Nikos M. has alluded to, you typically train and test the model at least 3 times and then report ...
shepan6
  • 1,488
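A minimal sketch of that reporting step using only the standard library: collect the test accuracy from each independent run (for example, with different random seeds) and report the mean together with the sample standard deviation. The accuracy values below are made up for illustration:

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of test accuracies from repeated runs."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate run-to-run variability")
    return mean(accuracies), stdev(accuracies)


# Hypothetical accuracies from three training/testing runs with different seeds.
runs = [0.91, 0.89, 0.93]
avg, spread = summarize_runs(runs)
print(f"test accuracy: {avg:.3f} ± {spread:.3f} over {len(runs)} runs")
```

Reporting the number of runs alongside the spread lets readers judge how much weight to put on small differences between models.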
1 vote
Accepted

In "Attention Is All You Need", why are the FFNs in (2) the same as two convolutions with kernel size 1?

A position-wise feedforward layer is just a matrix multiplication plus the addition of a bias vector for each position along the time dimension. You can express this as a 1D convolution of kernel size ...
noe
  • 28k
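A small NumPy check of the equivalence, assuming the position-wise layer $\mathrm{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2$ from the paper and a naive width-1 convolution written out by hand. The helper names and the small random sizes are chosen for illustration:

```python
import numpy as np


def position_wise_ffn(x, w1, b1, w2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, applied to every row (position) of x at once."""
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2


def conv1d_width1(x, kernel, bias):
    """1D convolution with kernel width 1; x is (seq_len, in_ch), kernel is (1, in_ch, out_ch)."""
    out = np.empty((x.shape[0], kernel.shape[2]))
    for t in range(x.shape[0]):
        # A width-1 kernel sees only position t, so each output is a per-position affine map.
        out[t] = x[t] @ kernel[0] + bias
    return out


rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 5, 8, 32
x = rng.normal(size=(seq_len, d_model))
w1, b1 = rng.normal(size=(d_model, d_ff)), rng.normal(size=d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), rng.normal(size=d_model)

ffn_out = position_wise_ffn(x, w1, b1, w2, b2)
# Same computation as two width-1 convolutions with a ReLU in between.
conv_out = conv1d_width1(np.maximum(0.0, conv1d_width1(x, w1[None], b1)), w2[None], b2)
print(np.allclose(ffn_out, conv_out))
```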
1 vote
Accepted

Is there a reference data set for ridge regression?

Try using the Boston house data set. ...
Derek O
  • 354
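Whatever data set is used, it helps to have an independent reference to compare a ridge implementation against. A minimal NumPy sketch of the closed-form solution $\hat\beta = (X_c^\top X_c + \lambda I)^{-1} X_c^\top y_c$, assuming the intercept is left unpenalized (handled by centering) and the penalty is $\lambda \lVert \beta \rVert_2^2$ on the raw, unstandardized features; `ridge_closed_form` is a hypothetical helper:

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Minimize ||y - b0 - X beta||^2 + lam * ||beta||^2 with an unpenalized intercept b0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering removes the intercept from the penalty
    n_features = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta
```

With the same `lam`, the result should match scikit-learn's `Ridge(alpha=lam)` with its default `fit_intercept=True`, which makes it a convenient cross-check.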
1 vote
Accepted

Previous work replication and research ethics

I wouldn't think so. If they're publishing their methodology, they want other people to see how well it works and apply it to their work. You'll probably want to explain why you think this method ...
m13op22
  • 399
1 vote
Accepted

T-DBSCAN - Implementing STOP logic

Just to point out a minor confusion that there seems to be in the wording: there is mixed use of the words temporarily and temporally. [OP has since corrected this] We really only care about the ...
n1k31t4
  • 15.4k
1 vote

Classification training using probabilities and not raw classes (factors)

What you are describing is just the cross-entropy loss (also known as relative entropy or Kullback-Leibler divergence). If you have target probabilities that are one-hot you get the NLL form of it that ...
BookYourLuck
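A minimal NumPy sketch of that loss when the targets are probability vectors rather than class labels. One precise point worth adding: the cross-entropy $H(p, q)$ equals the entropy $H(p)$ of the targets plus the Kullback-Leibler divergence $D_{\mathrm{KL}}(p \,\|\, q)$, so with fixed targets, minimizing one is equivalent to minimizing the other. `soft_cross_entropy` is a hypothetical helper name:

```python
import numpy as np


def soft_cross_entropy(target_probs, logits):
    """Mean over rows of -sum_k p_k log q_k, where q = softmax(logits) and each row of p sums to 1."""
    p = np.asarray(target_probs, dtype=float)
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability; softmax is unchanged
    log_q = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return float(-(p * log_q).sum(axis=1).mean())
```

With one-hot rows in `target_probs`, this reduces to the usual negative log-likelihood of the true class.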
1 vote

Mapping between original feature space and an interpretable feature space

Welcome to the community @iaaml! I hope I understood the concept right by briefly going through your reference. This is my impression: in 3.1, they say For example, a possible interpretable ...
Kasra Manshaei
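Assuming the reference is the LIME paper, whose Section 3.1 uses a binary presence/absence vector over words as the interpretable representation for text, the mapping in both directions can be sketched in plain Python. The function names are hypothetical, and the vocabulary is taken to be the distinct words of the instance being explained:

```python
def to_interpretable(tokens, vocabulary):
    """Binary vector z' with z'[i] = 1 if vocabulary[i] occurs in the text, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def from_interpretable(tokens, vocabulary, z):
    """Map a (possibly perturbed) binary vector back to text by removing every word whose bit is 0."""
    kept = {word for word, bit in zip(vocabulary, z) if bit}
    return [t for t in tokens if t in kept]


tokens = "a good movie with a good cast".split()
vocabulary = sorted(set(tokens))  # distinct words of this instance
z = to_interpretable(tokens, vocabulary)  # all ones for the original instance
perturbed = [0 if w == "good" else b for w, b in zip(vocabulary, z)]
print(" ".join(from_interpretable(tokens, vocabulary, perturbed)))  # a movie with a cast
```

Perturbed samples are generated in the binary space, mapped back to text, and only then passed to the original classifier, which can keep using whatever features it likes.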
1 vote
Accepted

How can there be more true positive than positive?

Of course it is impossible to have a higher than $100\%$ true positive rate in your results. The author does explain how he arrives at his results, but I must admit it is not at all obvious. In the ...
JahKnows
  • 9,086
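The reason is visible directly in the definition: the true positive rate divides the true positives by all actual positives, $\mathrm{TPR} = \mathrm{TP} / (\mathrm{TP} + \mathrm{FN})$, and the numerator is one of the two non-negative terms in the denominator. A tiny sketch with made-up counts:

```python
def true_positive_rate(tp, fn):
    """Recall / sensitivity: the fraction of actual positives that were predicted positive."""
    if tp < 0 or fn < 0:
        raise ValueError("counts must be non-negative")
    actual_positives = tp + fn
    if actual_positives == 0:
        raise ValueError("undefined when there are no actual positives")
    return tp / actual_positives


print(true_positive_rate(tp=30, fn=10))  # 0.75
```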
1 vote
Accepted

Will data science develop into a separate academic field?

At the risk of showing confirmation bias: yes, data science will become its own field. The primary argument would be an economic one: data science enables a larger return on investment per resources ...
davmor
  • 191

Only top scored, non community-wiki answers of a minimum length are eligible