13
votes
Accepted
Why does everyone use BERT in research instead of LLAMA or GPT or PaLM, etc?
There are many contributing factors to the abundance of research based on BERT vs the research based on Llama:
Age: BERT has been around for far longer than Llama (2018 vs 2023), so it has more ...
6
votes
Accepted
Need advice on a research topic
You're probably aiming too high: a research topic doesn't have to lead to a major breakthrough, and very often it's impossible to know what it leads to before doing it. A Master's thesis is not very ...
4
votes
Why does everyone use BERT in research instead of LLAMA or GPT or PaLM, etc?
To complement the other answers: BERT makes it possible to access the embeddings of the input (which wasn't, and still isn't, the case for some other models).
The embeddings are ...
3
votes
Is machine learning successful in solving combinatorial optimisation problems in NP-hard? Discuss problem of scheduling using machine learning
You don't have to get to EXP-complete in order to get a hard problem.
NP-Complete is bad enough...
Cryptographic assumptions (e.g., the existence of one-way functions) are also a good way to create ...
3
votes
Accepted
How do researchers actually code novel architectures and layers?
In this particular case, I don't know how they are implementing these complex layers, but in Keras/TensorFlow you can define your own layers by inheriting from ...
3
votes
Why does everyone use BERT in research instead of LLAMA or GPT or PaLM, etc?
Although LLMs like GPT-3 and LLaMA have gained public attention due to marketing, BERT is the foundation of all Large Language Models, being open-source and the first one to be based on transformer ...
2
votes
What is the loss function defined by Mnih and Hinton in their paper “Learning to Label Aerial Images from Noisy Data”?
Section 3.3 simply gives the equation for the negative log-likelihood.
They say that it takes the form of cross-entropy (because it just looks like a cross-entropy equation, perhaps?), but ...
2
votes
Accepted
Classification training using probabilities and not raw classes (factors)
Beta Regression
You could use beta-regression. I have no practical experience with this type of regression. However, it might be the right method for your task. As far as I understand, the link ...
2
votes
Will the job of Data Science be at risk?
I highly doubt that the job itself is going to be at risk. It is rather the other way around: Data science and machine learning will replace a lot of other jobs. In the end there, at least, always ...
2
votes
Actual problems in Data Science/Machine Learning connected with music
You could begin your journey by picking up one of the topics in the Magenta project started by Google. It's open-source, so you could pick up some great ideas there.
Link to Magenta: https://...
2
votes
For a student who is a beginner in quantitative research and statistics, which is the better statistical tool to start: R or IBM SPSS? Why?
I suggest using R since it is open source and very powerful, and thus is used by many companies and researchers. R not only lets you deal with large amounts of data, it also lets you do state-of-...
2
votes
How to determine the abnormality of a specific variable by taking into account all the other variables in the data?
If you want to focus on the outliers with respect to the class, you can do as follows:
Using Isolation Forest
...
2
votes
Accepted
Where can I find the applied data science research papers?
If you're looking for conferences that focus on applied data science and have a high ranking, there are several options you can consider. While it's true that some conferences may have a more ...
2
votes
Where can I find the applied data science research papers?
I think the Open Data Science Conference (ODSC) is what you are looking for: industry leaders present some tools that they use in their work. The ones you've listed are research oriented; the material ...
2
votes
Accepted
Would adding Elastic Net as an additional Benchmark add any value when LASSO is already an included benchmark?
No. The reason is that Elastic Net, as a cross between the L1 and L2 norms, would only ever select a subset of the variables that LASSO would select, or if the penalty is extremely close to having ...
1
vote
Accepted
Data Mining of unresearched data for a master's degree final project
If it's business oriented, there are many "business Wikipedia" type websites that have lots of data presented in the same format on each page, which will make them a lot simpler to scrape. ...
1
vote
The ideal function in R for fitting n LASSO Regressions on n data sets
A common choice would be the glmnet package which has "extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression."
Within the ...
1
vote
Resources for Promotion/Demotion Strategies for ML Item Recommendation Systems?
One option is what I did on a project for an e-commerce website. Items were recommended based on similarity scores to other items. The similarity scores were based on item embeddings. People could pay ...
1
vote
Accepted
Which specific AWS service to use for running Benchmark Regressions on datasets far too large to run locally on my laptop
One option is creating an EC2 instance. Choose a prebuilt AMI (Amazon Machine Image) with RStudio Server installed.
1
vote
Zero-shot learning for tabular data?
Zero-shot learning is a type of machine learning that allows a model to make predictions for classes or tasks it saw no labeled examples of during training. It is typically used in natural language processing and computer vision tasks, where ...
1
vote
Accepted
How to conclude the generality of any classification methods?
You're experiencing an unfortunately common issue with the current state of system/model evaluation. In addition to evaluating on different datasets, authors often leave out important details, such as ...
1
vote
Accepted
How do we know a neural network test accuracy is good enough when results vary with different runs?
So, the question is about how to report test accuracy, etc., when you see variation over executions.
As @Nikos M. has alluded to, you typically train and test the model at least 3 times and then report ...
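As a rough illustration of reporting over repeated runs: the excerpt above is cut off before it says what to report, so the mean and sample standard deviation used here are an assumption, not necessarily what the answer recommends. The accuracy values are made up for the example.

```python
import statistics

# Hypothetical test accuracies from five runs with different random seeds.
accuracies = [0.912, 0.905, 0.921, 0.898, 0.915]

mean_acc = statistics.mean(accuracies)
std_acc = statistics.stdev(accuracies)  # sample standard deviation (divides by n - 1)

# One possible way to summarize the spread across runs.
summary = f"{mean_acc:.3f} ± {std_acc:.3f} over {len(accuracies)} runs"
print(summary)
```

With only a handful of runs the standard deviation is itself a noisy estimate, so it may be worth stating the number of runs alongside it.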
1
vote
Accepted
In "Attention Is All You Need", why are the FFNs in (2) the same as two convolutions with kernel size 1?
A position-wise feedforward layer is just a matrix multiplication plus the addition of a bias vector for each position along the time dimension. You can express this as a 1D convolution of kernel size ...
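The equivalence can be checked numerically. Below is a minimal sketch in plain Python (no deep learning framework) for a single linear layer; the toy sizes and the helper names `linear_per_position` and `conv1d` are made up for this illustration. A 1D convolution whose kernel has width 1 uses the same weight matrix at every position, so it should reproduce the per-position matrix multiplication.

```python
import random

def linear_per_position(x, W, b):
    """Apply y_t = x_t @ W + b independently at every position t."""
    return [
        [sum(x_t[i] * W[i][j] for i in range(len(W))) + b[j] for j in range(len(b))]
        for x_t in x
    ]

def conv1d(x, kernel, b):
    """Plain 1D convolution over time (no padding, stride 1).

    kernel[k][i][j] maps input channel i at time offset k to output channel j.
    """
    width = len(kernel)
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for j in range(len(b)):
            acc = b[j]
            for k in range(width):
                for i in range(len(x[0])):
                    acc += x[t + k][i] * kernel[k][i][j]
            row.append(acc)
        out.append(row)
    return out

random.seed(0)
seq_len, d_in, d_out = 5, 4, 3  # hypothetical toy sizes
x = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(seq_len)]
W = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
b = [random.gauss(0, 1) for _ in range(d_out)]

dense_out = linear_per_position(x, W, b)
conv_out = conv1d(x, [W], b)  # kernel size 1: the kernel is just W

# Largest element-wise difference between the two outputs.
max_diff = max(
    abs(p - q) for r1, r2 in zip(dense_out, conv_out) for p, q in zip(r1, r2)
)
print(max_diff)
```

Stacking two such layers with a nonlinearity in between gives the two-layer FFN, which is why it can be described as two convolutions with kernel size 1.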
1
vote
Accepted
Previous work Replication and Research ethics
I wouldn't think so. If they're publishing their methodology, they want other people to see how well it works and apply it to their work. You'll probably want to explain why you think this method ...
1
vote
Accepted
T-DBSCAN - Implementing STOP logic
Just to point out a minor confusion that there seems to be in the wording: there is mixed use of the words temporarily and temporally. [OP has since corrected this]
We really only care about the ...
1
vote
Classification training using probabilities and not raw classes (factors)
What you are describing is just the cross-entropy loss (which differs from the relative entropy, or Kullback-Leibler divergence, only by the entropy of the targets). If you have target probabilities that are one-hot you get the NLL form of it that ...
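A small sketch of that relationship, using only the standard library. The probability vectors are made-up toy values and the function names are chosen for this example. It computes the cross-entropy against soft targets, shows that a one-hot target reduces it to the negative log-likelihood of the true class, and checks that cross-entropy equals target entropy plus KL divergence.

```python
import math

def cross_entropy(target, pred):
    """H(target, pred) = -sum_i target_i * log(pred_i); zero-weight terms are skipped."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

def entropy(target):
    """H(target) = -sum_i target_i * log(target_i)."""
    return -sum(t * math.log(t) for t in target if t > 0)

def kl_divergence(target, pred):
    """KL(target || pred) = sum_i target_i * log(target_i / pred_i)."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)

pred = [0.7, 0.2, 0.1]          # hypothetical model output probabilities
soft_target = [0.6, 0.3, 0.1]   # hypothetical target probabilities
one_hot = [1.0, 0.0, 0.0]       # hard label for class 0

ce_soft = cross_entropy(soft_target, pred)
ce_one_hot = cross_entropy(one_hot, pred)
nll = -math.log(pred[0])        # negative log-likelihood of the true class

print(ce_soft, entropy(soft_target) + kl_divergence(soft_target, pred))
print(ce_one_hot, nll)
```

Because the target entropy does not depend on the model, minimizing the cross-entropy and minimizing the KL divergence lead to the same model, even though the two values differ.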
1
vote
Mapping between original feature space and an interpretable feature space
Welcome to the community @iaaml! I hope I understood the concept right by briefly going through your reference. This is my impression:
In 3.1, they say
For example, a possible interpretable ...
1
vote
Accepted
How can there be more true positive than positive?
Of course it is impossible to have a higher than $100\%$ true positive rate in your results.
The author does explain how he arrives at his results, but I must admit it is not at all obvious. In the ...
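For reference, a minimal sketch of why the rate is bounded, assuming the usual definition TPR = TP / (TP + FN). The function name and the counts are made up for this illustration and are not taken from the paper under discussion.

```python
def true_positive_rate(tp, fn):
    """TPR (recall) = TP / (TP + FN).

    TP can never exceed TP + FN, so the result always lies in [0, 1].
    """
    if tp < 0 or fn < 0:
        raise ValueError("counts must be non-negative")
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive samples")
    return tp / positives

# Hypothetical counts: 40 positives found, 10 missed.
print(true_positive_rate(40, 10))
```

A reported value above 100% therefore suggests that the numerator and denominator were counted over different sets.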
1
vote
Accepted
Will data science develop into a separate academic field?
At the risk of showing confirmation bias, yes, data science will become its own field. The primary argument would be an economic one: data science enables a larger return on investment per resources ...
Only top scored, non community-wiki answers of a minimum length are eligible