$\begingroup$

Let's assume I have a column with float values (e.g., 3.12334354454, 5.75434331354, and so on), and I round these values to two decimal places (e.g., 3.12, 5.75).

I think the advantages and disadvantages of doing this would be:

Advantages:

  • Less memory required: Rounded values take up less space.

  • Smoother and faster calculations: Simplified values can speed up processing.

  • Better segregation: Rounded values might help in categorizing data more effectively. (Not sure).

Disadvantages:

  • Loss of information: Precision is reduced, which can be critical in some use cases.

Please provide your thoughts.

$\endgroup$

4 Answers

$\begingroup$

On top of the advantages you mentioned, rounding can also prevent overfitting. For example, if you have a value of 1.234567 with target 0 and a value of 1.234568 with target 1, a tree model might try to learn a boundary at the midpoint 1.2345675. If you round, both values become 1.23, and the model will learn 0.5 for 1.23, which might be more realistic.
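As a toy illustration of this (a hypothetical two-sample dataset, not from the question):

```python
# Two samples that differ only in the 6th decimal place, with opposite targets.
# A tree could split at the midpoint 1.2345675 and memorise both samples.
x1, x2 = 1.234567, 1.234568
y1, y2 = 0, 1

# After rounding to two decimals the samples become indistinguishable,
# so the best a model can do is predict the average target (0.5).
r1, r2 = round(x1, 2), round(x2, 2)
print(r1, r2)    # 1.23 1.23
print(r1 == r2)  # True
```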

An easy way to see this is to think of the rounding operation as adding some triangular noise (not far from Gaussian noise).

As a side note, be aware that in some languages the rounding operation does not change the data type. In Python, for instance, rounding will not reduce memory use; you have to change the data type to do that, e.g. float32 to float16. Even then, the actual memory gain depends on how your ML pipeline handles data types.
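For instance, with NumPy (a sketch; the array contents are arbitrary):

```python
import numpy as np

a = np.random.rand(1000).astype(np.float32)

b = np.round(a, 2)        # rounded, but still float32: no memory saved
c = a.astype(np.float16)  # smaller dtype: memory actually halved

print(a.nbytes, b.nbytes, c.nbytes)  # 4000 4000 2000
```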

$\endgroup$
  • $\begingroup$ The disadvantage "loses information" is exactly the same as the advantage of "reduces overfitting", <something something bias-variance tradeoff>. $\endgroup$
    – Ben Reiniger
$\begingroup$

The numeric precision is only reduced if you round to fewer digits than your input data really has. (It is worth thinking about whether the given decimals are realistic.) Otherwise, the many digits are just mathematical artifacts.

If you really lose significant information by dropping the least significant decimal, then you should probably reconsider your scaling. That does not sound like a robust model.

$\endgroup$
$\begingroup$

Less memory required: Rounded values take up less space.

No. You reduced precision, but the numbers are still floats, occupying the same number of bits in memory. Converting float to integer would indeed reduce memory, e.g. 3.12 -> 3.
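A quick NumPy sketch of the difference, using the values from the question:

```python
import numpy as np

x = np.array([3.12334354454, 5.75434331354])

r = np.round(x, 2)               # still float64: 8 bytes per value
i = np.round(x).astype(np.int8)  # rounded to integer: 1 byte per value

print(r, r.itemsize)  # [3.12 5.75] 8
print(i, i.itemsize)  # [3 6] 1
```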

Smoother and faster calculations: Simplified values can speed up processing.

Not if you still use floats. True if you use integers, which are also better suited to hardware accelerators.

Better segregation: Rounded values might help in categorizing data more effectively.

As suggested by others, adding a bit of noise to your training data could help with generalisation and prevent overfitting.

Loss of information: Precision is reduced, which can be critical in some use cases.

True, and it could be critical, especially if you perform true rounding (to integer types).

Note: using int types (a) is a valid technique, especially for inference on embedded platforms (e.g. a standalone mobile phone, no cloud), even when the initial training (b) of the model was done with floats in a much more powerful environment (data centres, multiprocessing, network, cloud).

(a) even int types with a very reduced number of bits, e.g. 8-bit int, or even less.

(b) might need some post-training adaptation.
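A minimal sketch of what such integer quantization might look like: an affine scale/zero-point scheme mapping floats onto 8-bit integers. The function names are illustrative; a real pipeline would use a framework's quantization tools and (b)'s post-training calibration.

```python
import numpy as np

def quantize(x, n_bits=8):
    """Map floats onto unsigned n-bit integers (assumes x is not constant)."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = qmin - round(x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized values."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.array([-1.5, 0.0, 0.7, 2.1], dtype=np.float32)
q, s, z = quantize(w)
w_hat = dequantize(q, s, z)
# w_hat approximates w to within one quantization step (the scale)
```

Each float now fits in one byte instead of four, at the cost of a bounded reconstruction error.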

$\endgroup$
$\begingroup$

As mentioned in other answers, rounding a 32-bit float to another 32-bit float will not save any memory. All 32-bit floats use 32 bits of storage.

Another thing to consider is that decimal rounding can be problematic with floating-point values, because there is no exact representation of (for example) 0.1 in floating point. A floating-point number is binary: the fractions that can be represented exactly are 1/2, 1/4, 1/8, 1/16, etc., and sums of those values. The decimal value 0.1 is a repeating fraction in this scheme. So if you round 0.125 to 0.1, you have taken an exact value that is easily represented as a float and turned it into a non-exact value that uses all the places available in the float. In other words, decimal rounding may increase the number of places used in a floating-point value.
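This is easy to see with Python's `decimal` module, which can display the exact value a float actually stores:

```python
from decimal import Decimal

# 0.1 has no exact binary representation; the stored value is slightly off:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# 0.125 = 1/8 is exactly representable:
print(Decimal(0.125))  # 0.125

# Hence the classic surprise:
print(0.1 + 0.2 == 0.3)  # False
```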

If you store your data as CSV or some other textual value, rounding will reduce the space required for that format. This may be useful, especially if you end up with false precision when converting from floats to decimals.

And lastly, while I don't think it's a common situation, you should be aware of the 'butterfly effect', a term coined by Edward Lorenz after he found that rounding the values fed into one of his models caused it to produce wildly different results:

This was enough to tell me what had happened: the numbers that I had typed in were not the exact original numbers, but were the rounded-off values that had appeared in the original printout. The initial round-off errors were the culprits; they were steadily amplifying until they dominated the solution.

$\endgroup$
