Rounding in R: Common Data Collision Frustrations and Solutions in R, Julia, and Python (2023)

[This article was first published on Keyword: r - Appsilon | Awesome Enterprise R dashboards, and kindly contributed to R-bloggers.]

If you've played around with data enough, you're bound to hit some common roadblocks. There are many, from misspelled place names and addresses to placeholder values accidentally entered into a data row. One of the most frustrating, if not the biggest pain point, is rounding numbers.

In most cases, you won't get what you expect, no matter what language you're working with. The good thing about a group like ours is that we don't just share our wins, but also the things that make us tear our hair out. This blog post was inspired by one such discussion.

Hey, does this look good to you? The rounding problem in programming

Most things start with an innocuous mistake, or at least what appears to be one. And so begins this story.

round(0.5)

If you type the above into the R console, you would expect the standard school rounding, unless you already know the full story. The result should be 1, or so you would think, but when you press Return, the screen shows:

[1] 0

But how? Something is definitely wrong here. To confirm this, let's try another example.

round(1.5)
[1] 2

And if you look closely, these two results have one property in common: they are both even numbers. This is the first thing you learn about rounding in R, even if you learn it the hard way: R rounds to even.

But why does R round to even? Who benefits?

TL;DR: You do, and it's the only logical and deterministic way.

As it turns out, quite a lot of people. Rounding to even has its roots in the IEC 60559 standard. The standard requires ties to be rounded to the nearest even number. So round(0.5) becomes 0, and round(-1.5) becomes -2. However, the result is not independent of the operating system or of representation error, which is where the second problem comes in, but we'll come back to that. First we must try to understand the logic behind this rule.
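
Python's built-in round() applies the same ties-to-even rule to exact halves, so the behaviour is easy to check outside R. A minimal sketch (the names `halves` and `rounded` are only for illustration):

```python
# Python's built-in round() also sends exact halves to the nearest even integer.
halves = [0.5, 1.5, 2.5, -0.5, -1.5]
rounded = [round(h) for h in halves]

for h, r in zip(halves, rounded):
    print(f"round({h}) = {r}")
```

All of these inputs are exactly representable in binary, so the tie rule is the only thing deciding the result.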

Let's go back to a piece of recent history and quote Greg Snow's well-known statement from 2008.

"The logic behind the round-to-even rule is that we are trying to represent an underlying continuous value, and if x comes from a truly continuous distribution, then the probability that x==2.5 is 0, and the 2.5 was probably already rounded once from any value between 2.45 and 2.54999999999999... If we use the round-up-on-0.5 rule that we learned in grade school, then the double rounding means that values between 2.45 and 2.50 will all round to 3 (having been rounded first to 2.5). This will tend to bias estimates upwards. To remove the bias, we need to either go back to before the rounding to 2.5 (which is often impossible to impractical), or just round up half the time and round down half the time (or better, round proportionally to how likely we are to see values below or above 2.5 rounded to 2.5, but that will be close to 50/50 for most underlying distributions). The stochastic approach would be to have the round function randomly choose which way to round, but deterministic types are not comfortable with that, so "round to even" was chosen (round to odd should work about the same) as a consistent rule that rounds up and down about 50/50.

If you are dealing with data where 2.5 is likely to represent an exact value (money, for example), then you may do better by multiplying all values by 10 or 100 and working in integers, then converting back only for the final printing. Note that 2.50000001 rounds to 3, so if you keep more digits of accuracy until the final printing, then rounding will go in the expected direction, or you can add 0.000000001 (or some other small number) to your values just before rounding, but that can bias your estimates upwards." (Source)
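
The double-rounding effect described in the quote can be reproduced with Python's decimal module. This is only a sketch: `school_round` is a hypothetical helper name, and 2.46 is an arbitrary value picked from the 2.45 to 2.50 range mentioned above.

```python
from decimal import Decimal, ROUND_HALF_UP

def school_round(value: Decimal, step: str) -> Decimal:
    """Round half up (the grade-school rule) to the precision given by `step`."""
    return value.quantize(Decimal(step), rounding=ROUND_HALF_UP)

original = Decimal("2.46")
once = school_round(original, "1")      # rounded straight to an integer
first = school_round(original, "0.1")   # first rounding: 2.46 becomes 2.5
twice = school_round(first, "1")        # second rounding: 2.5 becomes 3

print(once, first, twice)
```

Rounding once keeps 2.46 at 2, while rounding it twice with the school rule pushes it to 3, which is the upward bias the quote refers to.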

Maybe a hands-on experiment will make this clearer.

We wanted to show you the effect of different rounding methods in R, Python, and JavaScript, but their built-in round functions don't even offer different rounding methods out of the box!

Luckily, Julia implements different rounding methods, and we can play with them.

An experiment

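Let's take a large vector of a thousand random numbers from 0 to 1. Then round each number in this vector to 1 decimal place in three different ways: round to nearest with ties to even (the default), round up, and round down. Note that in Julia, RoundUp always rounds toward positive infinity and RoundDown always rounds toward negative infinity, so RoundUp is stricter than the school rule of rounding halves up (which Julia calls RoundNearestTiesAway). In the end, we compare which method's mean is closest to the mean of the original vector.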

using Random, Statistics
x = rand(MersenneTwister(0), 1_000)
y1 = round.(x, digits=1)
y2 = round.(x, RoundUp, digits=1)
y3 = round.(x, RoundDown, digits=1)

Results

So what are the means?

mean(x), mean(y1), mean(y2), mean(y3)
(0.5006018120380458, 0.5012000000000001, 0.5496999999999999, 0.4497000000000)

We see that the mean of the vector after the default rounding (to nearest, ties to even) is much closer to the mean of the original values, while always rounding up or always rounding down shifts the mean by about 0.05 (roughly 10% here). Rounding to even is a deterministic (i.e. no randomness) way of handling ties that has proven to be the simplest and most reliable, although it may seem strange at first.
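
The same comparison can be sketched in plain Python. The seed and the names `nearest`, `up`, `down` and `bias` are assumptions for illustration, and Python's generator produces a different random stream than Julia's MersenneTwister(0), so the exact means will not match the ones above.

```python
import math
import random
from statistics import mean

rng = random.Random(0)                       # fixed seed, chosen arbitrarily
x = [rng.random() for _ in range(1_000)]

nearest = [round(v, 1) for v in x]           # round to nearest, ties to even
up = [math.ceil(v * 10) / 10 for v in x]     # always toward +infinity
down = [math.floor(v * 10) / 10 for v in x]  # always toward -infinity

# Shift of each rounded mean relative to the mean of the original values.
bias = {
    "nearest": mean(nearest) - mean(x),
    "up": mean(up) - mean(x),
    "down": mean(down) - mean(x),
}
print(bias)
```

With a thousand uniform values, the one-directional methods should land roughly 0.05 away from the original mean, while rounding to nearest stays close to it.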

But that's not the end!

There's another reason R behaves the way it does, aside from the round-to-even rule. Another devil is at play here: finite floating-point precision.

Wait, that's a lot of words. Okay, let's go one by one. R stores numeric values with only about 53 binary digits of precision, which is roughly 15 to 17 significant decimal digits. In other words, everything beyond that is lost. While this isn't a problem for a number as simple as 0.5, it becomes a big hassle when numbers carry more precision, which just means more decimal places.

In general, this is not an R-specific problem, although the details above are described for R. There is also a well-known R FAQ entry dedicated to it. The following quote is the key point of that answer.

All other numbers are internally rounded to (typically) 53 binary digits. As a result, two floating-point numbers are not reliably equal unless they are calculated using the same algorithm, and even then not always.

So overall, unless two numbers are computed in exactly the same way, it's impossible to say for sure whether R will treat them as equal. But you might be wondering how this relates to rounding.

If you write a number with more significant digits than a double can hold (about 15 to 17), what R stores is not the number you typed, because the extra precision is cut off.

For example:

> num <- 2.4999999999999999999999
> num
[1] 2.5
> round(num)
[1] 2

When we assign num here, we lose precision because it has more digits than a double can hold, so the stored value is exactly 2.5, which then rounds to the even number 2. However, if we reduce the number of 9s so that the value fits, precision is preserved.
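
Python uses the same double-precision format, so the effect can be checked there too. A small sketch, where `num` mirrors the R example and `shorter` is an illustrative value with far fewer digits:

```python
from decimal import Decimal

num = 2.4999999999999999999999   # more digits than a double can hold
print(num)                       # the value that is actually stored
print(Decimal(num))              # exact decimal expansion of the stored double
print(round(num))                # ties-to-even applied to the stored 2.5

shorter = 2.4999999999           # few enough digits to stay distinct from 2.5
print(shorter < 2.5, round(shorter))
```

The long literal collapses to exactly 2.5 before round() ever sees it, while the shorter one stays below 2.5.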

This is also related to the infamous floating-point problem (in binary):

> 0.1 + 0.2 == 0.3
[1] FALSE

(or the same effect when working with truncated decimal numbers):

x = 1/3 = 0.33333
3 * x = 0.99999
so 3x = 0.99999 ≠ 1
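
The same comparison in Python, along with the exact value stored for 0.1, makes the cause visible. This is a sketch only; `sum_matches` and `third` are illustrative names, and the Fraction part is added for contrast with the truncated decimal example.

```python
from decimal import Decimal
from fractions import Fraction

sum_matches = (0.1 + 0.2 == 0.3)   # each operand carries binary rounding error
print(sum_matches)
print(Decimal(0.1))                # exact value stored for the literal 0.1

# Exact rational arithmetic does not truncate, so three thirds make one.
third = Fraction(1, 3)
print(3 * third == 1)
```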

floor() and ceiling()

Although they are great alternatives, floor() and ceiling() are often not preferred because they always round to an integer, whereas we often want to keep some decimal places. When we round, we try to reduce precision while still representing the digits we drop as faithfully as possible. These functions do not do that.

Why not truncate?

Truncation is an option, of course, but if we truncate 1.25 and 1.21 to one decimal place, they would both be 1.2, and that wouldn't be a faithful representation either. If you look closely, truncation simply rounds positive numbers down and negative numbers up, and we have already seen that always rounding in one direction is biased.
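
A short Python sketch of how truncation, floor, and ceiling relate. `trunc_to` is a hypothetical helper, not a standard function:

```python
import math

def trunc_to(x: float, digits: int) -> float:
    """Drop everything after `digits` decimal places (rounds toward zero)."""
    factor = 10 ** digits
    return math.trunc(x * factor) / factor

print(trunc_to(1.25, 1), trunc_to(1.21, 1))                 # both collapse to 1.2
print(math.floor(1.7), math.ceil(1.7), math.trunc(1.7))     # trunc matches floor
print(math.floor(-1.7), math.ceil(-1.7), math.trunc(-1.7))  # trunc matches ceil
```

For positive inputs truncation behaves like floor, and for negative inputs like ceiling.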

Ok, I'll just use Python for my rounding needs

It's all a bit much, isn't it? But life is rarely that simple when you get down to brass tacks. Python is not without its problems either. Nor is any other language that follows the IEEE 754 standard.

However, as written in the standard, the process is not hardware/implementation independent.

It might be funny, but Python's built-in rounding procedure works differently than numpy's:

In [1]: import numpy as np
In [2]: np.round(0.15, 1)
Out[2]: 0.2
In [3]: round(0.15, 1)
Out[3]: 0.1

Is this a problem? It usually isn't, but sometimes it can be. Of course, you will find the implementation details in the documentation.
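
One plausible way to see why the two results differ, using only the standard library. The `scaled` line imitates a scale, round, divide approach, which is how NumPy's documentation describes its fast rounding. Treat that as an assumption about the cause, not as NumPy's actual code.

```python
from decimal import Decimal

stored = Decimal(0.15)            # exact binary value behind the literal 0.15
print(stored)

builtin = round(0.15, 1)          # rounds the stored value, which sits just below 0.15
scaled = round(0.15 * 10) / 10    # scale up, round to an integer, scale back down
print(builtin, scaled)
```

The stored value is slightly less than 0.15, so the built-in round() goes down to 0.1, while multiplying by 10 first lands exactly on 1.5 and the tie rule sends it up to 2.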

So, let's move on to JavaScript and then all the Math

JavaScript has a Math.round() method for rounding decimals. It also has Math.ceil() and Math.floor() methods. The Math.round() method rounds to the nearest integer. Exact halves are rounded toward positive infinity, so Math.round(2.5) is 3 and Math.round(-2.5) is -2; every other value goes to the nearest integer.

To round to a certain number of digits, a common solution is to multiply the number by 10^x, round it, and then divide the result by 10^x, where x is the number of decimal places to keep.
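
To stay in one language, here is a Python sketch that approximates this JavaScript behaviour. `js_round` and `js_round_digits` are hypothetical names, and floor(x + 0.5) is only an approximation of Math.round that differs in rare edge cases.

```python
import math

def js_round(x: float) -> int:
    """Approximate JavaScript's Math.round: exact halves go toward +infinity."""
    return math.floor(x + 0.5)

def js_round_digits(x: float, digits: int) -> float:
    """Scale by 10^digits, round with js_round, then scale back."""
    factor = 10 ** digits
    return js_round(x * factor) / factor

print(js_round(0.5), js_round(2.5), js_round(-2.5))
print(js_round_digits(1.005, 2))   # 1.005 * 100 is slightly below 100.5 in binary
```

The last line shows the limit of the scaling trick: because 1.005 cannot be stored exactly, the scaled value falls just under the half and rounds down.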

JavaScript seems to be more consistent with true numeric rounding.

So what should I do? I need logical rounding in R!

First, never use floating-point numbers to represent quantities like money in the computer's memory. Use a dedicated decimal type if your language supports it (as in Python or Java), or convert the money amount to an integer by multiplying by a power of 10, and generally avoid floats in these cases. For quantities you would normally use floats for, this shouldn't be a problem. If it is, do not use floats.
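
A minimal Python sketch of both suggestions. The amounts and variable names are made up for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP

price = Decimal("2.675")                  # built from a string, so it is exact
cents = price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(cents)

float_rounded = round(2.675, 2)           # the float 2.675 is stored slightly low
print(float_rounded)

# Integer alternative: keep amounts in cents and convert only for display.
total_cents = 1999 + 501
print(f"{total_cents // 100}.{total_cents % 100:02d}")
```

The decimal version rounds the exact amount with an explicit rule, whereas the float version rounds whatever binary approximation happened to be stored.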

If you're looking for a function that mimics real, logical rounding in R, you can use this alternative we found on Stack Overflow.

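true_round <- function(number, digits) {
  posneg <- sign(number)                               # remember the sign
  number <- abs(number) * 10^digits                    # shift the target digit to the units place
  number <- number + 0.5 + sqrt(.Machine$double.eps)   # add a half plus a tiny nudge
  number <- trunc(number)                              # drop the fractional part
  number <- number / 10^digits                         # shift back
  number * posneg                                      # restore the sign
}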
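
For comparison, a line-by-line Python transliteration of the function above. This is a sketch, not a tested library routine; `math.copysign` stands in for R's sign(), and the test values are chosen for illustration.

```python
import math
import sys

def true_round(number: float, digits: int = 0) -> float:
    """Round half away from zero, nudged upward by sqrt(machine epsilon)."""
    sign = math.copysign(1.0, number)
    scaled = abs(number) * 10 ** digits
    scaled = scaled + 0.5 + math.sqrt(sys.float_info.epsilon)
    return sign * math.trunc(scaled) / 10 ** digits

print(true_round(0.5), true_round(2.5), true_round(-1.5))
print(true_round(0.15, 1), true_round(1.005, 2))
print(true_round(2.499999999))   # within the nudge of a half, so it rounds up too
```

The nudge is what rescues inputs such as 1.005, but it also means values a hair below a half get rounded up, which is the trade-off of this approach.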

Another solution, borrowed from JavaScript, is to multiply the number by 10^x, round it, and divide the result by 10^x, where x is the number of decimal places to keep. This isn't perfect and won't always give you the results you want, but it can work in some cases.

[Image: examples of multiply-and-divide rounding in R]

Note that the results in the first two examples are different, but the results in the last two examples are consistent.

If the requirement is to check equality between decimal numbers up to x decimal places, we can simply take the difference and compare it against a threshold, with no rounding at all. That is, if max(abs(y - x)) > threshold, then x and y are not equal.
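
A Python sketch of this threshold check. `all_close` and the default threshold of 1e-9 are assumptions for illustration; `math.isclose` is the standard-library helper for the scalar case.

```python
import math

def all_close(xs, ys, threshold=1e-9):
    """True when every pair of values differs by no more than `threshold`."""
    return max(abs(y - x) for x, y in zip(xs, ys)) <= threshold

x = [0.1 + 0.2, 1.0 / 3.0 * 3.0]
y = [0.3, 1.0]

print(x == y)                           # exact comparison
print(all_close(x, y))                  # comparison with a tolerance
print(math.isclose(0.1 + 0.2, 0.3))     # standard-library helper, relative tolerance
```

The threshold should reflect how many decimal places actually matter for the data at hand, for example 1e-3 when only three places are meaningful.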

Rounding in R, Julia, and Python - is it over yet?

Yes. To wrap up: in this article we looked at rounding quirks in R and other languages. All in all, things are messier than they appear on the surface. Choosing to make a language behave in a certain way inevitably leads to bad results for some use cases. But the good thing about software is that if it doesn't work for you, there's always a way, or at least some scope, to fix it.

There's something unusual about rounding numbers in most languages, and it's important to keep this in mind so you can spot the usual suspect right away during your next analysis.

To round things off (pun intended), stay sharp. It's not the end of the world yet - it's just an imprecise number that may one day cause it.

This post first appeared on appsilon.com/blog/.
