The Mystery of NaN Comparison

Hi Pythonistas!

When working with floating-point numbers in Python (or most programming languages), encountering NaN—short for "Not a Number"—can lead to puzzling results, especially when it comes to comparisons.

What is NaN?

NaN represents undefined or unrepresentable numerical results. Examples include:

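Dividing 0.0 by 0.0.
Taking the square root of a negative number (in real numbers).
Taking the logarithm of a negative number, such as np.log(-1).

Note that plain Python floats and the math module raise an exception in these cases (ZeroDivisionError for 0.0 / 0.0, ValueError for math.sqrt(-1) and math.log(-1)). NumPy instead returns NaN and emits a RuntimeWarning: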

>>> import numpy as np
>>> 
>>> np.float64(0.0) / np.float64(0.0)
<console>:1: RuntimeWarning: invalid value encountered in scalar divide
np.float64(nan)
>>> np.sqrt(-1)
<console>:1: RuntimeWarning: invalid value encountered in sqrt
np.float64(nan)
>>> np.log(-1)
<console>:1: RuntimeWarning: invalid value encountered in log
np.float64(nan)
>>>

NaN is defined by the IEEE 754 floating-point standard. In Python it is available as math.nan, float("nan"), or np.nan:

import math

nan = math.nan
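
As a minimal sketch using only the standard library, the snippet below shows a few ways to obtain a NaN and confirms that plain Python raises exceptions where NumPy would return NaN. The helper name describe is made up for this illustration.

```python
import math

# Three ways to obtain a NaN without NumPy.
nan_from_constant = math.nan
nan_from_string = float("nan")
nan_from_arithmetic = math.inf - math.inf  # inf - inf is an invalid operation

for value in (nan_from_constant, nan_from_string, nan_from_arithmetic):
    print(value, math.isnan(value))


def describe(operation):
    """Run a zero-argument callable; return its result or the exception name.

    Hypothetical helper, used here only to show which exception is raised.
    """
    try:
        return operation()
    except (ZeroDivisionError, ValueError) as exc:
        return type(exc).__name__


# Plain Python floats raise instead of producing NaN for these cases.
zero_div = describe(lambda: 0.0 / 0.0)
neg_sqrt = describe(lambda: math.sqrt(-1))
neg_log = describe(lambda: math.log(-1))
print(zero_div, neg_sqrt, neg_log)
```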

The Mystery Begins

The most intriguing property of NaN is its comparison behavior. Unlike normal numbers:

NaN is not equal to itself!

NaN is neither greater than nor less than any value, including itself.

>>> import math
>>> 
>>> nan = math.nan
>>> nan
nan
>>> nan == nan
False
>>> nan > nan
False
>>> nan < nan
False

Why does this happen?

The IEEE 754 standard uses NaN to signify that a value is undefined or unrepresentable. Ordering or equating such values does not make logical sense, so NaN is treated as unordered: ==, <, <=, > and >= all return False whenever either operand is NaN. The one exception is !=, which returns True.
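
To make the comparison rules concrete, here is a small sketch that runs all six comparison operators on NaN. It also shows a related surprise: the in operator on a list checks object identity before equality, so the result depends on whether the very same NaN object is in the list.

```python
import math
import operator

nan = math.nan

comparisons = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Apply every comparison operator to (nan, nan).
results = {symbol: compare(nan, nan) for symbol, compare in comparisons.items()}
for symbol, outcome in results.items():
    print(f"nan {symbol} nan -> {outcome}")

# List containment tests identity first, then equality.
same_object_found = nan in [nan]                    # same object: found
fresh_object_found = float("nan") in [float("nan")]  # distinct objects: not found
print(same_object_found, fresh_object_found)
```

Only != reports True in the loop. The containment result is a property of Python containers rather than of IEEE 754.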

Identifying NaN

Since NaN != NaN, checking for NaN requires a special function:

Using math.isnan:

>>> import math
>>> nan = math.nan
>>> math.isnan(nan)
True
>>>

With NumPy (if you're using arrays or advanced numerical computations):

>>> import numpy as np
>>> nan = np.nan
>>> np.isnan(nan)
np.True_
>>>
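
One caveat: math.isnan raises TypeError for values that cannot be converted to a float, such as None or a string. If your data can contain such values, a small wrapper along these lines may help. The name is_nan is a hypothetical helper, not part of any library, and this sketch deliberately treats only float instances as possible NaNs.

```python
import math


def is_nan(value):
    """Return True only when value is a float NaN (hypothetical helper).

    Non-float values (None, strings, ints) are reported as not-NaN
    instead of raising TypeError.
    """
    return isinstance(value, float) and math.isnan(value)


mixed = [1.0, math.nan, None, "nan", 3]
flags = [is_nan(item) for item in mixed]
print(flags)

# The self-inequality idiom also works for floats, because NaN is the
# only float value that is not equal to itself.
value = float("nan")
not_equal_to_itself = value != value
print(not_equal_to_itself)
```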

Common Pitfalls

Silent Propagation: Almost any arithmetic operation involving NaN results in NaN, and no error is raised along the way. For example:

>>> result = nan + 1
>>> result
nan
>>>
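
The sketch below follows a single NaN through an aggregate calculation, then shows two cases where the outcome is less obvious: a couple of power operations that do not propagate NaN, and the built-in max, whose result depends on argument order because every comparison with NaN is False. The sample values are made up.

```python
import math

nan = math.nan
readings = [1.0, 2.0, nan, 4.0]

# One NaN poisons the whole aggregate.
total = sum(readings)
average = total / len(readings)
print(total, average)

# A few operations have a defined result even with a NaN operand.
power_zero = nan ** 0   # anything to the power 0 is 1.0
one_power = 1.0 ** nan  # 1.0 to any power is 1.0
print(power_zero, one_power)

# max() keeps its first candidate unless a later item compares greater,
# and no comparison with NaN is ever True.
max_nan_first = max(nan, 1.0)
max_nan_last = max(1.0, nan)
print(max_nan_first, max_nan_last)
```

The same order dependence affects min and sorting, so it is usually safer to filter NaN values out before ranking data.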

Incorrect Checks: Developers often mistakenly test for NaN with an equality check such as if value == math.nan, but that comparison is always False. Instead, always use math.isnan.
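
A short illustration of this pitfall, using a made-up list of values: the equality-based filter finds nothing, while the math.isnan version finds both NaN entries.

```python
import math

values = [3.5, float("nan"), 7.25, float("nan")]

# Broken: equality against math.nan never matches anything.
found_with_equality = [v for v in values if v == math.nan]

# Correct: math.isnan detects every NaN.
found_with_isnan = [v for v in values if math.isnan(v)]

# Typical use: drop the NaN entries before further processing.
cleaned = [v for v in values if not math.isnan(v)]

print(len(found_with_equality), len(found_with_isnan), cleaned)
```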

Practical Applications

While NaN can be frustrating, it is incredibly useful in real-world scenarios:

Missing Data: In datasets, NaN is often used to represent missing or invalid entries.

Guarding Against Invalid Operations: Propagation of NaN ensures that calculations involving invalid data don’t silently return incorrect results.
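
Both uses can be sketched with the standard library alone. In this example NaN marks missing temperature readings; the naive mean comes out as NaN, which flags the problem, while a hypothetical helper, mean_ignoring_nan, skips the missing entries. The data and the helper name are assumptions made for illustration.

```python
import math


def mean_ignoring_nan(samples):
    """Average the non-NaN samples; return NaN if none are present.

    Hypothetical helper written for this example.
    """
    present = [s for s in samples if not math.isnan(s)]
    if not present:
        return math.nan
    return sum(present) / len(present)


# NaN stands in for readings that were never recorded.
temperatures = [21.5, math.nan, 22.5, math.nan, 23.5]

naive_mean = sum(temperatures) / len(temperatures)  # NaN: missing data detected
clean_mean = mean_ignoring_nan(temperatures)        # mean of the three real readings

print(naive_mean, clean_mean)
```

Libraries such as NumPy and pandas offer their own NaN-aware aggregations, so a hand-written helper like this is mainly useful when you want to stay within the standard library.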

NaN is mysterious but logical when you understand its purpose. It reminds us that not every value can be quantified or compared. The next time you find yourself baffled by NaN != NaN, take a moment to appreciate the quirks of floating-point arithmetic!