Motivation

Dataclasses in Python were introduced to simplify the creation of classes that are primarily used to store data. They address several common pain points in object-oriented programming:

Object Equality

Value vs Reference Semantics

Traditional Python classes use reference semantics1 by default:

class Potato():
    def __init__(self, weight):
        self.weight = weight

first = Potato(100)
second = Potato(100)

print(first == second)
Execution Result
$ python classes_reference.py
False

That makes comparing different objects of the class structurally hard. You need to override __eq__ method and make it work as you’d expect.

On the other hand, with dataclasses, one would get value semantics2 on equality checks:

from dataclasses import dataclass

@dataclass
class Potato():
    weight: int

first = Potato(100)
second = Potato(100)

print(first == second)
Execution Result
$ python data_classes_value.py
True

Structural Equality

Boilerplate Reduction

Copying Objects

Features

Use Cases

  1. Data Transfer Objects (DTOs)
  2. Configuration objects
  3. Immutable data structures (when using frozen=True)
  4. API responses and requests
  5. Representing database records
  6. Complex number systems or mathematical objects
  7. Game state objects

When to Avoid Them

  1. When you need more control over attribute access (use properties or descriptors instead)
  2. For classes with many methods and little data (traditional classes may be more appropriate)
  3. When you need to maintain backwards compatibility with older Python versions (< 3.7)
  4. If your class requires a complex __init__ method with extensive logic
  5. When working with mutable collections as default values (can lead to unexpected behavior)
  6. If you need to implement a custom __new__ method
  7. For very large classes with hundreds of attributes (may impact performance)

Remember, while dataclasses are powerful, they’re not a one-size-fits-all solution. Always consider your specific use case and requirements when deciding whether to use a dataclass or a traditional class.


  1. In reference semantics:

    • Variables hold references (memory addresses) to objects, not the objects themselves.
    • When you assign an object to a variable, you’re creating a new reference to that object, not copying the object.
    • When you pass an object to a function or method, you’re passing a reference to the object, not a copy of it.
    • Comparing two variables with == checks if they refer to the same object in memory, not if the objects have the same content.
    • Modifying an object through one reference affects all other references to that object.
     ↩︎
  2. value semantics is more intuitive, when comparing two objects of a dataclass, it would compare them structurally. ↩︎