3. Dataclasses || pepegar.com

Motivation

Dataclasses in Python were introduced to simplify the creation of classes that are primarily used to store data. They address several common pain points in object-oriented programming:

Object Equality

Value vs Reference Semantics

Instances of a traditional Python class are compared by identity (reference semantics¹) by default:

class Potato:
    def __init__(self, weight):
        self.weight = weight

first = Potato(100)
second = Potato(100)

# Without a custom __eq__, == falls back to identity comparison.
print(first == second)

Execution Result

$ python classes_reference.py
False

This makes it awkward to compare two instances of the class by their contents: you have to override the __eq__ method yourself to get the behavior you’d expect.
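
For comparison, a hand-written __eq__ for the plain class might look like the sketch below. The isinstance check and the NotImplemented return value are a common convention assumed here, not something the original example prescribes.

```python
class Potato:
    def __init__(self, weight):
        self.weight = weight

    def __eq__(self, other):
        # Only compare against other Potato instances; for anything else,
        # let Python try the reflected comparison.
        if not isinstance(other, Potato):
            return NotImplemented
        return self.weight == other.weight


first = Potato(100)
second = Potato(100)

print(first == second)  # True: compared by weight
print(first is second)  # False: still two distinct objects
```

Note that a class which defines __eq__ without also defining __hash__ becomes unhashable, which is one more detail to get right by hand.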

Dataclasses, on the other hand, give you value semantics² on equality checks:

from dataclasses import dataclass

@dataclass
class Potato:
    weight: int

first = Potato(100)
second = Potato(100)

# The generated __eq__ compares the weight fields.
print(first == second)

Execution Result

$ python data_classes_value.py
True

Structural Equality

- Dataclasses automatically generate an __eq__ method
- Equality is based on the values of the fields, not on object identity
- This behavior matches how we intuitively expect data objects to behave
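
The points above can be illustrated with a small sketch. Carrot is a hypothetical second class added here only to show that the generated __eq__ also requires both objects to be of the same class.

```python
from dataclasses import dataclass


@dataclass
class Potato:
    weight: int


@dataclass
class Carrot:  # hypothetical class, for illustration only
    weight: int


first = Potato(100)
second = Potato(100)

print(first == second)       # True: same class, same field values
print(first is second)       # False: two separate objects
print(first == Potato(250))  # False: field values differ
print(first == Carrot(100))  # False: different class, even with equal fields
```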

Boilerplate Reduction

- Eliminates the need to write repetitive __init__, __repr__, and __eq__ methods by hand
- Reduces the chance of errors in these common methods
- Makes code more readable and maintainable
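
As a rough illustration of what the decorator generates, the sketch below declares two fields and gets a keyword-friendly __init__ and a readable __repr__ for free. The variety field and its default value are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass
class Potato:
    weight: int
    variety: str = "russet"  # hypothetical field with a default value


# The generated __init__ accepts positional or keyword arguments.
p = Potato(weight=100)

# The generated __repr__ lists every field by name.
print(p)  # Potato(weight=100, variety='russet')
```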

Copying Objects

- dataclasses.replace() creates a shallow copy of an instance with selected fields changed; the standard copy module handles plain shallow and deep copies
- Useful for creating variations of existing objects without modifying the original
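
A minimal sketch of the copying options, assuming a hypothetical Sack class with a list field so that the difference between shallow and deep copies is visible:

```python
import copy
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Sack:  # hypothetical class, for illustration only
    owner: str
    weights: List[int] = field(default_factory=list)


original = Sack("Ana", [100, 120])

# replace() builds a new instance, overriding only the named fields.
renamed = replace(original, owner="Ben")

# replace() and copy.copy() are shallow: the list object is shared.
shallow = copy.copy(original)
# copy.deepcopy() also copies nested objects.
deep = copy.deepcopy(original)

original.weights.append(90)

print(original.owner)   # Ana: the original is not renamed
print(renamed.owner)    # Ben
print(renamed.weights)  # [100, 120, 90] (shared list)
print(shallow.weights)  # [100, 120, 90] (shared list)
print(deep.weights)     # [100, 120] (independent copy)
```

When a field holds a mutable object, copy.deepcopy() is the safer choice if the copy must not be affected by later changes to the original.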

Features

- Automatic generation of special methods: __init__, __repr__, __eq__
- Customizable field and class options: default values, default factories, frozen instances, etc.
- Post-init processing with the __post_init__ method
- Integration with type hinting
- Support for inheritance
- Easy conversion to dictionaries with asdict(); flat instances can be rebuilt by unpacking a dictionary into the constructor
- Optional ordering methods (__lt__, __le__, etc.) with order=True
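
Several of these features can be combined in one short sketch. The variety and tags fields and the validation rule are assumptions made up for the example.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass(order=True)
class Potato:
    weight: int
    variety: str = "russet"  # plain default value
    # compare=False keeps this field out of __eq__ and the ordering methods.
    tags: List[str] = field(default_factory=list, compare=False)

    def __post_init__(self):
        # Runs right after the generated __init__; used here for validation.
        if self.weight <= 0:
            raise ValueError("weight must be positive")


small = Potato(80)
large = Potato(150, tags=["organic"])

print(small < large)  # True: fields are compared as a tuple, in order
print(asdict(large))  # {'weight': 150, 'variety': 'russet', 'tags': ['organic']}
print(Potato(**asdict(large)) == large)  # True: round trip through a dict
```

Rebuilding with Potato(**data) works here because every field is a plain value; nested dataclasses would come back as dictionaries and need explicit reconstruction.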

Use Cases

- Data Transfer Objects (DTOs)
- Configuration objects
- Immutable data structures (when using frozen=True)
- API responses and requests
- Representing database records
- Complex number systems or mathematical objects
- Game state objects
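
As one example from this list, an immutable configuration object could be sketched as follows. ServerConfig and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ServerConfig:  # hypothetical class, for illustration only
    host: str
    port: int = 8080


config = ServerConfig("localhost")

try:
    config.port = 9090  # assignment is blocked on frozen instances
except FrozenInstanceError as exc:
    print(f"cannot modify: {exc}")

# frozen=True together with the default eq=True also makes instances
# hashable, so equal configs collapse to one entry in a set.
print(len({config, ServerConfig("localhost")}))  # 1
```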

When to Avoid Them

- When you need more control over attribute access (use properties or descriptors instead)
- For classes with many methods and little data (traditional classes may be more appropriate)
- When you need to maintain backwards compatibility with older Python versions (< 3.7)
- If your class requires a complex __init__ method with extensive logic
- When working with mutable collections as default values, take extra care: a bare list, dict, or set default raises a ValueError, so use field(default_factory=...) instead
- If you need to implement a custom __new__ method
- For very large classes with hundreds of attributes (may impact performance)
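
The point about mutable default values deserves a closer look. In the sketch below (BadSack and Sack are hypothetical classes), a bare list default is rejected when the class is defined, while default_factory gives each instance its own list.

```python
from dataclasses import dataclass, field
from typing import List

try:
    @dataclass
    class BadSack:  # hypothetical class, for illustration only
        weights: List[int] = []  # rejected at class-definition time
except ValueError as exc:
    print(f"ValueError: {exc}")


@dataclass
class Sack:  # hypothetical class, for illustration only
    # default_factory is called once per instance, so nothing is shared.
    weights: List[int] = field(default_factory=list)


a = Sack()
b = Sack()
a.weights.append(100)

print(a.weights)  # [100]
print(b.weights)  # []
```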

Remember, while dataclasses are powerful, they’re not a one-size-fits-all solution. Always consider your specific use case and requirements when deciding whether to use a dataclass or a traditional class.

¹ In reference semantics:
- Variables hold references (memory addresses) to objects, not the objects themselves.
- When you assign an object to a variable, you’re creating a new reference to that object, not copying the object.
- When you pass an object to a function or method, you’re passing a reference to the object, not a copy of it.
- For classes that do not define __eq__, comparing two variables with == checks whether they refer to the same object in memory, not whether the objects have the same content.
- Modifying an object through one reference affects all other references to that object.

² Value semantics is more intuitive: comparing two objects of a dataclass compares them structurally, field by field.
