How Python Lists Work Internally

Posted by Afsal on 27-Dec-2024

Hi Pythonistas!

Python lists are one of the most versatile and commonly used data structures. While they may appear straightforward, there’s a lot happening under the hood that makes them efficient yet flexible. Let’s explore how Python lists work internally.

What is a Python List?
A list in Python is an ordered, mutable, heterogeneous collection of items.

code

my_list = [1, "hello", 3.14]


Ordered: Items maintain the order in which they are added.
Mutable: You can change the contents of a list (add, remove, modify).
Heterogeneous: Lists can contain elements of different data types.

The Internal Representation of a Python List
Under the hood, Python lists are implemented as dynamic arrays:A list is essentially a contiguous block of memory that stores references (pointers) to objects. This allows for random access (constant-time retrieval of elements via an index).
Here’s an example visualization:

my_list = [10, "Python", 3.14]
Memory (list): [ref1, ref2, ref3]
Memory (objects): 
ref1 -> 10
ref2 -> "Python"
ref3 -> 3.14


Each element in the list is stored as a reference (pointer) to the actual object in memory, not the object itself. This allows lists to store objects of different types.

Dynamic Resizing

Initial Allocation
When a list is created, Python allocates a certain amount of memory (e.g., space for 4 elements)

my_list = []  # An empty list, but memory is reserved for a few elements.

 Adding Elements
As elements are added (e.g., with append()), Python fills the pre-allocated space

my_list.append(1)
my_list.append(2)


If the list exceeds its current capacity, Python:Allocates a new, larger block of memory (with additional unused space).
Copies references from the old memory block to the new one.
This process is called dynamic resizing.

Growth Factor

To minimize frequent resizing, Python uses a growth factor:The size of the new array is roughly 1.125x the current size (implementation-specific). This ensures that resizing happens less frequently as the list grows.

Time Complexity of List Operations

Access (list[i]) O(1) Direct access to elements using an index.
Append (list.append) Amortized O(1) Resizing may occasionally make this O(n).
Insert (list.insert) O(n) Shifts elements to make space for the new one.
Remove (list.remove) O(n) Searches and shifts elements after removal.
Pop (list.pop()) O(1) or O(n) O(1) if popping from the end, O(n) for others.

Heterogeneous Items: Since lists store references to objects, they can hold objects of any type.
Variable Size: Unlike arrays in lower-level languages (like C), Python lists are not fixed in size. They grow dynamically as elements are added.

Compactness: Lists in Python are memory-efficient because they store references rather than duplicating data.

>>> import sys
>>> my_list = []
>>> sys.getsizeof(my_list)
56
>>> my_list.append(1)
>>> sys.getsizeof(my_list)
88
>>> my_list.append(1)
>>> sys.getsizeof(my_list)
88
>>> my_list.append(2)
>>> sys.getsizeof(my_list)
88
>>> my_list.append(3)
>>> sys.getsizeof(my_list)
88
>>> 

Dynamic Resizing: Lists grow dynamically without the programmer worrying about memory allocation

Random Access: Because lists are backed by arrays, element access by index is very fast (O(1)).Versatility:

Lists are a Swiss army knife in Python, suitable for most scenarios where an ordered, mutable collection is needed.

When to Avoid Python Lists?
While lists are powerful, they may not always be the best choice

For Numerical Computations: Use NumPy arrays for better performance.
For Fast Membership Testing: Use sets.
For Fixed-Size Collections: Use tuples.

Key Takeaways

Python lists are implemented as dynamic arrays that store references to objects.
They grow dynamically using a growth factor (~1.125x) to minimize resizing costs.
Lists support fast random access (O(1)) but slower insertion and deletion in the middle (O(n)).
Their versatility makes them a cornerstone of Python programming.
Understanding how lists work internally allows you to use them more efficiently and make informed decisions about alternative data structures.