Consider this iterable:
it = (1, 2, 3, 0, 4, 5, 6, 7)
Let’s say you want to build another iterable that includes only the numbers that appear
starting from the element 0. Usually, I’d do this:
# This returns (0, 4, 5, 6, 7).
from_zero = tuple(elem for idx, elem in enumerate(it) if idx >= it.index(0))
While this is quite terse and does the job, it won’t work with a generator. There’s an even
more generic and terser way to do the same thing with the itertools.dropwhile function.
Here’s how to do it:
from itertools import dropwhile
# This returns the same thing as before (0, 4, 5, 6, 7).
from_zero = tuple(dropwhile(lambda x: x != 0, it))
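To make the generator limitation concrete, here is a small sketch. The numbers generator function is a hypothetical helper introduced only for this illustration. Generators have no index method, so the first approach fails with an AttributeError, while dropwhile only needs to iterate over its input.

```python
from itertools import dropwhile


def numbers():
    """Hypothetical generator yielding the same values as the tuple above."""
    yield from (1, 2, 3, 0, 4, 5, 6, 7)


# The index-based approach needs a sequence: generators have no .index().
index_error = None
try:
    gen = numbers()
    tuple(elem for idx, elem in enumerate(gen) if idx >= gen.index(0))
except AttributeError as exc:
    index_error = exc

# dropwhile only iterates, so it accepts the generator directly.
from_zero = tuple(dropwhile(lambda x: x != 0, numbers()))
print(from_zero)  # (0, 4, 5, 6, 7)
```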
Here, itertools.dropwhile returns an iterator that yields elements from an iterable,
starting from the first element for which the predicate returns False. The predicate is a
function that takes one argument and returns a boolean value.
The dropwhile function takes two arguments:
- A function (the predicate), which takes one argument and returns a boolean value.
- An iterable, which can be any object that can be iterated over, such as a list, tuple, string, or even another generator.
The dropwhile function iterates over the iterable and drops elements for as long as the
predicate returns True. As soon as the predicate returns False for an element, it yields
that element and all the remaining elements of the iterable, regardless of whether they
satisfy the condition or not.
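The two phases described above can be sketched in plain Python. This is only a rough illustrative equivalent, not the actual implementation in the standard library, and dropwhile_sketch is a made-up name. Note how elements after the first failing one are passed through without consulting the predicate again.

```python
from itertools import dropwhile


def dropwhile_sketch(predicate, iterable):
    """Rough pure-Python equivalent of itertools.dropwhile (illustrative only)."""
    iterator = iter(iterable)
    # Phase 1: discard elements while the predicate holds.
    for elem in iterator:
        if not predicate(elem):
            # The first element that fails the predicate is kept.
            yield elem
            break
    # Phase 2: everything left is yielded; the predicate is not called again.
    yield from iterator


data = (1, 2, 3, 0, 4, 0, 5)
result = tuple(dropwhile_sketch(lambda x: x != 0, data))
print(result)  # (0, 4, 0, 5)
```

Here 4 and 5 satisfy the predicate but are still kept, because they appear after the first 0.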
Apart from being concise, this implementation is more generic and can be used for other purposes like skipping the header lines in a file. For example:
from itertools import dropwhile
with open("/etc/passwd") as f:
    for line in dropwhile(lambda x: x.startswith("#"), f):
        # Each line already ends with a newline, so don't add another one.
        print(line, end="")
This will print all the lines from the /etc/passwd file after the header comments:
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
...
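If your /etc/passwd has no leading comments, you can try the same idea on an in-memory file. The following is a minimal sketch that assumes made-up sample content wrapped in io.StringIO. It also shows that a comment line appearing after the header is not dropped, since dropwhile stops filtering at the first line that fails the predicate.

```python
import io
from itertools import dropwhile

# Made-up stand-in for a file that starts with comment lines.
sample = io.StringIO(
    "# User database\n"
    "# Do not edit by hand\n"
    "nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n"
    "# a comment after the header\n"
    "root:*:0:0:System Administrator:/var/root:/bin/sh\n"
)

# Only the leading run of comment lines is skipped.
lines = [
    line.rstrip("\n")
    for line in dropwhile(lambda x: x.startswith("#"), sample)
]

for line in lines:
    print(line)
```

To remove every comment line rather than just the leading ones, a plain filter on each line would be the better fit.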
Finally, let’s see how you can skip straight to the data rows in a CSV file that contains arbitrary comments and headers like this:
# persons.csv
This is a comment
These are some other comments
The fake header starts from the next line
id,name,age,height
The real header starts from here
ID,Name,Age,Height
1,John,20,1.8
2,Jane,21,1.7
3,Jack,22,1.6
import csv
from itertools import dropwhile
with open("persons.csv", "r", newline="") as f:
    reader = csv.DictReader(f, fieldnames=("ID", "Name", "Age", "Height"))
    # Drop every row before the real header: the comments and the fake header.
    rows = dropwhile(lambda x: x["ID"] != "ID", reader)
    # Skip the real header row itself.
    next(rows)
    for row in rows:
        print(row)
Running this will give you the dicts containing the data rows only:
{'ID': '1', 'Name': 'John', 'Age': '20', 'Height': '1.8'}
{'ID': '2', 'Name': 'Jane', 'Age': '21', 'Height': '1.7'}
{'ID': '3', 'Name': 'Jack', 'Age': '22', 'Height': '1.6'}