If you’ve ever used the pandas library in Python, you probably know that there are two ways to select a Series (meaning a column) from a DataFrame:
    # dot notation
    df.col_name

    # bracket notation
    df['col_name']
Which method should you use? I’ll make the case for each, and then you can decide…
Why use bracket notation?
The case for bracket notation is simple: It always works.
Here are the specific cases in which you must use bracket notation, because dot notation would fail:
    # column name includes a space
    df['col name']

    # column name matches a DataFrame method
    df['count']

    # column name is stored in a variable
    var = 'col_name'
    df[var]

    # new column is created through assignment
    df['new'] = 0
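To see these cases concretely, here's a minimal sketch on a made-up DataFrame. The data is hypothetical, and the column names are chosen only to trigger each case:

```python
import pandas as pd

# Hypothetical toy data; only the column names matter here.
df = pd.DataFrame({'col name': [1, 2, 3], 'count': [10, 20, 30]})

# A space in the name: `df.col name` is a SyntaxError, so brackets are required.
spaced = df['col name']

# A name that matches a DataFrame method: dot notation returns the method.
dot_result = df.count          # the bound method DataFrame.count
bracket_result = df['count']   # the Series we actually wanted

# A name held in a variable: `df.var` would look for an attribute named "var".
var = 'col name'
from_variable = df[var]

# Assignment: brackets create a column, while dot assignment with a new name
# only sets a plain Python attribute on the DataFrame object.
df['new'] = 0
df.other = 0

print(callable(dot_result))                    # True
print(isinstance(bracket_result, pd.Series))   # True
print(list(df.columns))                        # ['col name', 'count', 'new']
```

Notice that `other` never shows up in `df.columns`, which is exactly why new columns need bracket notation.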
In other words, bracket notation always works, whereas dot notation only works under certain circumstances. That’s a pretty compelling case for bracket notation!
As stated in the Zen of Python:
There should be one– and preferably only one –obvious way to do it.
Why use dot notation?
If you’ve watched any of my pandas videos, you may have noticed that I use dot notation. Here are four reasons why:
Reason 1: Dot notation is easier to type
Dot notation is three fewer characters to type than bracket notation. And in terms of finger movement, typing a single period is much more convenient than typing brackets and quotes.
This might sound like a trivial reason, but if you’re selecting columns dozens (or hundreds) of times a day, it makes a real difference!
Reason 2: Dot notation is easier to read
Most of my pandas code is made up of chains of selections and methods. By using dot notation, my code is mostly adorned with periods and parentheses (plus an occasional quotation mark):
    # dot notation
    df.col_one.sum()
    df.col_one.isna().sum()
    df.groupby('col_two').col_one.sum()
If you instead use bracket notation, your code is adorned with periods and parentheses plus lots of brackets and quotation marks:
    # bracket notation
    df['col_one'].sum()
    df['col_one'].isna().sum()
    df.groupby('col_two')['col_one'].sum()
I find the dot notation code easier to read, as well as more aesthetically pleasing.
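If you want to convince yourself that the two styles are interchangeable in these chains, here's a small sketch that runs both and checks that the results match. The DataFrame is made up for illustration:

```python
import pandas as pd

# Hypothetical toy data using the column names from the snippets above.
df = pd.DataFrame({
    'col_one': [1.0, None, 3.0, 4.0],
    'col_two': ['a', 'a', 'b', 'b'],
})

# Each pair selects the same Series, so the results are identical.
assert df.col_one.sum() == df['col_one'].sum()
assert df.col_one.isna().sum() == df['col_one'].isna().sum()

# The same holds when selecting a column from a groupby object.
dot_grouped = df.groupby('col_two').col_one.sum()
bracket_grouped = df.groupby('col_two')['col_one'].sum()
assert dot_grouped.equals(bracket_grouped)
```

So the choice between them here really is about typing and readability, not about what the code does.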
Reason 3: Dot notation is easier to remember
With dot notation, every component in a chain is separated from its neighbors by a period. For example, this line of code has 4 components, and thus there are 3 periods separating the individual components:
    # dot notation
    df.groupby('col_two').col_one.sum()
If you instead use bracket notation, some of your components are separated by periods, and some are not:
    # bracket notation
    df.groupby('col_two')['col_one'].sum()
With bracket notation, I often forget whether there’s supposed to be a period before ['col_one'], after it, or both before and after. With dot notation, it’s easier for me to remember the correct syntax.
Reason 4: Dot notation limits the usage of brackets
Brackets can be used for many purposes in pandas:
    df[['col_one', 'col_two']]
    df.iloc[4, 2]
    df.loc['row_label', 'col_one':'col_three']
    df.col_one['row_label']
    df[(df.col_one > 5) & (df.col_two == 'value')]
If you also use bracket notation for Series selection, you end up with even more brackets in your code:
    df['col_one']['row_label']
    df[(df['col_one'] > 5) & (df['col_two'] == 'value')]
As you use more brackets, each bracket becomes slightly more ambiguous as to its purpose, imposing a higher mental burden on the person reading the code. By using dot notation for Series selection, you reduce bracket usage to only the essential cases.
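To make the different jobs that brackets do more concrete, here's a sketch that runs each of the selections above on a hypothetical five-row DataFrame. The data and index labels are invented for illustration, and the comments note what each selection returns:

```python
import pandas as pd

# Hypothetical data; the last row is labeled 'row_label' to match the snippets.
df = pd.DataFrame(
    {
        'col_one': [2, 4, 6, 8, 10],
        'col_two': ['value', 'other', 'value', 'other', 'value'],
        'col_three': [0.1, 0.2, 0.3, 0.4, 0.5],
    },
    index=['r0', 'r1', 'r2', 'r3', 'row_label'],
)

two_columns = df[['col_one', 'col_two']]                # list of names -> DataFrame
by_position = df.iloc[4, 2]                             # row 4, column 2 -> scalar
by_label = df.loc['row_label', 'col_one':'col_three']   # label slice -> Series
one_value = df.col_one['row_label']                     # Series lookup -> scalar
filtered = df[(df.col_one > 5) & (df.col_two == 'value')]  # boolean mask -> rows

# The bracket-only filter returns the same rows, just with more brackets.
filtered_brackets = df[(df['col_one'] > 5) & (df['col_two'] == 'value')]
assert filtered.equals(filtered_brackets)
```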
If you prefer bracket notation, then you can use it all of the time! However, you still have to be familiar with dot notation in order to read other people’s code.
If you prefer dot notation, then you can use it most of the time, as long as you are diligent about renaming columns when they contain spaces or collide with DataFrame methods. However, you still have to use bracket notation when creating new columns.
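Here's one way that renaming might look, as a minimal sketch. The DataFrame and the replacement name n_items are hypothetical, and this is just one of several ways to rename columns in pandas:

```python
import pandas as pd

# Hypothetical data with one name containing a space and one method collision.
df = pd.DataFrame({'col name': [1, 2], 'count': [3, 4]})

# Replace spaces with underscores in every column name.
df.columns = df.columns.str.replace(' ', '_')

# Rename the column that collides with DataFrame.count (new name is arbitrary).
df = df.rename(columns={'count': 'n_items'})

# Both columns are now reachable with dot notation.
print(df.col_name.sum())   # 3
print(df.n_items.sum())    # 7
```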
Which do you prefer? Let me know in the comments!