Vladimir Iakolev: Analysing music habits with Spotify API and Python

I’ve been using Spotify as my main source of music since 2013, and back then the app automatically created a playlist for songs that I liked from artists’ radios. By inertia, I still use that playlist to save songs that I like. As the playlist has become a bit big and a bit old (6 years, huh), I’ve decided to try to analyze it.

Boring preparation

To get the data I used the Spotify API with spotipy as a Python client. I created an application in the Spotify Dashboard and gathered the credentials. Then I was able to initialize and authorize the client:

import spotipy
import spotipy.util as util

token = util.prompt_for_user_token(user_id,
                                   'playlist-read-collaborative',
                                   client_id=client_id,
                                   client_secret=client_secret,
                                   redirect_uri='http://localhost:8000/')
sp = spotipy.Spotify(auth=token)
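The credentials in the call above have to come from somewhere, and keeping them out of the notebook is usually safer. Here is a minimal sketch that reads them from environment variables. The variable names are the ones spotipy itself looks for, while the helper read_spotify_credentials is a hypothetical name made up for this example:

```python
import os


def read_spotify_credentials(environ=os.environ):
    """Collect the Spotify app credentials from environment variables.

    The variable names match the ones spotipy reads on its own;
    the helper itself is hypothetical and not part of spotipy.
    """
    names = {
        'client_id': 'SPOTIPY_CLIENT_ID',
        'client_secret': 'SPOTIPY_CLIENT_SECRET',
        'redirect_uri': 'SPOTIPY_REDIRECT_URI',
    }
    # Treat unset and empty variables the same way
    missing = [var for var in names.values() if not environ.get(var)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {key: environ[var] for key, var in names.items()}
```

The returned dictionary can be unpacked into the authorization call, for example util.prompt_for_user_token(user_id, 'playlist-read-collaborative', **read_spotify_credentials()).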

Tracks metadata

As everything is inside just one playlist, it was easy to gather. The only problem was that the user_playlist method in spotipy doesn’t support pagination and can only return the first 100 tracks, but that was easily solved by going down to the private and undocumented _get:

import pandas as pd
from dateutil.parser import parse as parse_date  # assumption: the post doesn't show where parse_date comes from

playlist = sp.user_playlist(user_id, playlist_id)
tracks = playlist['tracks']['items']
next_uri = playlist['tracks']['next']
# Follow the `next` links until the API stops returning one
while next_uri:
    response = sp._get(next_uri)
    tracks += response['items']
    next_uri = response['next']

tracks_df = pd.DataFrame([(track['track']['id'],
                           track['track']['artists'][0]['name'],
                           track['track']['name'],
                           parse_date(track['track']['album']['release_date']) if track['track']['album']['release_date'] else None,
                           parse_date(track['added_at']))
                          for track in tracks],
                         columns=['id', 'artist', 'name', 'release_date', 'added_at'])
tracks_df.head(10) 
   id                      artist                    name                                 release_date  added_at
0  1MLtdVIDLdupSO1PzNNIQg  Lindstrøm & Christabelle  Looking For What                     2009-12-11    2013-06-19 08:28:56+00:00
1  1gWsh0T1gi55K45TMGZxT0  Au Revoir Simone          Knight Of Wands – Dam Mantle Remix   2010-07-04    2013-06-19 08:48:30+00:00
2  0LE3YWM0W9OWputCB8Z3qt  Fever Ray                 When I Grow Up – D. Lissvik Version  2010-10-02    2013-06-19 22:09:15+00:00
3  5FyiyLzbZt41IpWyMuiiQy  Holy Ghost!               Dumb Disco Ideas                     2013-05-14    2013-06-19 22:12:42+00:00
4  5cgfva649kw89xznFpWCFd  Nouvelle Vague            Too Drunk To Fuck                    2004-11-01    2013-06-19 22:22:54+00:00
5  3IVc3QK63DngBdW7eVker2  TR/ST                     F.T.F.                               2012-11-16    2013-06-20 11:50:58+00:00
6  0mbpEDdZHNMEDll6woEy8W  Art Brut                  My Little Brother                    2005-10-02    2013-06-20 13:58:19+00:00
7  2y8IhUDSpvsuuEePNLjGg5  Niki & The Dove           Somebody (drum machine version)      2011-06-14    2013-06-21 09:28:40+00:00
8  1X4RqFAShNL8aHfUIpjIVr  Gorillaz                  Kids with Guns – Hot Chip Remix      2007-11-19    2013-06-23 19:00:57+00:00
9  1cV4DVeAM5AstrDlXgvzJ7  Lykke Li                  I’m Good, I’m Gone                   2008-01-28    2013-06-23 22:31:52+00:00
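The pagination idea is independent of spotipy: keep following the next link of each page until it is empty. The sketch below shows that loop with a fake three-page response, so it runs without network access. The names collect_all_items and fetch_page are made up for illustration, and the page shape (items and next keys) is assumed to match the playlist['tracks'] object used in the post:

```python
def collect_all_items(first_page, fetch_page):
    """Follow `next` links until the last page and return every item.

    `first_page` is a dict with `items` and `next` keys; `fetch_page` is
    any callable that takes a URL and returns the next page in that shape.
    """
    items = list(first_page['items'])
    next_url = first_page['next']
    while next_url:  # the last page has `next` set to None
        page = fetch_page(next_url)
        items.extend(page['items'])
        next_url = page['next']
    return items


# Fake pages standing in for real API responses.
fake_pages = {
    'page-2': {'items': [3, 4], 'next': 'page-3'},
    'page-3': {'items': [5], 'next': None},
}
first = {'items': [1, 2], 'next': 'page-2'}
all_items = collect_all_items(first, fake_pages.__getitem__)
print(all_items)  # [1, 2, 3, 4, 5]
```

With a real client, fetch_page could be sp._get as in the post. Stopping when next is empty also avoids an extra request when the total is an exact multiple of the page size. spotipy also seems to offer a public sp.next(page) method for the same lookup, which is worth checking in the version you use.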

My first naive idea was to list the artists that appear most often:

tracks_df \
    .groupby('artist') \
    .count()['id'] \
    .reset_index() \
    .sort_values('id', ascending=False) \
    .rename(columns={'id': 'amount'}) \
    .head(10)
     artist                 amount
260  Pet Shop Boys          12
334  The Knife              11
213  Metronomy              9
303  Soulwax                8
284  Röyksopp               7
180  Ladytron               7
 94  Depeche Mode           7
113  Fever Ray              6
324  The Chemical Brothers  6
233  New Order              6

But as taste can change, I decided to get the top five artists from each year and check whether I also added them to the playlist in other years:

counted_year_df = tracks_df \
    .assign(year_added=tracks_df.added_at.dt.year) \
    .groupby(['artist', 'year_added']) \
    .count()['id'] \
    .reset_index() \
    .rename(columns={'id': 'amount'}) \
    .sort_values('amount', ascending=False)

in_top_5_year_artist = counted_year_df \
    .groupby('year_added') \
    .head(5) \
    .artist \
    .unique()

# Keyword arguments: newer pandas versions no longer accept positional ones in pivot
counted_year_df \
    [counted_year_df.artist.isin(in_top_5_year_artist)] \
    .pivot(index='artist', columns='year_added', values='amount') \
    .fillna(0) \
    .style.background_gradient()

year_added             2013  2014  2015  2016  2017  2018  2019
artist
Arcade Fire               2     0     0     1     3     0     0
Clinic                    1     0     0     2     0     0     1
Crystal Castles           0     0     2     2     0     0     0
Depeche Mode              1     0     3     1     0     2     0
Die Antwoord              1     4     0     0     0     1     0
FM Belfast                3     3     0     0     0     0     0
Factory Floor             3     0     0     0     0     0     0
Fever Ray                 3     1     1     0     1     0     0
Grimes                    1     0     3     1     0     0     0
Holy Ghost!               1     0     0     0     3     1     1
Joe Goddard               0     0     0     0     3     1     0
John Maus                 0     0     4     0     0     0     1
KOMPROMAT                 0     0     0     0     0     0     2
LCD Soundsystem           0     0     1     0     3     0     0
Ladytron                  5     1     0     0     0     1     0
Lindstrøm                 0     0     0     0     0     0     2
Marie Davidson            0     0     0     0     0     0     2
Metronomy                 0     1     0     6     0     1     1
Midnight Magic            0     4     0     0     1     0     0
Mr. Oizo                  0     0     0     1     0     3     0
New Order                 1     5     0     0     0     0     0
Pet Shop Boys             0    12     0     0     0     0     0
Röyksopp                  0     4     0     3     0     0     0
Schwefelgelb              0     0     0     0     1     0     4
Soulwax                   0     0     0     0     5     3     0
Talking Heads             0     0     3     0     0     0     0
The Chemical Brothers     0     0     2     0     1     0     3
The Fall                  0     0     0     0     0     2     0
The Knife                 5     1     3     1     0     0     1
The Normal                0     0     0     2     0     0     0
The Prodigy               0     0     0     0     0     2     0
Vitalic                   0     0     0     0     2     2     0
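To make the logic behind this table easier to follow, here is the same idea in plain Python on a tiny made-up sample: pick the top artists of every year, then count how often each of them was added in all years. This is only a sketch and not the pandas code from the post. The function names and the sample data are invented for illustration:

```python
from collections import Counter, defaultdict

# (artist, year_added) pairs; made-up sample data, not the real playlist.
added = [
    ('The Knife', 2013), ('The Knife', 2013), ('Ladytron', 2013),
    ('The Knife', 2015), ('Grimes', 2015), ('Grimes', 2015),
    ('Ladytron', 2014),
]


def top_artists_by_year(pairs, n):
    """Return {year: [artist, ...]} with the n most added artists per year."""
    per_year = defaultdict(Counter)
    for artist, year in pairs:
        per_year[year][artist] += 1
    return {year: [artist for artist, _ in counts.most_common(n)]
            for year, counts in per_year.items()}


def counts_for(pairs, artists):
    """Build the artist x year table, with 0 where nothing was added."""
    years = sorted({year for _, year in pairs})
    counts = Counter(pairs)
    return {artist: [counts[(artist, year)] for year in years]
            for artist in sorted(artists)}


top = top_artists_by_year(added, n=1)
selected = {artist for artists in top.values() for artist in artists}
table = counts_for(added, selected)
for artist, row in table.items():
    print(artist, row)
```

An artist only has to reach the top list in one year to get a full row, which is why the table in the post contains many zeros.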

As a bunch of artists reappeared in different years, I decided to check whether that correlates with new releases, so I looked at tracks released in the last ten years:

counted_release_year_df = tracks_df \
    .assign(year_added=tracks_df.added_at.dt.year,
            year_released=tracks_df.release_date.dt.year) \
    .groupby(['year_released', 'year_added']) \
    .count()['id'] \
    .reset_index() \
    .rename(columns={'id': 'amount'}) \
    .sort_values('amount', ascending=False)

counted_release_year_df \
    [counted_release_year_df.year_released.isin(
        sorted(tracks_df.release_date.dt.year.unique())[-11:]
    )] \
    .pivot(index='year_released', columns='year_added', values='amount') \
    .fillna(0) \
    .style.background_gradient()

year_added 2013 2014 2015 2016 2017 2018 2019
year_released
2010.0 19 8 2 10 6 5 10
2011.0 14 10 4 6 5 5 5
2012.0 11 15 6 5 8 2 0
2013.0 28 17 3 6 5 4 2
2014.0 0 30 2 1 0 10 1
2015.0 0 0 15 5 8 7 9
2016.0 0 0 0 8 7 4 5
2017.0 0 0 0 0 23 5 5
2018.0 0 0 0 0 0 4 8
2019.0 0 0 0 0 0 0 14

Audio features

The Spotify API has an endpoint that provides audio features such as danceability, energy, and loudness for tracks. So I gathered the features for all tracks in the playlist:

features = []
for n, chunk_series in tracks_df.groupby(np.arange(len(tracks_df)) // 50).id:
    features += sp.audio_features([*map(str, chunk_series)])
features_df = pd.DataFrame.from_dict(filter(None, features))
tracks_with_features_df = tracks_df.merge(features_df, on=['id'], how='inner')

tracks_with_features_df.head()
id artist name release_date added_at danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo duration_ms time_signature
0 1MLtdVIDLdupSO1PzNNIQg Lindstrøm & Christabelle Looking For What 2009-12-11 2013-06-19 08:28:56+00:00 0.566 0.726 0 -11.294 1 0.1120 0.04190 0.494000 0.282 0.345 120.055 359091 4
1 1gWsh0T1gi55K45TMGZxT0 Au Revoir Simone Knight Of Wands – Dam Mantle Remix 2010-07-04 2013-06-19 08:48:30+00:00 0.563 0.588 4 -7.205 0 0.0637 0.00573 0.932000 0.104 0.467 89.445 237387 4
2 0LE3YWM0W9OWputCB8Z3qt Fever Ray When I Grow Up – D. Lissvik Version 2010-10-02 2013-06-19 22:09:15+00:00 0.687 0.760 5 -6.236 1 0.0479 0.01160 0.007680 0.417 0.818 92.007 270120 4
3 5FyiyLzbZt41IpWyMuiiQy Holy Ghost! Dumb Disco Ideas 2013-05-14 2013-06-19 22:12:42+00:00 0.752 0.831 10 -4.407 1 0.0401 0.00327 0.729000 0.105 0.845 124.234 483707 4
4 5cgfva649kw89xznFpWCFd Nouvelle Vague Too Drunk To Fuck 2004-11-01 2013-06-19 22:22:54+00:00 0.461 0.786 7 -6.950 1 0.0467 0.47600 0.000003 0.495 0.808 159.882 136160 4
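
The batching used to fetch the features can also be written without pandas. This is only a minimal sketch: `chunked` and the fake id list are made-up names for illustration, and no Spotify call is made here.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical track ids standing in for the ids in tracks_df
track_ids = [f"track-{n}" for n in range(120)]

batches = list(chunked(track_ids, 50))
batch_sizes = [len(batch) for batch in batches]  # two full batches and a remainder
```

Each batch could then be passed to `sp.audio_features(batch)`, as in the code above.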

After that I checked how the features changed over time. Only instrumentalness showed a visible difference:

sns.boxplot(x=tracks_with_features_df.added_at.dt.year,
            y=tracks_with_features_df.instrumentalness)

Instrumentalness over time

Then I had an idea to check valence for seasonality, and it kind of showed that valence is a bit lower in the depressing months:

sns.boxplot(x=tracks_with_features_df.added_at.dt.month,
            y=tracks_with_features_df.valence)

Valence seasonality

To play a bit more with the data, I decided to check whether danceability and valence correlate:

tracks_with_features_df.plot(kind='scatter', x='danceability', y='valence')

Danceability vs valence

And to check that the data is meaningful, I compared instrumentalness with speechiness, and those features looked mutually exclusive, as expected:

tracks_with_features_df.plot(kind='scatter', x='instrumentalness', y='speechiness')

Speechiness vs instrumentalness

Tracks difference and similarity

As I already had a bunch of features classifying tracks, it was hard not to make vectors out of them:

encode_fields = [
    'danceability',
    'energy',
    'key',
    'loudness',
    'mode',
    'speechiness',
    'acousticness',
    'instrumentalness',
    'liveness',
    'valence',
    'tempo',
    'duration_ms',
    'time_signature',
]

def encode(row):
    return np.array([
        (row[k] - tracks_with_features_df[k].min())
        / (tracks_with_features_df[k].max() - tracks_with_features_df[k].min())
        for k in encode_fields])

tracks_with_features_encoded_df = tracks_with_features_df.assign(
    encoded=tracks_with_features_df.apply(encode, axis=1))
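
The scaling inside `encode` is plain min-max normalization, (x - min) / (max - min), applied per feature. Here is a small stand-alone sketch of the same arithmetic in pure Python; the feature values are made up, and `min_max_scale` is a hypothetical helper, not part of the notebook.

```python
from typing import Dict, List


def min_max_scale(values: List[float]) -> List[float]:
    """Map values linearly onto [0, 1]; assumes they are not all equal."""
    low, high = min(values), max(values)
    return [(value - low) / (high - low) for value in values]


# Made-up values for three tracks and two of the features
features: Dict[str, List[float]] = {
    "tempo": [90.0, 120.0, 150.0],
    "loudness": [-12.0, -6.0, -9.0],
}

scaled = {name: min_max_scale(column) for name, column in features.items()}

# One vector per track, comparable to the `encoded` column
vectors = [list(row) for row in zip(*scaled.values())]
```

Without this step a feature like duration_ms, with values in the hundreds of thousands, would dominate any distance between two vectors.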

Then I just calculated the distance between every two tracks:

tracks_with_features_encoded_product_df = tracks_with_features_encoded_df \
    .assign(temp=0) \
    .merge(tracks_with_features_encoded_df.assign(temp=0), on='temp', how='left') \
    .drop(columns='temp')
tracks_with_features_encoded_product_df = tracks_with_features_encoded_product_df[
    tracks_with_features_encoded_product_df.id_x != tracks_with_features_encoded_product_df.id_y
]
tracks_with_features_encoded_product_df['merge_id'] = tracks_with_features_encoded_product_df \
    .apply(lambda row: ''.join(sorted([row['id_x'], row['id_y']])), axis=1)
tracks_with_features_encoded_product_df['distance'] = tracks_with_features_encoded_product_df \
    .apply(lambda row: np.linalg.norm(row['encoded_x'] - row['encoded_y']), axis=1)
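
The same idea, Euclidean distance between every pair of vectors, can be sketched on a toy scale with the standard library only. The three vectors and their names below are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
from itertools import combinations
from math import dist  # available since Python 3.8

# Hypothetical normalized vectors for three tracks
vectors = {
    "a": [0.0, 0.0],
    "b": [0.5, 1.0],
    "c": [1.0, 0.5],
}

# combinations() yields every unordered pair exactly once
distances = {
    (left, right): dist(vectors[left], vectors[right])
    for left, right in combinations(vectors, 2)
}

# The pair with the smallest distance is the "most similar" one
closest = min(distances, key=distances.get)
```

Because each pair appears only once here, there is nothing to deduplicate afterwards; the cross join above produces both (x, y) and (y, x), which is why it needs `merge_id`.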

After that I was able to get the most similar songs, i.e. the pairs with the minimal distance, and it picked songs that are indeed kind of similar:

tracks_with_features_encoded_product_df \
    .sort_values('distance') \
    .drop_duplicates('merge_id') \
    [['artist_x', 'name_x', 'release_date_x', 'artist_y', 'name_y', 'release_date_y', 'distance']] \
    .head(10)
artist_x name_x release_date_x artist_y name_y release_date_y distance
84370 Labyrinth Ear Wild Flowers 2010-11-21 Labyrinth Ear Navy Light 2010-11-21 0.000000
446773 YACHT I Thought the Future Would Be Cooler 2015-09-11 ADULT. Love Lies 2013-05-13 0.111393
21963 Ladytron Seventeen 2011-03-29 The Juan Maclean Give Me Every Little Thing 2005-07-04 0.125358
11480 Class Actress Careful What You Say 2010-02-09 MGMT Little Dark Age 2017-10-17 0.128865
261780 Queen of Japan I Was Made For Loving You 2001-10-02 Midnight Juggernauts Devil Within 2007-10-02 0.131304
63257 Pixies Bagboy 2013-09-09 Kindness That’s Alright 2012-03-16 0.146897
265792 Datarock Computer Camp Love 2005-10-02 Chromeo Night By Night 2010-09-21 0.147235
75359 Midnight Juggernauts Devil Within 2007-10-02 Lykke Li I’m Good, I’m Gone 2008-01-28 0.152680
105246 ADULT. Love Lies 2013-05-13 Dr. Alban Sing Hallelujah! 1992-05-04 0.154475
285180 Gigamesh Don’t Stop 2012-05-28 Pet Shop Boys Paninaro 95 – 2003 Remaster 2003-10-02 0.156469

The most different songs weren’t that fun, as two songs were too different from the rest:

tracks_with_features_encoded_product_df \
    .sort_values('distance', ascending=False) \
    .drop_duplicates('merge_id') \
    [['artist_x', 'name_x', 'release_date_x', 'artist_y', 'name_y', 'release_date_y', 'distance']] \
    .head(10)
artist_x name_x release_date_x artist_y name_y release_date_y distance
79324 Labyrinth Ear Navy Light 2010-11-21 Boy Harsher Modulations 2014-10-01 2.480206
84804 Labyrinth Ear Wild Flowers 2010-11-21 Boy Harsher Modulations 2014-10-01 2.480206
400840 Charlotte Gainsbourg Deadly Valentine – Soulwax Remix 2017-11-10 Labyrinth Ear Navy Light 2010-11-21 2.478183
84840 Labyrinth Ear Wild Flowers 2010-11-21 Charlotte Gainsbourg Deadly Valentine – Soulwax Remix 2017-11-10 2.478183
388510 Ladytron Paco! 2001-10-02 Labyrinth Ear Navy Light 2010-11-21 2.444927
388518 Ladytron Paco! 2001-10-02 Labyrinth Ear Wild Flowers 2010-11-21 2.444927
20665 Factory Floor Fall Back 2013-01-15 Labyrinth Ear Navy Light 2010-11-21 2.439136
20673 Factory Floor Fall Back 2013-01-15 Labyrinth Ear Wild Flowers 2010-11-21 2.439136
79448 Labyrinth Ear Navy Light 2010-11-21 La Femme Runway 2018-10-01 2.423574
84928 Labyrinth Ear Wild Flowers 2010-11-21 La Femme Runway 2018-10-01 2.423574

Then I calculated the most average songs, i.e. the songs with the smallest total distance to every other song:

tracks_with_features_encoded_product_df \
    .groupby(['artist_x', 'name_x', 'release_date_x']) \
    .sum()['distance'] \
    .reset_index() \
    .sort_values('distance') \
    .head(10)
artist_x name_x release_date_x distance
48 Beirut No Dice 2009-02-17 638.331257
591 The Juan McLean A Place Called Space 2014-09-15 643.436523
347 MGMT Little Dark Age 2017-10-17 645.959770
101 Class Actress Careful What You Say 2010-02-09 646.488998
31 Architecture In Helsinki 2 Time 2014-04-01 648.692344
588 The Juan Maclean Give Me Every Little Thing 2005-07-04 648.878463
323 Lindstrøm Baby Can’t Stop 2009-10-26 652.212858
307 Ladytron Seventeen 2011-03-29 652.759843
310 Lauer Mirrors (feat. Jasnau) 2018-11-16 655.498535
451 Pet Shop Boys Always on My Mind 1998-03-31 656.437048
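
Ranking songs by their summed distance to all other songs can be illustrated with a few made-up vectors. This is only a sketch in pure Python, with hypothetical names and data, not the notebook's code.

```python
from math import dist  # available since Python 3.8

# Hypothetical normalized vectors; "d" is deliberately far from the rest
vectors = {
    "a": [0.0, 0.0],
    "b": [0.5, 1.0],
    "c": [1.0, 1.0],
    "d": [3.0, 3.0],
}

# Sum of distances from each vector to all vectors
# (the distance to itself is 0, so it does not change the sum)
total_distance = {
    name: sum(dist(vector, other) for other in vectors.values())
    for name, vector in vectors.items()
}

# Smallest total first: the most "average" vector leads the ranking
ranked = sorted(total_distance, key=total_distance.get)
```

Reading `ranked` from the other end gives the vectors that stand out the most.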

And the total opposite, the most outstanding songs:

tracks_with_features_encoded_product_df \
    .groupby(['artist_x', 'name_x', 'release_date_x']) \
    .sum()['distance'] \
    .reset_index() \
    .sort_values('distance', ascending=False) \
    .head(10)
artist_x name_x release_date_x distance
665 YACHT Le Goudron – Long Version 2012-05-25 2823.572387
300 Labyrinth Ear Navy Light 2010-11-21 1329.234390
301 Labyrinth Ear Wild Flowers 2010-11-21 1329.234390
57 Blonde Redhead For the Damaged Coda 2000-06-06 1095.393120
616 The Velvet Underground After Hours 1969-03-02 1080.491779
593 The Knife Forest Families 2006-02-17 1040.114214
615 The Space Lady Major Tom 2013-11-18 1016.881467
107 CocoRosie By Your Side 2004-03-09 1015.970860
170 El Perro Del Mar Party 2015-02-13 1012.163212
403 Mr.Kitty XIII 2014-10-06 1010.115117

Conclusion

Although the dataset is a bit small, it was still fun to have a look at the data.

There is a gist with a Jupyter notebook with even more boring stuff; it can be reused by modifying the credentials.

Planet Python

PyPy Development: PyPy v7.2 released

The PyPy team is proud to release the version 7.2.0 of PyPy, which includes two different interpreters:

  • PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7, including the stdlib for CPython 2.7.13.
  • PyPy3.6, which is an interpreter supporting the syntax and the features of Python 3.6, including the stdlib for CPython 3.6.9.

The interpreters are based on much the same codebase, thus the double release.

As always, this release is 100% compatible with the previous one and fixed several issues and bugs raised by the growing community of PyPy users. We strongly recommend updating. Many of the fixes are the direct result of end-user bug reports, so please continue reporting issues as they crop up.

You can download the v7.2 releases here:

With the support of Arm Holdings Ltd. and Crossbar.io, this release supports the 64-bit aarch64 ARM architecture. More about the work and the performance data around this welcome development can be found in the blog post.

This release removes the “beta” tag from PyPy3.6. While there may still be some small corner-case incompatibilities (around the exact error messages in exceptions and the handling of faulty codec errorhandlers) we are happy with the quality of the 3.6 series and are looking forward to working on a Python 3.7 interpreter.

We updated our benchmark runner at https://speed.pypy.org to a more modern machine and updated the baseline python to CPython 2.7.11. Thanks to Baroque Software for maintaining the benchmark runner.

The CFFI-based _ssl module was backported to PyPy2.7 and updated to use cryptography version 2.7. Additionally, the _hashlib and crypt (or _crypt on Python3) modules were converted to CFFI. This has two consequences: end users and packagers can more easily update these libraries for their platform by executing (cd lib_pypy; ../bin/pypy _*_build.py). More significantly, since PyPy itself links to fewer system shared objects (DLLs), on platforms with a single runtime namespace like Linux, different CFFI and c-extension modules can load different versions of the same shared object into PyPy without collision (issue 2617).

Until downstream providers begin to distribute c-extension builds with PyPy, we have made packages for some common packages available as wheels.

The CFFI backend has been updated to version 1.13.0. We recommend using CFFI rather than c-extensions to interact with C, and cppyy for interacting with C++ code.

Thanks to Anvil, we revived the PyPy Sandbox, (soon to be released) which allows total control over a Python interpreter’s interactions with the external world.

We implemented a new JSON decoder that is much faster, uses less memory, and uses a JIT-friendly specialized dictionary. More about that in the recent blog post.

We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for direct consulting work.
We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on PyPy, or general help with making RPython’s JIT even better. Since the previous release, we have accepted contributions from 27 new contributors, so thanks for pitching in.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7, 3.6. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This PyPy release supports:

  • x86 machines on most common operating systems (Linux 32/64 bit, Mac OS X 64-bit, Windows 32-bit, OpenBSD, FreeBSD)
  • big- and little-endian variants of PPC64 running Linux
  • s390x running Linux
  • 64-bit ARM machines running Linux

Unfortunately at the moment of writing our ARM buildbots are out of service, so for now we are not releasing any binary for the ARM architecture (32-bit), although PyPy does support ARM 32-bit processors.

What else is new?

PyPy 7.1 was released in March 2019. There are many incremental improvements to RPython and PyPy. For more information about the 7.2.0 release, see the full changelog.

Please update, and continue to help us make PyPy better.

Cheers,
The PyPy team



Real Python: Cool New Features in Python 3.8

The newest version of Python is released today! Python 3.8 has been available in beta versions since the summer, but on October 14th, 2019 the first official version is ready. Now, we can all start playing with the new features and benefit from the latest improvements.

What does Python 3.8 bring to the table? The documentation gives a good overview of the new features. However, this article will go more in depth on some of the biggest changes, and show you how you can take advantage of Python 3.8.

In this article, you’ll learn about:

  • Using assignment expressions to simplify some code constructs
  • Enforcing positional-only arguments in your own functions
  • Specifying more precise type hints
  • Using f-strings for simpler debugging

With a few exceptions, Python 3.8 contains many small improvements over the earlier versions. Towards the end of the article, you’ll see many of these less attention-grabbing changes, as well as a discussion about some of the optimizations that make Python 3.8 faster than its predecessors. Finally, you’ll get some advice about upgrading to the new version.


The Walrus in the Room: Assignment Expressions

The biggest change in Python 3.8 is the introduction of assignment expressions. They are written using a new notation (:=). This operator is often called the walrus operator as it resembles the eyes and tusks of a walrus on its side.

Assignment expressions allow you to assign and return a value in the same expression. For example, if you want to assign to a variable and print its value, then you typically do something like this:

>>> walrus = False
>>> print(walrus)
False

In Python 3.8, you’re allowed to combine these two statements into one, using the walrus operator:

>>> print(walrus := True)
True

The assignment expression allows you to assign True to walrus, and immediately print the value. But keep in mind that the walrus operator does not do anything that isn’t possible without it. It only makes certain constructs more convenient, and can sometimes communicate the intent of your code more clearly.

One pattern that shows some of the strengths of the walrus operator is while loops where you need to initialize and update a variable. For example, the following code asks the user for input until they type quit:

inputs = list()
current = input("Write something: ")
while current != "quit":
    inputs.append(current)
    current = input("Write something: ")

This code is less than ideal. You’re repeating the input() statement, and somehow you need to add current to the list before asking the user for it. A better solution is to set up an infinite while loop, and use break to stop the loop:

inputs = list()
while True:
    current = input("Write something: ")
    if current == "quit":
        break
    inputs.append(current)

This code is equivalent to the one above, but avoids the repetition and somehow keeps the lines in a more logical order. If you use an assignment expression, you can simplify this loop further:

inputs = list()
while (current := input("Write something: ")) != "quit":
    inputs.append(current)

This moves the test back to the while line, where it should be. However, there are now several things happening at that line, so it takes a bit more effort to read it properly. Use your best judgement about when the walrus operator helps make your code more readable.

PEP 572 describes all the details of assignment expressions, including some of the rationale for introducing them into the language, as well as several examples of how the walrus operator can be used.
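
Two other places where an assignment expression is often handy are reading a stream in chunks and keeping a regular expression match inside a comprehension. The following is a small sketch of both; the sample data and variable names are made up for illustration.

```python
import io
import re

# Read a stream in fixed-size chunks until read() returns an empty string
stream = io.StringIO("abcdefgh")
chunks = []
while (chunk := stream.read(3)):
    chunks.append(chunk)

# Keep the match object from the filter and reuse it in the result
lines = ["pypy 7.2", "no version here", "python 3.8"]
pattern = re.compile(r"(\d+)\.(\d+)")
versions = [
    (int(match.group(1)), int(match.group(2)))
    for line in lines
    if (match := pattern.search(line))
]
```

Without the walrus operator, the comprehension would have to run `pattern.search()` twice per line or be rewritten as a regular loop.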

Positional-Only Arguments

The built-in function float() can be used for converting text strings and numbers to float objects. Consider the following example:

>>> float("3.8")
3.8

>>> help(float)
class float(object)
 |  float(x=0, /)
 |
 |  Convert a string or number to a floating point number, if possible.
[...]

Look closely at the signature of float(). Notice the slash (/) after the parameter. What does it mean?

Note: For an in-depth discussion on the / notation, see PEP 457 – Notation for Positional-Only Parameters.

It turns out that while the one parameter of float() is called x, you’re not allowed to use its name:

>>> float(x="3.8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: float() takes no keyword arguments

When using float() you’re only allowed to specify arguments by position, not by keyword. Before Python 3.8, such positional-only arguments were only possible for built-in functions. There was no easy way to specify that arguments should be positional-only in your own functions:

>>> def incr(x):
...     return x + 1
...
>>> incr(3.8)
4.8

>>> incr(x=3.8)
4.8

It’s possible to simulate positional-only arguments using *args, but this is less flexible, less readable, and forces you to implement your own argument parsing. In Python 3.8, you can use / to denote that all arguments before it must be specified by position. You can rewrite incr() to only accept positional arguments:

>>> def incr(x, /):
...     return x + 1
...
>>> incr(3.8)
4.8

>>> incr(x=3.8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: incr() got some positional-only arguments passed as
           keyword arguments: 'x'

By adding / after x, you specify that x is a positional-only argument. You can combine regular arguments with positional-only ones by placing the regular arguments after the slash:

>>> def greet(name, /, greeting="Hello"):
...     return f"{greeting}, {name}"
...
>>> greet("Łukasz")
'Hello, Łukasz'

>>> greet("Łukasz", greeting="Awesome job")
'Awesome job, Łukasz'

>>> greet(name="Łukasz", greeting="Awesome job")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: greet() got some positional-only arguments passed as
           keyword arguments: 'name'

In greet(), the slash is placed between name and greeting. This means that name is a positional-only argument, while greeting is a regular argument that can be passed either by position or by keyword.

At first glance, positional-only arguments can seem a bit limiting and contrary to Python’s mantra about the importance of readability. You will probably find that there are not a lot of occasions where positional-only arguments improve your code.

However, in the right circumstances, positional-only arguments can give you some flexibility when you’re designing functions. First, positional-only arguments make sense when you have arguments that have a natural order but are hard to give good, descriptive names to.

Another possible benefit of using positional-only arguments is that you can more easily refactor your functions. In particular, you can change the name of your parameters without worrying that other code depends on those names.
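
A related case is a function that collects arbitrary keyword arguments. If its own parameter is positional-only, callers can still use that parameter's name as one of the keywords. The sketch below uses a hypothetical `make_record()` function to show this.

```python
def make_record(name, /, **fields):
    """Build a dict from a record type and arbitrary keyword fields."""
    return {"record": name, **fields}


# "name" is accepted as a keyword because the first parameter is positional-only
row = make_record("track", name="Seventeen", artist="Ladytron")
```

If `name` were a regular parameter, the call above would fail with a TypeError because `name` would receive two values.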

Positional-only arguments nicely complement keyword-only arguments. In any version of Python 3, you can specify keyword-only arguments using the star (*). Any argument after * must be specified using a keyword:

>>> def to_fahrenheit(*, celsius):
...     return 32 + celsius * 9 / 5
...
>>> to_fahrenheit(40)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: to_fahrenheit() takes 0 positional arguments but 1 was given

>>> to_fahrenheit(celsius=40)
104.0

celsius is a keyword-only argument, so Python raises an error if you try to specify it based on position, without the keyword.

You can combine positional-only, regular, and keyword-only arguments, by specifying them in this order separated by / and *. In the following example, text is a positional-only argument, border is a regular argument with a default value, and width is a keyword-only argument with a default value:

>>> def headline(text, /, border="♦", *, width=50):
...     return f" {text} ".center(width, border)
...

Since text is positional-only, you can’t use the keyword text:

>>> headline("Positional-only Arguments")
'♦♦♦♦♦♦♦♦♦♦♦ Positional-only Arguments ♦♦♦♦♦♦♦♦♦♦♦♦'

>>> headline(text="This doesn't work!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: headline() got some positional-only arguments passed as
           keyword arguments: 'text'

border, on the other hand, can be specified both with and without the keyword:

>>> headline("Python 3.8", "=")
'=================== Python 3.8 ==================='

>>> headline("Real Python", border=":")
':::::::::::::::::: Real Python :::::::::::::::::::'

Finally, width must be specified using the keyword:

>>> headline("Python", "🐍", width=38)
'🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍 Python 🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍'

>>> headline("Python", "🐍", 38)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: headline() takes from 1 to 2 positional arguments
           but 3 were given

You can read more about positional-only arguments in PEP 570.

More Precise Types

Python’s typing system is quite mature at this point. However, in Python 3.8, some new features have been added to typing to allow more precise typing:

  • Literal types
  • Typed dictionaries
  • Final objects
  • Protocols

Python supports optional type hints, typically as annotations on your code:

def double(number: float) -> float:
    return 2 * number

In this example, you say that number should be a float and the double() function should return a float, as well. However, Python treats these annotations as hints. They are not enforced at runtime:

>>> double(3.14)
6.28

>>> double("I'm not a float")
"I'm not a floatI'm not a float"

double() happily accepts "I'm not a float" as an argument, even though that’s not a float. There are libraries that can use types at runtime, but that is not the main use case for Python’s type system.

Instead, type hints allow static type checkers to do type checking of your Python code, without actually running your scripts. This is reminiscent of compilers catching type errors in other languages like Java, Rust, and Crystal. Additionally, type hints act as documentation of your code, making it easier to read, as well as improving auto-complete in your IDE.
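
To give an idea of what using type hints at runtime could look like, here is a deliberately simple sketch. The `enforce_types` decorator is hypothetical and hand-rolled for this illustration. It only handles annotations that are plain classes and is not how any particular library works.

```python
import functools
import inspect
from typing import get_type_hints


def enforce_types(func):
    """Hypothetical decorator: check plain-class annotations on each call."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; typing generics are skipped
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def double(number: float) -> float:
    return 2 * number
```

With this decorator, `double(3.14)` still works, while `double("I'm not a float")` raises a TypeError. Note that this naive check also rejects `double(3)`, since an int is not an instance of float, even though static type checkers accept an int where a float is expected.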

Note: There are several static type checkers available, including Pyright, Pytype, and Pyre. In this article, you’ll use Mypy. You can install Mypy from PyPI using pip:

$ python -m pip install mypy

In some sense, Mypy is the reference implementation of a type checker for Python, and is being developed at Dropbox under the lead of Jukka Lehtosalo. Python’s creator, Guido van Rossum, is part of the Mypy team.

You can find more information about type hints in Python in the original PEP 484, as well as in Python Type Checking (Guide).

There are four new PEPs about type checking that have been accepted and included in Python 3.8. You’ll see short examples from each of these.

PEP 586 introduces the Literal type. Literal is a bit special in that it represents one or several specific values. One use case of Literal is to be able to precisely add types, when string arguments are used to describe specific behavior. Consider the following example:

# draw_line.py

def draw_line(direction: str) -> None:
    if direction == "horizontal":
        ...  # Draw horizontal line

    elif direction == "vertical":
        ...  # Draw vertical line

    else:
        raise ValueError(f"invalid direction {direction!r}")

draw_line("up")

The program will pass the static type checker, even though "up" is an invalid direction. The type checker only checks that "up" is a string. In this case, it would be more precise to say that direction must be either the literal string "horizontal" or the literal string "vertical". Using Literal, you can do exactly that:

# draw_line.py

from typing import Literal

def draw_line(direction: Literal["horizontal", "vertical"]) -> None:
    if direction == "horizontal":
        ...  # Draw horizontal line

    elif direction == "vertical":
        ...  # Draw vertical line

    else:
        raise ValueError(f"invalid direction {direction!r}")

draw_line("up")

By exposing the allowed values of direction to the type checker, you can now be warned about the error:

$ mypy draw_line.py
draw_line.py:15: error:
    Argument 1 to "draw_line" has incompatible type "Literal['up']";
    expected "Union[Literal['horizontal'], Literal['vertical']]"
Found 1 error in 1 file (checked 1 source file)

The basic syntax is Literal[<literal>]. For instance, Literal[38] represents the literal value 38. You can express one of several literal values using Union:

Union[Literal["horizontal"], Literal["vertical"]] 

Since this is a fairly common use case, you can (and probably should) use the simpler notation Literal["horizontal", "vertical"] instead. You already used the latter when adding types to draw_line(). If you look carefully at the output from Mypy above, you can see that it translated the simpler notation to the Union notation internally.
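
Literal types are meant for static checking, but the allowed values can also be read back at runtime with `typing.get_args()`, which is also new in Python 3.8. The sketch below uses a hypothetical `parse_direction()` helper to validate untrusted input against the same Literal.

```python
from typing import Literal, cast, get_args

Direction = Literal["horizontal", "vertical"]


def parse_direction(text: str) -> Direction:
    """Validate a string at runtime against the values listed in Direction."""
    allowed = get_args(Direction)  # the values listed inside Literal[...]
    if text not in allowed:
        raise ValueError(f"invalid direction {text!r}, expected one of {allowed}")
    # cast() tells the type checker that the check above guarantees the type
    return cast(Direction, text)
```

This keeps the list of valid values in one place: the type checker and the runtime validation both rely on `Direction`.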

There are cases where the type of the return value of a function depends on the input arguments. One example is open() which may return a text string or a byte array depending on the value of mode. This can be handled through overloading.

The following example shows the skeleton of a calculator that can return the answer either as regular numbers (38), or as roman numerals (XXXVIII):

# calculator.py

from typing import Union

ARABIC_TO_ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                   (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                   (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def _convert_to_roman_numeral(number: int) -> str:
    """Convert number to a roman numeral string"""
    result = list()
    for arabic, roman in ARABIC_TO_ROMAN:
        count, number = divmod(number, arabic)
        result.append(roman * count)
    return "".join(result)

def add(num_1: int, num_2: int, to_roman: bool = True) -> Union[str, int]:
    """Add two numbers"""
    result = num_1 + num_2

    if to_roman:
        return _convert_to_roman_numeral(result)
    else:
        return result

The code has the correct type hints: the result of add() will be either str or int. However, often this code will be called with a literal True or False as the value of to_roman in which case you would like the type checker to infer exactly whether str or int is returned. This can be done using Literal together with @overload:

# calculator.py

from typing import Literal, overload, Union

ARABIC_TO_ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                   (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                   (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def _convert_to_roman_numeral(number: int) -> str:
    """Convert number to a roman numeral string"""
    result = list()
    for arabic, roman in ARABIC_TO_ROMAN:
        count, number = divmod(number, arabic)
        result.append(roman * count)
    return "".join(result)

@overload
def add(num_1: int, num_2: int, to_roman: Literal[True]) -> str: ...
@overload
def add(num_1: int, num_2: int, to_roman: Literal[False]) -> int: ...

def add(num_1: int, num_2: int, to_roman: bool = True) -> Union[str, int]:
    """Add two numbers"""
    result = num_1 + num_2

    if to_roman:
        return _convert_to_roman_numeral(result)
    else:
        return result

The added @overload signatures will help your type checker infer str or int depending on the literal values of to_roman. Note that the ellipses (...) are a literal part of the code. They stand in for the function body in the overloaded signatures.

As a complement to Literal, PEP 591 introduces Final. This qualifier specifies that a variable or attribute should not be reassigned, redefined, or overridden. The following is a typing error:

from typing import Final

ID: Final = 1

...

ID += 1

Mypy will highlight the line ID += 1, and note that you Cannot assign to final name "ID". This gives you a way to ensure that constants in your code never change their value.

Additionally, there is also a @final decorator that can be applied to classes and methods. Classes decorated with @final can’t be subclassed, while @final methods can’t be overridden by subclasses:

from typing import final

@final
class Base:
    ...

class Sub(Base):
    ...

Mypy will flag this example with the error message Cannot inherit from final class "Base". To learn more about Final and @final, see PEP 591.
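
Like other type hints, Final and @final are not enforced when the code runs. The following sketch repeats the two examples above in one runnable script: a type checker reports both problems, but Python itself executes the code without complaint.

```python
from typing import Final, final

ID: Final = 1
ID += 1  # a type checker flags this line, but it runs fine


@final
class Base:
    ...


class Sub(Base):  # also flagged by a type checker, but allowed at runtime
    ...


final_value = ID
is_subclass = issubclass(Sub, Base)
```

In other words, these qualifiers only protect you if a type checker is actually part of your workflow.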

The third PEP allowing for more specific type hints is PEP 589, which introduces TypedDict. This can be used to specify types for keys and values in a dictionary using a notation that is similar to the typed NamedTuple.

Traditionally, dictionaries have been annotated using Dict. The issue is that this only allowed one type for the keys and one type for the values, often leading to annotations like Dict[str, Any]. As an example, consider a dictionary that registers information about Python versions:

py38 = {"version": "3.8", "release_year": 2019} 

The value corresponding to version is a string, while release_year is an integer. This can’t be precisely represented using Dict. With the new TypedDict, you can do the following:

from typing import TypedDict

class PythonVersion(TypedDict):
    version: str
    release_year: int

py38 = PythonVersion(version="3.8", release_year=2019)

The type checker will then be able to infer that py38["version"] has type str, while py38["release_year"] is an int. At runtime, a TypedDict is a regular dict, and type hints are ignored as usual. You can also use TypedDict purely as an annotation:

py38: PythonVersion = {"version": "3.8", "release_year": 2019} 

Mypy will let you know if any of your values has the wrong type, or if you use a key that has not been declared. See PEP 589 for more examples.
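PEP 589 also lets you mark keys as optional with total=False. The following is a minimal sketch of that idea. The class name PythonVersionNotes and the codename key are made up for this example:

```python
from typing import TypedDict


class PythonVersion(TypedDict):
    version: str
    release_year: int


class PythonVersionNotes(PythonVersion, total=False):
    # Keys declared in a total=False class may be left out
    codename: str


py38: PythonVersionNotes = {"version": "3.8", "release_year": 2019}
py38_noted: PythonVersionNotes = {
    "version": "3.8",
    "release_year": 2019,
    "codename": "example",  # Hypothetical value, only for illustration
}

# At runtime both objects are plain dictionaries
print(type(py38))
print("codename" in py38, "codename" in py38_noted)
```

The keys inherited from PythonVersion stay required, while codename may be omitted.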

Mypy has supported Protocols for a while already. However, the official acceptance only happened in May 2019.

Protocols are a way of formalizing Python’s support for duck typing:

When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck. (Source)

Duck typing allows you to, for example, read .name on any object that has a .name attribute, without really caring about the type of the object. It may seem counter-intuitive for the typing system to support this. Through structural subtyping, it’s still possible to make sense of duck typing.

You can for instance define a protocol called Named that can identify all objects with a .name attribute:

    from typing import Protocol

    class Named(Protocol):
        name: str

    def greet(obj: Named) -> None:
        print(f"Hi {obj.name}")

Here, greet() takes any object, as long as it defines a .name attribute. See PEP 544 and the Mypy documentation for more information about protocols.
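To see structural subtyping in action, you can pass objects of unrelated classes to a function that expects Named. The sketch below assumes two hypothetical classes, Person and Planet, and changes greet() to return the greeting instead of printing it. It also uses typing.runtime_checkable, which is optional and only needed if you want isinstance() checks against the protocol:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class Named(Protocol):
    name: str


@dataclass
class Person:  # Hypothetical class, does not inherit from Named
    name: str


@dataclass
class Planet:  # Hypothetical class, does not inherit from Named
    name: str
    moons: int


def greet(obj: Named) -> str:
    return f"Hi {obj.name}"


print(greet(Person("Eric")))
print(greet(Planet("Mars", 2)))

# The runtime check only looks for the presence of a .name attribute
print(isinstance(Planet("Mars", 2), Named))
print(isinstance(42, Named))
```

Neither class mentions Named, yet both are accepted because they have the right shape.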

Simpler Debugging With f-Strings

f-strings were introduced in Python 3.6, and have become very popular. They might be the most common reason for Python libraries only being supported on version 3.6 and later. An f-string is a formatted string literal. You can recognize it by the leading f:

    >>> style = "formatted"
    >>> f"This is a {style} string"
    'This is a formatted string'

When you use f-strings, you can enclose variables and even expressions inside curly braces. They will then be evaluated at runtime and included in the string. You can have several expressions in one f-string:

    >>> import math
    >>> r = 3.6

    >>> f"A circle with radius {r} has area {math.pi * r * r:.2f}"
    'A circle with radius 3.6 has area 40.72'

In the last expression, {math.pi * r * r:.2f}, you also use a format specifier. Format specifiers are separated from the expressions with a colon.

.2f means that the area is formatted as a floating point number with 2 decimals. The format specifiers are the same as for .format(). See the official documentation for a full list of allowed format specifiers.
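A few more format specifiers are sketched below. The value and the labels are arbitrary and chosen only for illustration:

```python
value = 1234.5678

examples = {
    "two decimals": f"{value:.2f}",
    "thousands separator": f"{value:,.1f}",
    "right-aligned, width 12": f"{value:>12.3f}",
    "percentage": f"{0.256:.1%}",
    "hexadecimal": f"{255:#x}",
    "same spec with .format()": "{:.2f}".format(value),
}

# Print each result between bars so that padding is visible
for label, text in examples.items():
    print(f"{label:<26}|{text}|")
```

Note that the last entry gives the same result as the first, since f-strings and .format() share the same specifier language.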

In Python 3.8, you can use assignment expressions inside f-strings. Just make sure to surround the assignment expression with parentheses:

    >>> import math
    >>> r = 3.8

    >>> f"Diameter {(diam := 2 * r)} gives circumference {math.pi * diam:.2f}"
    'Diameter 7.6 gives circumference 23.88'

However, the real f-news in Python 3.8 is the new debugging specifier. You can now add = at the end of an expression, and it will print both the expression and its value:

    >>> python = 3.8
    >>> f"{python=}"
    'python=3.8'

This is a shorthand that will typically be most useful when working interactively or adding print statements to debug your script. In earlier versions of Python, you needed to spell out the variable or expression twice to get the same information:

    >>> python = 3.7
    >>> f"python={python}"
    'python=3.7'

You can add spaces around =, and use format specifiers as usual:

    >>> name = "Eric"
    >>> f"{name = }"
    "name = 'Eric'"

    >>> f"{name = :>10}"
    'name =       Eric'

The >10 format specifier says that name should be right-aligned within a 10 character string. = works for more complex expressions as well:

    >>> f"{name.upper()[::-1] = }"
    "name.upper()[::-1] = 'CIRE'"

For more information about f-strings, see Python 3’s f-Strings: An Improved String Formatting Syntax (Guide).
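One detail worth knowing is that = shows the repr() of the value by default, but switches to regular formatting once you add a conversion or a format specifier. The following sketch illustrates this with a few arbitrary variables:

```python
import math

name = "Eric"
radius = 2
area = math.pi * radius ** 2

with_repr = f"{name=}"      # Uses repr() by default, so the quotes are shown
with_str = f"{name=!s}"     # !s switches to str(), so the quotes disappear
with_spec = f"{area=:.3f}"  # A format specifier applies to the value only

print(with_repr)
print(with_str)
print(with_spec)
```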

The Python Steering Council

Technically, Python’s governance is not a language feature. However, Python 3.8 is the first version of Python not developed under the benevolent dictatorship of Guido van Rossum. The Python language is now governed by a steering council consisting of five core developers: Barry Warsaw, Brett Cannon, Carol Willing, Guido van Rossum, and Nick Coghlan.

The road to the new governance model for Python was an interesting study in self-organization. Guido van Rossum created Python in the early 1990s, and has been affectionately dubbed Python’s Benevolent Dictator for Life (BDFL). Through the years, more and more decisions about the Python language were made through Python Enhancement Proposals (PEPs). Still, Guido officially had the last word on any new language feature.

After a long and drawn out discussion about assignment expressions, Guido announced in July 2018 that he was retiring from his role as BDFL (for real this time). He purposefully did not name a successor. Instead, he asked the team of core developers to figure out how Python should be governed going forward.

Luckily, the PEP process was already well established, so it was natural to use PEPs to discuss and decide on a new governance model. Through the fall of 2018, several models were proposed, including electing a new BDFL (renamed the Gracious Umpire Influencing Decisions Officer: the GUIDO), or moving to a community model based on consensus and voting, without centralized leadership. In December 2018, the steering council model was chosen after a vote among the core developers.

The Python Steering Council at PyCon 2019. From left to right: Barry Warsaw, Brett Cannon, Carol Willing, Guido van Rossum, and Nick Coghlan (Image: Geir Arne Hjelle)

The steering council consists of five members of the Python community, as listed above. There will be an election for a new steering council after every major release of Python. In other words, there will be an election following the release of Python 3.8.

Although it’s an open election, it’s expected that most, if not all, of the inaugural steering council will be reelected. The steering council has broad powers to make decisions about the Python language, but should strive to exercise those powers as little as possible.

You can read all about the new governance model in PEP 13, while the process of deciding on the new model is described in PEP 8000. For more information, see the PyCon 2019 Keynote, and listen to Brett Cannon on Talk Python To Me and on The Changelog podcast. You can follow updates from the steering council on GitHub.

Other Pretty Cool Features

So far, you’ve seen the headline news regarding what’s new in Python 3.8. However, there are many other changes that are also pretty cool. In this section, you’ll get a quick look at some of them.

importlib.metadata

There is one new module available in the standard library in Python 3.8: importlib.metadata. Through this module, you can access information about installed packages in your Python installation. Together with its companion module, importlib.resources, importlib.metadata improves on the functionality of the older pkg_resources.

As an example, you can get some information about pip:

    >>> from importlib import metadata
    >>> metadata.version("pip")
    '19.2.3'

    >>> pip_metadata = metadata.metadata("pip")
    >>> list(pip_metadata)
    ['Metadata-Version', 'Name', 'Version', 'Summary', 'Home-page', 'Author',
     'Author-email', 'License', 'Keywords', 'Platform', 'Classifier',
     'Classifier', 'Classifier', 'Classifier', 'Classifier', 'Classifier',
     'Classifier', 'Classifier', 'Classifier', 'Classifier', 'Classifier',
     'Classifier', 'Classifier', 'Requires-Python']

    >>> pip_metadata["Home-page"]
    'https://pip.pypa.io/'

    >>> pip_metadata["Requires-Python"]
    '>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*'

    >>> len(metadata.files("pip"))
    668

The currently installed version of pip is 19.2.3. metadata() gives access to most of the information that you can see on PyPI. You can for instance see that this version of pip requires either Python 2.7, or Python 3.5 or higher. With files(), you get a listing of all files that make up the pip package. In this case, there are almost 700 files.

files() returns a list of Path objects. These give you a convenient way of looking into the source code of a package, using read_text(). The following example prints out __init__.py from the realpython-reader package:

    >>> [p for p in metadata.files("realpython-reader") if p.suffix == ".py"]
    [PackagePath('reader/__init__.py'), PackagePath('reader/__main__.py'),
     PackagePath('reader/feed.py'), PackagePath('reader/viewer.py')]

    >>> init_path = _[0]  # Underscore accesses the last returned value in the REPL
    >>> print(init_path.read_text())
    """Real Python feed reader

    Import the `feed` module to work with the Real Python feed:

        >>> from reader import feed
        >>> feed.get_titles()
        ['Logging in Python', 'The Best Python Books', ...]

    See https://github.com/realpython/reader/ for more information
    """

    # Version of realpython-reader package
    __version__ = "1.0.0"

    ...

You can also access package dependencies:

    >>> metadata.requires("realpython-reader")
    ['feedparser', 'html2text', 'importlib-resources', 'typing']

requires() lists the dependencies of a package. You can see that realpython-reader for instance uses feedparser in the background to read and parse a feed of articles.

There is a backport of importlib.metadata available on PyPI that works on earlier versions of Python. You can install it using pip:

    $ python -m pip install importlib-metadata

You can fall back on using the PyPI backport in your code as follows:

    try:
        from importlib import metadata
    except ImportError:
        import importlib_metadata as metadata

    ...

See the documentation for more information about importlib.metadata.
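When you look up packages that may not be installed, metadata.version() raises PackageNotFoundError. One way to handle this is sketched below. The helper name installed_version() and the second package name are hypothetical:

```python
try:
    from importlib import metadata
except ImportError:
    # Python 3.7 and earlier, assuming the PyPI backport is installed
    import importlib_metadata as metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it's missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pip"))
print(installed_version("surely-not-an-installed-package-xyz"))
```

The output depends on your environment, so no expected values are shown here.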

New and Improved math and statistics Functions

Python 3.8 brings many improvements to existing standard library packages and modules. math in the standard library has a few new functions. math.prod() works similarly to the built-in sum(), but for multiplicative products:

    >>> import math
    >>> math.prod((2, 8, 7, 7))
    784

    >>> 2 * 8 * 7 * 7
    784

The two statements are equivalent. prod() will be easier to use when you already have the factors stored in an iterable.
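Like sum(), prod() accepts any iterable and an optional start value. The small sketch below shows a few cases, using arbitrary numbers:

```python
import math

factors = [2, 8, 7, 7]

product = math.prod(factors)
scaled = math.prod(factors, start=10)  # The start value is multiplied in as well
empty = math.prod([])                  # An empty product falls back to start, which defaults to 1
factorial_5 = math.prod(range(1, 6))   # 1 * 2 * 3 * 4 * 5

print(product, scaled, empty, factorial_5)
```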

Another new function is math.isqrt(). You can use isqrt() to find the integer part of square roots:

    >>> import math
    >>> math.isqrt(9)
    3

    >>> math.sqrt(9)
    3.0

    >>> math.isqrt(15)
    3

    >>> math.sqrt(15)
    3.872983346207417

The square root of 9 is 3. You can see that isqrt() returns an integer result, while math.sqrt() always returns a float. The square root of 15 is almost 3.9. Note that isqrt() rounds the answer down to the nearest integer, in this case 3.
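Because isqrt() works purely with integers, it stays exact even for numbers that are too large to be represented precisely as floats. A typical use is checking for perfect squares, as in the following sketch. The helper name is_perfect_square() is made up for this example:

```python
import math


def is_perfect_square(n):
    """Check whether n is the square of an integer."""
    if n < 0:
        return False
    root = math.isqrt(n)
    return root * root == n


big = (10**20 + 1) ** 2  # Much larger than what a float can store exactly

print(is_perfect_square(784))
print(is_perfect_square(15))
print(math.isqrt(big) == 10**20 + 1)
```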

Finally, you can now more easily work with n-dimensional points and vectors in the standard library. You can find the distance between two points with math.dist(), and the length of a vector with math.hypot():

    >>> import math
    >>> point_1 = (16, 25, 20)
    >>> point_2 = (8, 15, 14)

    >>> math.dist(point_1, point_2)
    14.142135623730951

    >>> math.hypot(*point_1)
    35.79106033634656

    >>> math.hypot(*point_2)
    22.02271554554524

This makes it easier to work with points and vectors using the standard library. However, if you will be doing many calculations on points or vectors, you should check out NumPy.
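The two functions are closely related: the distance between two points equals the length of the vector between them. The sketch below checks this for the same two points, using math.isclose() to allow for floating-point rounding:

```python
import math

point_1 = (16, 25, 20)
point_2 = (8, 15, 14)

# The vector pointing from point_2 to point_1
difference = tuple(a - b for a, b in zip(point_1, point_2))

distance = math.dist(point_1, point_2)
length = math.hypot(*difference)

print(difference)
print(math.isclose(distance, length))
```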

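The statistics module also has several new functions:

  • statistics.fmean() calculates the mean of float numbers.
  • statistics.geometric_mean() calculates the geometric mean of float numbers.
  • statistics.multimode() finds the most frequently occurring values in a sequence.
  • statistics.quantiles() calculates cut points for dividing data into n continuous intervals with equal probability.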

The following example shows the functions in use:

    >>> import statistics
    >>> data = [9, 3, 2, 1, 1, 2, 7, 9]
    >>> statistics.fmean(data)
    4.25

    >>> statistics.geometric_mean(data)
    3.013668912157617

    >>> statistics.multimode(data)
    [9, 2, 1]

    >>> statistics.quantiles(data, n=4)
    [1.25, 2.5, 8.5]
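If you're curious about what these functions calculate, you can compare some of them against hand-written versions. The following is only a sketch for illustration, and the variable names are arbitrary. The geometric mean is computed by hand as the exponential of the mean of the logarithms:

```python
import math
import statistics

data = [9, 3, 2, 1, 1, 2, 7, 9]

arithmetic = statistics.fmean(data)
arithmetic_by_hand = sum(data) / len(data)

geometric = statistics.geometric_mean(data)
geometric_by_hand = math.exp(statistics.fmean([math.log(x) for x in data]))

# multimode() returns every value that shares the highest count
modes = statistics.multimode(data)

print(arithmetic == arithmetic_by_hand)
print(math.isclose(geometric, geometric_by_hand))
print(modes)
```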

In Python 3.8, there is a new statistics.NormalDist class that makes it more convenient to work with the Gaussian normal distribution.

To see an example of using NormalDist, you can try to compare the speed of the new statistics.fmean() and the traditional statistics.mean():

    >>> import random
    >>> import statistics
    >>> from timeit import timeit

    >>> # Create 10,000 random numbers
    >>> data = [random.random() for _ in range(10_000)]

    >>> # Measure the time it takes to run mean() and fmean()
    >>> t_mean = [timeit("statistics.mean(data)", number=100, globals=globals())
    ...           for _ in range(30)]
    >>> t_fmean = [timeit("statistics.fmean(data)", number=100, globals=globals())
    ...            for _ in range(30)]

    >>> # Create NormalDist objects based on the sampled timings
    >>> n_mean = statistics.NormalDist.from_samples(t_mean)
    >>> n_fmean = statistics.NormalDist.from_samples(t_fmean)

    >>> # Look at sample mean and standard deviation
    >>> n_mean.mean, n_mean.stdev
    (0.825690647733245, 0.07788573997674526)

    >>> n_fmean.mean, n_fmean.stdev
    (0.010488564966666065, 0.0008572332785645231)

    >>> # Calculate the lower 1 percentile of mean
    >>> n_mean.quantiles(n=100)[0]
    0.6445013221202459

In this example, you use timeit to measure the execution time of mean() and fmean(). To get reliable results, you let timeit execute each function 100 times, and collect 30 such time samples for each function. Based on these samples, you create two NormalDist objects. Note that if you run the code yourself, it might take up to a minute to collect the different time samples.

NormalDist has many convenient attributes and methods. See the documentation for a complete list. Inspecting .mean and .stdev, you see that the old statistics.mean() runs in 0.826 ± 0.078 seconds, while the new statistics.fmean() spends 0.0105 ± 0.0009 seconds. In other words, fmean() is about 80 times faster for these data.
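A few of those methods are sketched below. To keep the example quick to run, it doesn't repeat the timing. Instead, it assumes two distributions built directly from the rounded numbers above:

```python
import math
from statistics import NormalDist

# Rounded values taken from the timing example, not new measurements
n_mean = NormalDist(mu=0.826, sigma=0.078)
n_fmean = NormalDist(mu=0.0105, sigma=0.0009)

# Half of the probability mass lies below the mean
below_mean = n_mean.cdf(n_mean.mean)

# inv_cdf() is the inverse of cdf(), and matches the first of 100 quantiles
first_percentile = n_mean.inv_cdf(0.01)
same_percentile = n_mean.quantiles(n=100)[0]

# overlap() measures how much two distributions have in common, from 0 to 1
overlap = n_mean.overlap(n_fmean)

print(below_mean)
print(math.isclose(first_percentile, same_percentile))
print(overlap)
```

An overlap close to zero suggests that the difference in speed is far bigger than the variation between runs.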

If you need more advanced statistics in Python than the standard library offers, check out statsmodels and scipy.stats.

Warnings About Dangerous Syntax

Python has a SyntaxWarning which can warn about dubious syntax that is typically not a SyntaxError. Python 3.8 adds a few new ones that can help you during coding and debugging.

The difference between is and == can be confusing. The latter checks for equal values, while is is True only when objects are the same. Python 3.8 will try to warn you about cases when you should use == instead of is:

    >>> # Python 3.7
    >>> version = "3.7"
    >>> version is "3.7"
    False

    >>> # Python 3.8
    >>> version = "3.8"
    >>> version is "3.8"
    <stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
    False

    >>> version == "3.8"
    True
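The distinction is easier to see with mutable objects, where equality and identity clearly differ. The following is a minimal sketch with arbitrary variable names. It avoids comparing against literals with is, so it doesn't trigger the new warning:

```python
first = [3, 8]
second = [3, 8]  # Same contents, but a separate list object
alias = first    # Another name for the very same object

print(first == second)  # Equal values
print(first is second)  # Different objects
print(first is alias)   # The same object

# Comparing with None is the typical case where is should be used
result = None
print(result is None)
```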

It’s easy to miss a comma when you’re writing out a long list, especially when formatting it vertically. Forgetting a comma in a list of tuples will give a confusing error message about tuples not being callable. Python 3.8 additionally emits a warning that points toward the real issue:

    >>> [
    ...   (1, 3)
    ...   (2, 4)
    ... ]
    <stdin>:2: SyntaxWarning: 'tuple' object is not callable; perhaps
               you missed a comma?
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    TypeError: 'tuple' object is not callable

The warning correctly identifies the missing comma as the real culprit.

Optimizations

There are several optimizations made for Python 3.8. Some make code run faster, while others reduce the memory footprint. For example, looking up fields in a namedtuple is significantly faster in Python 3.8 compared with Python 3.7:

    >>> import collections
    >>> from timeit import timeit
    >>> Person = collections.namedtuple("Person", "name twitter")
    >>> raymond = Person("Raymond", "@raymondh")

    >>> # Python 3.7
    >>> timeit("raymond.twitter", globals=globals())
    0.05876131607996285

    >>> # Python 3.8
    >>> timeit("raymond.twitter", globals=globals())
    0.0377705999400132

You can see that looking up .twitter on the namedtuple is 30-40% faster in Python 3.8. Lists save some space when they are initialized from iterables with a known length. This can save memory:

    >>> import sys

    >>> # Python 3.7
    >>> sys.getsizeof(list(range(20191014)))
    181719232

    >>> # Python 3.8
    >>> sys.getsizeof(list(range(20191014)))
    161528168

In this case, the list uses about 11% less memory in Python 3.8 compared with Python 3.7.

Other optimizations include better performance in subprocess, faster file copying with shutil, improved default performance in pickle, and faster operator.itemgetter operations. See the official documentation for a complete list of optimizations.

So, Should You Upgrade to Python 3.8?

Let’s start with the simple answer. If you want to try out any of the new features you have seen here, then you do need to be able to use Python 3.8. Tools like pyenv and Anaconda make it easy to have several versions of Python installed side by side. Alternatively, you can run the official Python 3.8 Docker container. There is no downside to trying out Python 3.8 for yourself.

Now, for the more complicated questions. Should you upgrade your production environment to Python 3.8? Should you make your own project dependent on Python 3.8 to take advantage of the new features?

You should have very few issues running Python 3.7 code in Python 3.8. Upgrading your environment to run Python 3.8 is therefore quite safe, and you would be able to take advantage of the optimizations made in the new version. Different beta versions of Python 3.8 have already been available for months, so hopefully most bugs are already squashed. However, if you want to be conservative, you might hold out until the first maintenance release (Python 3.8.1) is available.

Once you’ve upgraded your environment, you can start to experiment with features that are only in Python 3.8, such as assignment expressions and positional-only arguments. However, you should be conscious about whether other people depend on your code, as this will force them to upgrade their environment as well. Popular libraries will probably mostly support at least Python 3.6 for quite a while longer.

See Porting to Python 3.8 for more information about preparing your code for Python 3.8.



Planet Python

Success in Adoption: Get Help

Welcome back for the next post in our adoption series about getting maximum success from your analytics strategy. In the first part, I introduced you to my adopted daughter, Tori, and this concept. In the second post, we discussed following a thorough vetting process. Next, we discussed how to commit to your decision. Today, we are on the fourth step: getting help.

To review, the five steps to adoption are listed below:

If you missed any of the previous blogs in this series, you can find them at the links above.

There Is No Expert on Day One

When Tori boarded the bus for her first day of pre-K, she was not even four years old. Our school district offered special needs pre-K and transportation for her. The idea was to slowly get her ready—physically, emotionally and mentally—for a school environment. She would only be in the classroom a few hours a day, but she had a long bus ride since there were only a couple of schools that offered these services for children like her with vision deficiencies and developmental disabilities.

In addition to this, we found we could get help in many other ways. Some services were included with our medical benefits or from the state of Georgia; others we had to pay for out of pocket. In just her first five years of life, Tori participated in:

  • Occupational therapy
  • Speech therapy
  • Play therapy (counseling for children)
  • Physical therapy
  • Center for the Visually Impaired
  • Many, many others, too

Without this critical help, we wouldn’t have made it through her early childhood. One of the things I’ve learned is how important it is to talk with others who have been through the same scary or stressful thing you might be going through. These methods of support also led us to meet people along the way that could encourage us, people who had been there before and made it to the other side.

Above: Tori needed special services as early as her first day of pre-K.

Building a Network of Support

Today, Tori still gets help in other ways for her special needs. We are grateful we have professionals out there to help us, and we also have an amazing loving family and community surrounding us.

In similar ways, you WILL need help in your analytics journey. It might be in the area of skillset for you and your team, or it might be in navigating major culture change in an organization that lends itself to more traditional thinking and operation. You may need help with underlying infrastructure or the database technology your team is using. You may need help with making a visualization that pops or understanding how trend lines and box plots are most useful.

Get into the habit of asking for help, early and often. You will be more successful because of it, and your organization likely will be, too. If you don’t get help when you need it, you’ll either waste a lot of energy, time and emotion throughout your journey, never getting to that end vision you had early on, or (even worse) the project could fail, leaving everyone scratching their heads.

Help Is Everywhere

Yes, we can help you. We would love to help you! There are countless ways that InterWorks can support you, but help is everywhere in the Tableau community, too. In my life prior to being a consultant, I combined help from the community with a great partner like InterWorks. Some of the ways you can leverage free help are:

  • Attend your local Tableau User Group
  • Get involved online with the Tableau community (Twitter, LinkedIn, Tableau Community on Tableau.com)
  • Have regular calls with your Tableau rep because your success is their success, too
  • Attend Tableau Conference yearly
  • Get involved in online visualization discussions like Makeover Monday and Viz For Social Good, among others
  • Network with others in your industry

How InterWorks Can Help You

There came a time when my team needed the professionals. We didn’t have enough staffing for the project we needed to get done. We didn’t have time to do it on our own. We needed a fresh visual look and feel with our content. We needed new ideas because ours weren’t working well. Some of the ways I leveraged partners (specifically InterWorks) in the past include:

  • Develop dashboards to replace legacy systems
  • Improve UI/UX of current dashboards
  • Build a portal to help maximize usage for our execs and front-line employees
  • Empower our team with official training classes
  • Build high-performance analytics databases in technology that was new to our team and train us on that technology
  • Speak at internal analytics meetings to generate enthusiasm and build credibility
  • Offer strategic ideas and new ways to solve problems we couldn’t solve on our own

We can help. We’ve been there ourselves, or we’ve been there with our clients. Just like my family and I found people who had been there and lived to tell about it with Tori, the InterWorks team has been there and lived to tell about it with our clients whom we love. Let us help you, too.

Stay tuned for the fifth and final post in this series!

The post Success in Adoption: Get Help appeared first on InterWorks.

InterWorks

Packaging Book Update: v1.10

I have updated my book “Packaging for Apple Administrators”!

It contains lots of fixes, some new parts and updates with regards to macOS Catalina.

This book is now nearly three years old and if you bought it at the very beginning you have gotten eight updates for free!

(Historic sidenote: v1.1 was just a quick fix to remove some placeholder text, so that was the first version on the iBooks Store.)

If you have already purchased the book, you can go to the Apple Books application on your Mac and choose ‘Check for available Downloads…’ from the ‘Store’ menu. On iOS, tap on your iCloud account icon next to ‘Reading Now’ and then choose ‘Updates.’

Changes in this version (you can also find this in the book in the ‘Version History’ section):

  • added a note on the spkg command line tool for Suspicious Package
  • updated the list of Considerations for Installation Scripts with regards to packages used in Recovery and zsh
  • updated script code across various scripts to match my updated coding standards
  • added a note on zsh in About this Book
  • changed the sample script in the Payload-Free Packages section to enable Screen Sharing instead of SSH because of changes in macOS Catalina security
  • added information on Notarization to Packages and Gatekeeper
  • added a note on the new Catalina read-only system volume in Testing the Package
  • fixed some misspellings and inconsistencies
  • fixed some broken links in Recommended Reading
  • changed to new ‘Apple Books’ nomenclature
  • fixed a dead link in ‘Installation Scripts’

Go get it in the Books store!

Scripting OS X