Starting Line
Dec 26, 2020
Target
Configure an environment from scratch, based on Python 3.7
Command-line basics
- history: before GUIs and mice, the original way to operate a computer was the terminal; it is more direct and speeds things up
- Aside: the terminal really can control everything on the computer
- Basic commands:
```
# the initial prompt
HOSTNAME: ~ USER $
# hostname + current directory (~ by default, the home directory) + username + prompt character $ or %

# example commands
$ ls       # list everything in the current directory
$ cd xxx   # change directory (absolute path: /name, relative path: cd ./name)
$ pwd      # print working directory
$ cd ..    # go up one level
$ git
$ conda
$ pip
$ clear    # a "fake" clear: it just scrolls the history up
```
- Additional notes
- What is the Shell in the menu above?
- What is the zsh in the window title?
Last login: Sat Dec 26 16:18:34 on ttys000
A shell is an abstraction: everything it does happens inside the computer. It handles human-computer interaction and runs scripts, and it is an essential part of a working operating system.
bash, ash, zsh, tcsh, etc. are concrete implementations of that abstraction; each is a program, and each can spawn a process.
The console is a tty (the keyboard and display are virtual teletypewriters).
- Further reading on terminals
Git usage
- What is git?
Version control: a system that records changes to one or more files over time, so that specific versions can be recalled later.
"The fastest way to find out who deleted a line of code that should not have been deleted, and when."
- Installing Git: the original post links to an installation guide
- Clone this intro's code from GitHub
$ git clone https://github.com/beiciliang/intro2musictech.git
- Besides raw git commands, version control can also be done through GitHub Desktop
- Further reading
Setting up an environment with Anaconda
- Check the files in the folder
- Python is already there, including the system's built-in 2.7 and Python 3.7; typing python directly launches 2.7
- Download Anaconda
- Use Anaconda to build a virtual environment, pinning the environment's Python version and third-party libraries (the cleanest way to use different Python versions side by side)
```
# install nb_conda so conda environments are linked to notebooks automatically
$ conda install nb_conda
# create a new virtual environment (named py38), pin its Python version,
# and install ipykernel (i.e. Jupyter Notebook support) into py38
$ conda create -n py38 python=3.8 ipykernel
# activate the environment (newer conda versions use: conda activate py38)
$ source activate py38
# inside the environment, install the remaining third-party libraries with pip (conda works too)
$ cd intro2musictech
$ pip install -r requirements.txt
# this errored out:
# ERROR: Command errored out with exit status 1:
# IMPORTANT WARNING: pkg-config is not installed. matplotlib may not be able to find some of its dependencies
# after installing freetype, libpng, and pkg-config via brew, collecting went fine,
# but then "Building wheels for collected packages: librosa, matplotlib, numpy,
# scikit-learn, scipy, audioread, resampy" failed on scikit-learn and scipy
# skipping the requirements file and running plain pip install did not error
# deactivate the environment
$ source deactivate
```
- installing packages involves building wheels
Packages are distributed in the wheel format; after a `.whl` file is downloaded, it still needs to be built.
Running Python code with Jupyter Notebook
- What is Jupyter Notebook?
- A web application that lets users combine narrative text, mathematical equations, code, and visualizations in a single easily shared document
- A data-science staple: uses include data cleaning and exploration, visualization, machine learning, and big-data analysis
- Content can be written in Markdown
- GitHub renders Jupyter notebooks natively
- Literate programming: narrative documentation is written right next to the code instead of in a separate document (the focus is on explaining the code)
- The notebook kernel is not tied to Python; code in any language can be sent to the corresponding kernel
- Jupiter is the planet, so what is Jupyter?
- Jupyter comes from IPython, an interactive shell, hence the `.ipynb` extension
- Because the notebook is language-agnostic (not necessarily Python), it was renamed Jupyter, for Julia, Python, and R
- Run Jupyter inside the py38 environment
```
$ jupyter notebook
# only when running this did conda list reveal it was never installed;
# it turned out the terminal just needed a restart after installing everything
# the browser opens a new page: http://localhost:8888/tree
# running 00-Hello.ipynb errored right at import:
# No module named 'numpy.core._multiarray_umath'
# try upgrading numpy to fix it
$ pip show numpy
$ pip install --upgrade numpy
# it was at version 1.15; the latest is 1.19
```
- Related reading
- Run Jupyter notebooks on Google Colab (free GPU)
How to exit
```
# press ctrl+c twice
# exit the py38 virtual environment (newer conda versions use: conda deactivate)
$ source deactivate
# forgot the environment name?
$ conda env list
# delete a virtual environment
$ conda env remove -n py38
```
Follow-up sessions
```
$ cd intro2musictech
$ git pull
$ source activate py38
(py38)$ jupyter notebook
```
00-Hello
Dec 26, 2020
Getting familiar with ipynb
```
# back in the terminal, install the extensions
$ conda install -c conda-forge jupyter_nbextensions_configurator
# the extensions did not show up after installing, so also:
$ conda install -c conda-forge jupyter_contrib_nbextensions
$ jupyter nbextension enable
# still nothing; removed both with conda remove, quit the terminal,
# relaunched jupyter notebook from base, and it inexplicably worked:
# this time the Conda and Nbextensions tabs finally appeared
```
Loading audio with librosa
Whether a module is built into Python or is a third-party module installed earlier with `pip` (such as librosa), once installed it can be called immediately via `import`. We need `matplotlib.pyplot` to display figures, `IPython.display` to play audio, methods from `librosa` to load an audio clip, and `librosa.display` to draw the waveform, so all of them have to be imported:
```
%matplotlib inline  # embed figures in the notebook instead of popping up a new window
import matplotlib.pyplot as plt
import IPython.display as ipd
import librosa, librosa.display

# input
x, sr = librosa.load('attachment/cat-meow.mp3')
# output audio
ipd.Audio(x, rate=sr)

# create a figure 15 wide and 5 tall
plt.figure(figsize=(15, 5))
# draw the waveform in it
librosa.display.waveplot(x, sr, alpha=0.8)
```
The `load` method in the `librosa` module loads the audio at a given path, here `cat-meow.mp3` inside the `attachment` folder. The file name is a relative path, because this .ipynb file sits in the intro2musictech folder and shares a root directory with `attachment`. The method returns two variables: `x`, the audio data itself, and `sr`, the sample rate at which the data `x` was obtained.
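The two return values can be sanity-checked in a couple of lines. This is a sketch using a synthetic sine in place of the cat-meow clip: the duration in seconds is simply the number of samples divided by the sample rate.

```python
import numpy as np

# A synthetic stand-in for librosa.load's return values: one second
# of a 440 Hz sine at librosa's default sample rate (22050 Hz).
sr = 22050
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

# duration (seconds) = number of samples / sample rate
duration = len(x) / sr
print(duration)  # → 1.0
```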
01.How to draw music
Dec 26, 2020
Target
- What are the fundamental tasks behind transcribing a score sheet?
- What is MIDI? And why is it a benchmark?
- Can you transform music into a time-frequency spectrum?
Representation
- Quick question: how do you feel music?
record, reproduce, and recreate.
- Sound familiar, music lovers?
Performer: staff notation → sheet
Technologist: MIDI → symbolic
Data scientist: WAV, MP3 → audio file
```
# load the example; sr = sample rate
audio_data, sr = librosa.load('attachment/mir01-music-example.wav', sr=None)
# output audio
ipd.Audio(audio_data, rate=sr)
```
Sheet
- When you are trying to figure out if it’s C major, you are transcribing. For human, transcription is through ears and mind. For computers, transcription is:
- onset detection: when will the next note appear?
- pitch/F0 estimation: what’s the pitch of the new note?
- beat tracking: that 4/4 rhythm
- chord progression
- key detection: C major
- optical music recognition: from sheet music to note information
And we’re working on them so hard for... ACCURACY!
- 👆 besides the physical info above, there is also higher-level info
- audio feature extraction
what makes music different: style, emotions
Symbolic
- history
- The earliest symbolic representation is the piano roll used by late-19th-century player pianos (hole positions punched in a paper roll record the notes)
- Since then the piano-roll format has evolved into the computer's score; its digital formats include MusicXML and MIDI
- In a MusicXML file a single middle C takes about 50 lines of code (very verbose); the music21 third-party Python library is used to parse MusicXML
- MIDI has been the mainstream symbolic format since the 1980s
MIDI
Musical Instrument Digital Interface
To "print out" the MIDI file of the example music clip, first install:
$ pip install pretty_midi
The essence of the MIDI format is a structure that encodes every element of a piece of music as digital data, following a standard technical specification (MIDI 1.0). The standard also defines the protocol for transmitting MIDI between hardware and software, which lets MIDI be used widely across synthesizers and digital audio workstations (DAWs).
```
import mido

midi_data = mido.MidiFile(filename='attachment/mir01-midi.mid')
for i, track in enumerate(midi_data.tracks):
    print('Track {}: {}'.format(i, track.name))
    for msg in track:
        print(msg)
```
- visualization
units in MIDI are ticks, whose duration depends on TPB (ticks per beat) and BPM
- Why is MIDI the ground truth? Because it is more precise than manual annotation
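The tick arithmetic can be sketched as follows; the TPB and BPM values here are hypothetical, not read from the example MIDI file:

```python
# MIDI time-unit sketch: ticks → seconds, given the file's
# ticks-per-beat (TPB) and the tempo in BPM.
TPB = 480      # ticks per quarter note, stored in the MIDI header
BPM = 120      # tempo, i.e. quarter notes per minute

def ticks_to_seconds(ticks, tpb=TPB, bpm=BPM):
    # one beat lasts 60/BPM seconds and spans TPB ticks
    return ticks / tpb * (60.0 / bpm)

print(ticks_to_seconds(480))   # one beat at 120 BPM → 0.5
print(ticks_to_seconds(960))   # two beats → 1.0
```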
Audio
- WAV VS. MP3
- MP3 is a compressed, lossy (lower-quality) version of WAV
- properties of audio files
sample rate
- detour:
Nyquist–Shannon sampling theorem
- a band-limited continuous signal can be reconstructed exactly from its discrete samples (with no loss of information)
- 44100Hz sample rate → represent frequencies in (0, 22050) Hz
- human perceptible frequencies in range (20, 20000) Hz
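A minimal numpy illustration of the theorem's flip side (aliasing): a tone above the Nyquist frequency is indistinguishable, after sampling, from one below it. The frequencies here are arbitrary round numbers chosen for the demo:

```python
import numpy as np

# Sample a sine above Nyquist and it collapses onto a lower sine.
sr = 1000                      # 1 kHz sample rate → Nyquist = 500 Hz
t = np.arange(sr) / sr         # one second of sample times

f_high = 900                   # above Nyquist
f_alias = sr - f_high          # aliases down to 100 Hz

high = np.sin(2 * np.pi * f_high * t)
alias = np.sin(2 * np.pi * f_alias * t)

# The sampled 900 Hz tone equals a phase-flipped 100 Hz tone.
print(np.allclose(high, -alias, atol=1e-9))  # → True
```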
bit depth: the smallest unit of amplitude
- 16 bit → 2^16 amplitude levels → 6*16 dB → -96~0 dB signal dynamic range
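The arithmetic in the bullet above, as a sketch: the familiar "6 dB per bit" rule is 20·log10(2) ≈ 6.02 dB, so 16 bits give roughly 96 dB of dynamic range.

```python
import math

# dynamic range of an n-bit signal: 20·log10(2^n) dB
def dynamic_range_db(bits):
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))  # → 96.3
print(round(dynamic_range_db(24), 1))  # → 144.5
```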
- Why bother? From sound to data and from data to audio
- wav in computers: a looooong vector
```
audio_data, sr = librosa.load('attachment/mir01-music-example.wav', sr=None)
print("number of audio samples: {}".format(len(audio_data)))
```
- detour: the Fourier Transform
- STFT: the sliding-window version of the DFT/FFT. What if we used the DFT directly? The result has no time axis, so it is hard to interpret
- short-time Fourier Transform with hop length and window length
So... you get a visual representation (the kind you saw a lot on DVD players as a kid) that tells you which frequencies (in Hz) appear in each unit of time.
Any periodic signal can be reconstructed from a group of sine waves.
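That claim can be checked numerically by summing the Fourier series of a square wave; the more sine terms you keep, the closer the sum gets. A numpy sketch:

```python
import numpy as np

# Build a 1 Hz square wave from its odd sine harmonics
# (the square wave's Fourier series: (4/π)·Σ sin(2πnt)/n, odd n).
t = np.linspace(0, 1, 1000, endpoint=False)
square = np.sign(np.sin(2 * np.pi * t))

def partial_sum(n_terms):
    s = np.zeros_like(t)
    for k in range(n_terms):
        n = 2 * k + 1  # odd harmonics 1, 3, 5, ...
        s += (4 / np.pi) * np.sin(2 * np.pi * n * t) / n
    return s

# more sine terms → smaller reconstruction error
err_3 = np.mean(np.abs(square - partial_sum(3)))
err_50 = np.mean(np.abs(square - partial_sum(50)))
print(err_3 > err_50)  # → True
```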
```
import numpy as np

# load the audio
audio_data, sr = librosa.load('attachment/mir01-music-example.wav', sr=None)

# STFT with a window length of n_fft, i.e. 2048/44100 = 46 ms;
# the FFT of each window returns 1 + n_fft/2 frequency bins;
# the window then slides by hop_length, i.e. 512/44100 = 11.6 ms,
# and the FFT repeats: a sliding-window FFT
D = librosa.stft(audio_data, n_fft=2048, hop_length=512)
print(audio_data)
print(sr)

# the resulting spectrum carries both magnitude and phase;
# later material relies mostly on the magnitude
magnitude, phase = librosa.magphase(D)

plt.figure(figsize=(15, 10))
# waveform of the audio in the time domain
ax1 = plt.subplot(2, 1, 1)
librosa.display.waveshow(audio_data, sr=sr, alpha=0.8)
plt.title('audio data')
# power spectrogram after the STFT (in dB)
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
librosa.display.specshow(librosa.amplitude_to_db(magnitude, ref=np.max),
                         sr=sr, y_axis='linear', x_axis='time')
plt.title('STFT spectrogram')
```
- Different spectrum for human ear perceptibility
- STFT: the y axis is the frequency bin; the gap between bins (the frequency resolution) = sample rate / n_fft
- MEL: the STFT's linear frequency axis is counterintuitive, because human ears perceive pitch by frequency ratios rather than by frequency differences; the mel scale warps frequency non-linearly (roughly logarithmically; taking the log spectrum and then a cosine transform yields the cepstrum)
- CQT: more intuitive to map frequencies to pitches (how they are mapped is a whole other story....)
Side Notes
- why librosa?
LabROSA (Laboratory for the Recognition and Organization of Speech and Audio)
- From roll image scan to MIDI - Stanford CCRMA
02 Audio Feature
General
- properties
- dynamics: an audio feature is a time series of some property with a fixed meaning, whose values vary over time; its dynamics are summarized by the mean and variance
- coverage: instantaneous features vs. global features
- abstraction: from physical features to understanding level of music
- Categories
- directly from audio waveform
Temporal Feature
- ADSR
- zero-crossing rate
- AR coefficient
Energy feature
- RMS energy
- After DFT/FFT
Spectral feature
- centroid
- skewness
- kurtosis
- MFCC
- After models
- sound source separation
- sinusoidal harmonic model
Harmonic feature
- music noise ratio
- inspired by ears
Perceptual feature
- loudness
- Before extracting features
- RMS energy
```
# load the audio
audio_data, sr = librosa.load('attachment/mir01-music-example.wav', sr=None)

# librosa's rms computes the root-mean-square energy of every 2048-sample frame
rmse = librosa.feature.rms(y=audio_data, frame_length=2048, hop_length=512)

# plot the waveform and the RMS energy over time
plt.figure(figsize=(10, 5))
librosa.display.waveshow(audio_data, sr=sr, alpha=0.8)
times = librosa.frames_to_time(np.arange(len(rmse.T)), sr=sr, hop_length=512)
plt.plot(times, rmse.T, label='RMS Energy')
plt.legend(loc='best')
```
- frame as a unit:
1 frame = hop length / sample rate (seconds)
- D[f, t] = complex number (magnitude + phase * i)
- magnitude = numpy.abs(D)
- phase = numpy.angle(D)
audio data vector → STFT matrix [frequency bin, time]
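On a single complex entry (a made-up value, not taken from a real STFT), the two numpy calls above amount to:

```python
import numpy as np

# one hypothetical STFT entry D[f, t] = a + b·i
d = 3.0 + 4.0j

magnitude = np.abs(d)    # sqrt(3² + 4²)
phase = np.angle(d)      # atan2(4, 3), in radians

print(magnitude)         # → 5.0
print(round(phase, 4))   # → 0.9273
```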
- don’t mix this with MEL Freq, which is taken log only on frequency
Temporal features
ADSR
envelope - in energy time series (a great tool for distinguishing timbre!)
ZCR
zero-crossing rate
- sample rate revisited: the average number of samples obtained in one second
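A quick numpy-only sketch of the idea: a 440 Hz sine changes sign about 2 × 440 = 880 times per second, and counting sign flips between consecutive samples recovers roughly that number.

```python
import numpy as np

# one second of a 440 Hz sine at 22050 Hz
sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

# count the sign changes between consecutive samples
crossings = np.sum(np.abs(np.diff(np.signbit(y).astype(int))))
print(crossings)  # close to 880
```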