Starting Line
Dec 26, 2020
Target
Configure an environment from scratch, based on Python 3.7
Command-line basics
- history: before GUIs and mice, the original way to operate a computer was the terminal; it is more direct and speeds things up
- Aside: the terminal really can control everything on the computer
- Basic commands:
```
# the initial prompt
HOSTNAME: ~ USER $
# hostname + current directory (~ by default, the home directory) + username + prompt character $ or %

# example commands
$ ls       # list everything in the current directory
$ cd xxx   # change directory (absolute path: /name, relative path: cd ./name)
$ pwd      # print working directory
$ cd ..    # go up one level
$ git
$ conda
$ pip
$ clear    # a "fake" clear: it just scrolls the history up
```
- Additional notes
- What is the Shell in the menu above?
- What is the zsh in the window title?
Last login: Sat Dec 26 16:18:34 on ttys000
A shell is an abstraction: everything it does happens inside the computer. It handles human-computer interaction and runs scripts, and it is an essential part of a working operating system.
bash, ash, zsh, tcsh, etc. are concrete implementations of that abstraction; each is a program, and each can spawn a process.
The console is a tty (the keyboard and display are virtual teletypewriters).
- Further reading on terminals
Git usage
- What is git?
Version control: a system that records changes to one or more files over time, so that specific versions can be recalled later.
"The fastest way to find out who deleted a line of code that should not have been deleted, and when."
- Installing Git: the original post links to an installation guide
- Clone this intro's code from GitHub
$ git clone https://github.com/beiciliang/intro2musictech.git
- Besides raw git commands, version control can also be done through GitHub Desktop
- Further reading
Setting up an environment with Anaconda
- Check the files in the folder
- Python is already there, including the system's built-in 2.7 and Python 3.7; typing python directly launches 2.7
- Download Anaconda
- Use Anaconda to build a virtual environment, pinning the environment's Python version and third-party libraries (the cleanest way to use different Python versions side by side)
```
# install nb_conda so conda environments are linked to notebooks automatically
$ conda install nb_conda
# create a new virtual environment (named py38), pin its Python version,
# and install ipykernel (i.e. Jupyter Notebook support) into py38
$ conda create -n py38 python=3.8 ipykernel
# activate the environment (newer conda versions use: conda activate py38)
$ source activate py38
# inside the environment, install the remaining third-party libraries with pip (conda works too)
$ cd intro2musictech
$ pip install -r requirements.txt
# this errored out:
# ERROR: Command errored out with exit status 1:
# IMPORTANT WARNING: pkg-config is not installed. matplotlib may not be able to find some of its dependencies
# after installing freetype, libpng, and pkg-config via brew, collecting went fine,
# but then "Building wheels for collected packages: librosa, matplotlib, numpy,
# scikit-learn, scipy, audioread, resampy" failed on scikit-learn and scipy
# skipping the requirements file and running plain pip install did not error
# deactivate the environment
$ source deactivate
```
- installing packages involves building wheels
Packages are distributed in the wheel format; after a `.whl` file is downloaded, it still needs to be built.
Running Python code with Jupyter Notebook
- What is Jupyter Notebook?
- A web application that lets users combine narrative text, mathematical equations, code, and visualizations in a single easily shared document
- A data-science staple: uses include data cleaning and exploration, visualization, machine learning, and big-data analysis
- Content can be written in Markdown
- GitHub renders Jupyter notebooks natively
- Literate programming: narrative documentation is written right next to the code instead of in a separate document (the focus is on explaining the code)
- The notebook kernel is not tied to Python; code in any language can be sent to the corresponding kernel
- Jupiter is the planet, so what is Jupyter?
- Jupyter comes from IPython, an interactive shell, hence the `.ipynb` extension
- Because the notebook is language-agnostic (not necessarily Python), it was renamed Jupyter, for Julia, Python, and R
- Run Jupyter inside the py38 environment
```
$ jupyter notebook
# only when running this did conda list reveal it was never installed;
# it turned out the terminal just needed a restart after installing everything
# the browser opens a new page: http://localhost:8888/tree
# running 00-Hello.ipynb errored right at import:
# No module named 'numpy.core._multiarray_umath'
# try upgrading numpy to fix it
$ pip show numpy
$ pip install --upgrade numpy
# it was at version 1.15; the latest is 1.19
```
- Related reading
- Run Jupyter notebooks on Google Colab (free GPU)
How to exit
```
# press ctrl+c twice
# exit the py38 virtual environment (newer conda versions use: conda deactivate)
$ source deactivate
# forgot the environment name?
$ conda env list
# delete a virtual environment
$ conda env remove -n py38
```
Follow-up sessions
```
$ cd intro2musictech
$ git pull
$ source activate py38
(py38)$ jupyter notebook
```
00-Hello
Dec 26, 2020
Getting familiar with ipynb
```
# back in the terminal, install the extensions
$ conda install -c conda-forge jupyter_nbextensions_configurator
# the extensions did not show up after installing, so also:
$ conda install -c conda-forge jupyter_contrib_nbextensions
$ jupyter nbextension enable
# still nothing; removed both with conda remove, quit the terminal,
# relaunched jupyter notebook from base, and it inexplicably worked:
# this time the Conda and Nbextensions tabs finally appeared
```
Loading audio with librosa
Whether a module is built into Python or is a third-party module installed earlier with `pip` (such as librosa), once installed it can be called immediately via `import`. We need `matplotlib.pyplot` to display figures, `IPython.display` to play audio, methods from `librosa` to load an audio clip, and `librosa.display` to draw the waveform, so all of them have to be imported:
```
%matplotlib inline  # embed figures in the notebook instead of popping up a new window
import matplotlib.pyplot as plt
import IPython.display as ipd
import librosa, librosa.display

# input
x, sr = librosa.load('attachment/cat-meow.mp3')
# output audio
ipd.Audio(x, rate=sr)

# create a figure 15 wide and 5 tall
plt.figure(figsize=(15, 5))
# draw the waveform in it
librosa.display.waveplot(x, sr, alpha=0.8)
```
The `load` method in the `librosa` module loads the audio at a given path, here `cat-meow.mp3` inside the `attachment` folder. The file name is a relative path, because this .ipynb file sits in the intro2musictech folder and shares a root directory with `attachment`. The method returns two variables: `x`, the audio data itself, and `sr`, the sample rate at which the data `x` was obtained.
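The two return values can be sanity-checked in a couple of lines. This is a sketch using a synthetic sine in place of the cat-meow clip: the duration in seconds is simply the number of samples divided by the sample rate.

```python
import numpy as np

# A synthetic stand-in for librosa.load's return values: one second
# of a 440 Hz sine at librosa's default sample rate (22050 Hz).
sr = 22050
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

# duration (seconds) = number of samples / sample rate
duration = len(x) / sr
print(duration)  # → 1.0
```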
01.How to draw music
Dec 26, 2020
Target
- What are the fundamental tasks behind transcribing a score sheet?
- What is MIDI? And why is it a benchmark?
- Can you transform music into a time-frequency spectrum?
Representation
- Quick question: how do you feel music?
record, reproduce, and recreate.
- Sound familiar, music lovers?
Performer: staff notation → sheet
Technologist: MIDI → symbolic
Data scientist: WAV, MP3 → audio file
```
# load the example; sr = sample rate
audio_data, sr = librosa.load('attachment/mir01-music-example.wav', sr=None)
# output audio
ipd.Audio(audio_data, rate=sr)
```
Sheet
- When you are trying to figure out if it’s C major, you are transcribing. For human, transcription is through ears and mind. For computers, transcription is:
- onset detection: when will the next note appear?
- pitch/F0 estimation: what’s the pitch of the new note?
- beat tracking: that 4/4 rhythm
- chord progression
- key detection: C major
- optical music recognition: from sheet music to note information
And we’re working on them so hard for... ACCURACY!
- 👆 besides the physical info above, there is also higher-level info
- audio feature extraction
what makes music different: style, emotions
Symbolic
- history
- The earliest symbolic representation is the piano roll used by late-19th-century player pianos (hole positions punched in a paper roll record the notes)
- Since then the piano-roll format has evolved into the computer's score; its digital formats include MusicXML and MIDI
- In a MusicXML file a single middle C takes about 50 lines of code (very verbose); the music21 third-party Python library is used to parse MusicXML
- MIDI has been the mainstream symbolic format since the 1980s
MIDI
Musical Instrument Digital Interface
To "print out" the MIDI file of the example music clip, first install:
$ pip install pretty_midi
The essence of the MIDI format is a structure that encodes every element of a piece of music as digital data, following a standard technical specification (MIDI 1.0). The standard also defines the protocol for transmitting MIDI between hardware and software, which lets MIDI be used widely across synthesizers and digital audio workstations (DAWs).
```
import mido

midi_data = mido.MidiFile(filename='attachment/mir01-midi.mid')
for i, track in enumerate(midi_data.tracks):
    print('Track {}: {}'.format(i, track.name))
    for msg in track:
        print(msg)
```
- visualization
units in MIDI are ticks, whose duration depends on TPB (ticks per beat) and BPM
- Why is MIDI the ground truth? Because it is more precise than manual annotation
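The tick arithmetic can be sketched as follows; the TPB and BPM values here are hypothetical, not read from the example MIDI file:

```python
# MIDI time-unit sketch: ticks → seconds, given the file's
# ticks-per-beat (TPB) and the tempo in BPM.
TPB = 480      # ticks per quarter note, stored in the MIDI header
BPM = 120      # tempo, i.e. quarter notes per minute

def ticks_to_seconds(ticks, tpb=TPB, bpm=BPM):
    # one beat lasts 60/BPM seconds and spans TPB ticks
    return ticks / tpb * (60.0 / bpm)

print(ticks_to_seconds(480))   # one beat at 120 BPM → 0.5
print(ticks_to_seconds(960))   # two beats → 1.0
```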
Audio
- WAV VS. MP3
- MP3 is a compressed, lossy (lower-quality) version of WAV
- properties of audio files
sample rate
- detour:
Nyquist–Shannon sampling theorem
- a band-limited continuous signal can be reconstructed exactly from its discrete samples (with no loss of information)
- 44100Hz sample rate → represent frequencies in (0, 22050) Hz
- human perceptible frequencies in range (20, 20000) Hz
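A minimal numpy illustration of the theorem's flip side (aliasing): a tone above the Nyquist frequency is indistinguishable, after sampling, from one below it. The frequencies here are arbitrary round numbers chosen for the demo:

```python
import numpy as np

# Sample a sine above Nyquist and it collapses onto a lower sine.
sr = 1000                      # 1 kHz sample rate → Nyquist = 500 Hz
t = np.arange(sr) / sr         # one second of sample times

f_high = 900                   # above Nyquist
f_alias = sr - f_high          # aliases down to 100 Hz

high = np.sin(2 * np.pi * f_high * t)
alias = np.sin(2 * np.pi * f_alias * t)

# The sampled 900 Hz tone equals a phase-flipped 100 Hz tone.
print(np.allclose(high, -alias, atol=1e-9))  # → True
```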
bit depth: the smallest unit of amplitude
- 16 bit → 2^16 amplitude levels → 6*16 dB → -96~0 dB signal dynamic range
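The arithmetic in the bullet above, as a sketch: the familiar "6 dB per bit" rule is 20·log10(2) ≈ 6.02 dB, so 16 bits give roughly 96 dB of dynamic range.

```python
import math

# dynamic range of an n-bit signal: 20·log10(2^n) dB
def dynamic_range_db(bits):
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))  # → 96.3
print(round(dynamic_range_db(24), 1))  # → 144.5
```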
- Why bother? From sound to data and from data to audio
- wav in computers: a looooong vector
```
audio_data, sr = librosa.load('attachment/mir01-music-example.wav', sr=None)
print("number of audio samples: {}".format(len(audio_data)))
```
- detour: the Fourier Transform
- STFT: the sliding-window version of the DFT/FFT. What if we used the DFT directly? The result has no time axis, so it is hard to interpret
- short-time Fourier Transform with hop length and window length
So... you get a visual representation (the kind you saw a lot on DVD players as a kid) that tells you which frequencies (in Hz) appear in each unit of time.
Any periodic signal can be reconstructed from a group of sine waves.
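That claim can be checked numerically by summing the Fourier series of a square wave; the more sine terms you keep, the closer the sum gets. A numpy sketch:

```python
import numpy as np

# Build a 1 Hz square wave from its odd sine harmonics
# (the square wave's Fourier series: (4/π)·Σ sin(2πnt)/n, odd n).
t = np.linspace(0, 1, 1000, endpoint=False)
square = np.sign(np.sin(2 * np.pi * t))

def partial_sum(n_terms):
    s = np.zeros_like(t)
    for k in range(n_terms):
        n = 2 * k + 1  # odd harmonics 1, 3, 5, ...
        s += (4 / np.pi) * np.sin(2 * np.pi * n * t) / n
    return s

# more sine terms → smaller reconstruction error
err_3 = np.mean(np.abs(square - partial_sum(3)))
err_50 = np.mean(np.abs(square - partial_sum(50)))
print(err_3 > err_50)  # → True
```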
```
import numpy as np

# load the audio
audio_data, sr = librosa.load('attachment/mir01-music-example.wav', sr=None)

# STFT with a window length of n_fft, i.e. 2048/44100 = 46 ms;
# the FFT of each window returns 1 + n_fft/2 frequency bins;
# the window then slides by hop_length, i.e. 512/44100 = 11.6 ms,
# and the FFT repeats: a sliding-window FFT
D = librosa.stft(audio_data, n_fft=2048, hop_length=512)
print(audio_data)
print(sr)

# the resulting spectrum carries both magnitude and phase;
# later material relies mostly on the magnitude
magnitude, phase = librosa.magphase(D)

plt.figure(figsize=(15, 10))
# waveform of the audio in the time domain
ax1 = plt.subplot(2, 1, 1)
librosa.display.waveshow(audio_data, sr=sr, alpha=0.8)
plt.title('audio data')
# power spectrogram after the STFT (in dB)
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
librosa.display.specshow(librosa.amplitude_to_db(magnitude, ref=np.max),
                         sr=sr, y_axis='linear', x_axis='time')
plt.title('STFT spectrogram')
```
- Different spectrum for human ear perceptibility
- STFT: the y axis is the frequency bin; the gap between bins (the frequency resolution) = sample rate / n_fft
- MEL: the STFT's linear frequency axis is counterintuitive, because human ears perceive pitch by frequency ratios rather than by frequency differences; the mel scale warps frequency non-linearly (roughly logarithmically; taking the log spectrum and then a cosine transform yields the cepstrum)
- CQT: more intuitive to map frequencies to pitches (how they are mapped is a whole other story....)
Side Notes
- why librosa?
LabROSA (Laboratory for the Recognition and Organization of Speech and Audio)
- From roll image scan to MIDI - Stanford CCRMA
02 Audio Feature
General
- properties
- dynamics: an audio feature is a time series of some property with a fixed meaning, whose values vary over time; its dynamics are summarized by the mean and variance
- coverage: instantaneous features vs. global features
- abstraction: from physical features to understanding level of music
- Categories
- directly from audio waveform
Temporal Feature
- ADSR
- zero-crossing rate
- AR coefficient
Energy feature
- RMS energy
- After DFT/FFT
Spectral feature
- centroid
- skewness
- kurtosis
- MFCC
- After models
- sound source separation
- sinusoidal harmonic model
Harmonic feature
- music noise ratio
- inspired by ears
Perceptual feature
- loudness
- Before extracting features
- RMS energy
```
# load the audio
audio_data, sr = librosa.load('attachment/mir01-music-example.wav', sr=None)

# librosa's rms computes the root-mean-square energy of every 2048-sample frame
rmse = librosa.feature.rms(y=audio_data, frame_length=2048, hop_length=512)

# plot the waveform and the RMS energy over time
plt.figure(figsize=(10, 5))
librosa.display.waveshow(audio_data, sr=sr, alpha=0.8)
times = librosa.frames_to_time(np.arange(len(rmse.T)), sr=sr, hop_length=512)
plt.plot(times, rmse.T, label='RMS Energy')
plt.legend(loc='best')
```
- frame as a unit:
1 frame = hop length / sample rate (seconds)
- D[f, t] = complex number (magnitude + phase * i)
- magnitude = numpy.abs(D)
- phase = numpy.angle(D)
audio data vector → STFT matrix [frequency bin, time]
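On a single complex entry (a made-up value, not taken from a real STFT), the two numpy calls above amount to:

```python
import numpy as np

# one hypothetical STFT entry D[f, t] = a + b·i
d = 3.0 + 4.0j

magnitude = np.abs(d)    # sqrt(3² + 4²)
phase = np.angle(d)      # atan2(4, 3), in radians

print(magnitude)         # → 5.0
print(round(phase, 4))   # → 0.9273
```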
- don’t mix this with MEL Freq, which is taken log only on frequency
Temporal features
ADSR
envelope - in energy time series (a great tool for distinguishing timbre!)
ZCR
zero-crossing rate
- sample rate revisited: the average number of samples obtained in one second
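A quick numpy-only sketch of the idea: a 440 Hz sine changes sign about 2 × 440 = 880 times per second, and counting sign flips between consecutive samples recovers roughly that number.

```python
import numpy as np

# one second of a 440 Hz sine at 22050 Hz
sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

# count the sign changes between consecutive samples
crossings = np.sum(np.abs(np.diff(np.signbit(y).astype(int))))
print(crossings)  # close to 880
```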