Sivan D.

About Me
Hi, welcome to my personal website! I'm Siwen (Sivan), a day-and-night geek in NYC.
💙 By day, I'm a first-year Computer Science Ph.D. student at the Music and Audio Research Lab (MARL), New York University.
🎸 By night, I'm a music experimenter. I play electric guitar in a tiny band and make some tiny noise with bass, keys, and synths.
🏄🏻 The rest of the time I love surfing (I love Lisbon!), meditation, skateboarding, wakeboarding, and climbing…
And listening to podcasts all day (one of my favorites: Philosophize This!). And taking notes: I take notes on everything.
🧘🏻‍♀️I care about mindfulness and productivity, and love sharing what I find inspiring.
My research interests:
In terms of topics, I am always drawn to human perception and embodiment, especially auditory perception, and to how we model them in machines: machine listening and music information retrieval. In my past research life, I've focused on sound event localization and detection (SELD), spatial audio, audio-visual correspondence, and anti-spoofing (voice deepfake detection). On the other (artsy) side, I've been experimenting with new musical interfaces, music analysis, and music generation. In the near future, I hope to narrow my focus to soundscape generation and analysis.
In terms of methodologies, I have a strong interest in exploring neuro-symbolic learning, multi-modal deep learning, physics-inspired models, and representation learning, along with all the challenges they bring, such as generalizability.
In my journey of pursuing knowledge and a research identity, I've found joy in presenting myself as a people shocker: my mission is to leave the audience startled and scared (in a good way, though).
#machine listening #neuro-symbolic #audio representation learning


News

  • Sep 5, 2023: Joined NYU MARL as a Computer Science Ph.D. student advised by Prof. Juan Bello, with my greaaaaatest pleasure!
  • Jul 8, 2023 - Aug 11, 2023: Studied funk/fusion at Berklee College of Music, Boston, this summer
  • Jan 30, 2023 - May 12, 2023: Interned as an Acoustic Mapping Intern at Dolby, San Francisco, with the Sound Technology Group
  • Feb 8, 2023: Graduated and received my master's degree from Columbia
  • Oct 6, 2022: Presented a poster with AIR at the SANE Workshop at MIT, Boston
  • Jun 4, 2022: Attended NEMISIG at NJIT, New Jersey


Publications

Spatial-Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms, Iran R. Roman, Christopher Ick, Siwen Ding, Adrian S. Roman, Brian McFee, Juan P. Bello. Presented at the SANE Workshop 2023; submitted to ICASSP 2024.
Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific rooms. We present Spatial-Scaper, a library for SELD data simulation and augmentation. Compared to existing tools, Spatial-Scaper emulates virtual rooms via parameters such as size and wall absorption. This allows for parameterized placement (including movement) of foreground and background sound sources. Spatial-Scaper also includes data augmentation pipelines that can be applied to existing SELD data. As a case study, we use Spatial-Scaper to add rooms and acoustic conditions to the DCASE SELD challenge data. Training a model with our data led to progressive performance improvements as a direct function of acoustic diversity. These results show that Spatial-Scaper is valuable for training robust SELD models.
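The core mechanism the abstract describes, placing a sound event in a soundscape by convolving it with spatially-localized RIRs, can be illustrated with a toy numpy sketch. This is not the library's actual API; the function name and arguments are hypothetical, chosen only to show the convolve-and-mix idea:

```python
import numpy as np

def spatialize_event(event, rirs, scene_len, onset):
    """Place a mono sound event into a multichannel soundscape by
    convolving it with per-channel room impulse responses (RIRs).

    event     : (n,) mono waveform of the sound event
    rirs      : (C, m) one RIR per microphone channel, for one source position
    scene_len : total soundscape length in samples
    onset     : sample index where the event starts in the scene
    """
    n_channels = rirs.shape[0]
    scene = np.zeros((n_channels, scene_len))
    for c in range(n_channels):
        wet = np.convolve(event, rirs[c])          # spatialized ("wet") event
        end = min(onset + len(wet), scene_len)
        scene[c, onset:end] += wet[: end - onset]  # mix into the soundscape
    return scene
```

A moving source would be approximated by splitting the event into short segments and convolving each with the RIR of a successive position; background sources are mixed in the same way at lower gain.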
SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing, Siwen Ding, You Zhang, and Zhiyao Duan. Presented at the SANE Workshop 2022; accepted to ICASSP 2023.
Voice anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems. A major challenge is caused by unseen attacks empowered by advanced speech synthesis technologies. Our previous research on one-class learning has improved the generalization ability to unseen attacks by compacting the bona fide speech in the embedding space. However, such compactness lacks consideration of the diversity of speakers. In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes away spoofing attacks from all the attractors in a high-dimensional embedding space. For training, we propose an algorithm for the co-optimization of bona fide speech clustering and bona fide/spoof classification. For inference, we propose strategies to enable anti-spoofing for speakers without enrollment. Our proposed system outperforms existing state-of-the-art single systems with a relative improvement of 38% on equal error rate (EER) on the ASVspoof2019 LA evaluation set.
@article{ding2022samo,
  title={SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing},
  author={Ding, Siwen and Zhang, You and Duan, Zhiyao},
  journal={arXiv preprint arXiv:2211.02718},
  year={2022}
}
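The multi-center one-class idea can be sketched in a few lines of numpy: pull each bona fide embedding toward its speaker's attractor and penalize spoof embeddings that sit too close to any attractor. This toy objective is only in the spirit of SAMO, not the paper's exact loss; the function, its margin parameter, and the cosine-similarity form are illustrative assumptions:

```python
import numpy as np

def samo_style_loss(emb, labels, speaker_ids, attractors, margin=0.2):
    """Toy multi-center one-class objective in the spirit of SAMO.

    emb         : (N, d) L2-normalized utterance embeddings
    labels      : (N,) 1 = bona fide, 0 = spoof
    speaker_ids : (N,) index of each utterance's speaker attractor
    attractors  : (S, d) L2-normalized speaker attractors
    """
    sims = emb @ attractors.T  # cosine similarities, shape (N, S)
    loss = 0.0
    for i in range(len(emb)):
        if labels[i] == 1:
            # bona fide: maximize similarity to the speaker's own attractor
            loss += 1.0 - sims[i, speaker_ids[i]]
        else:
            # spoof: penalize similarity to the *nearest* attractor
            # whenever it exceeds the margin
            loss += max(0.0, sims[i].max() - margin)
    return loss / len(emb)
```

During training, the attractors themselves would also be updated (the paper co-optimizes clustering and bona fide/spoof classification); at inference, an unenrolled speaker can be scored against the nearest attractor.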


Projects Gallery

▷ B-Side

My Personal Databases

🔍 Find Me at...

📧 /
CV / Resume

Made with 💙 and ☕️ by Sivan. Last updated Apr 3, 2023.