CV
Please see my detailed resume in this PDF ->
Basics
Name | Siwen (Sivan) Ding |
Title | Ph.D. Student |
Email | sivan.d@nyu.edu |
Phone | (646)-683-8105 |
Url | https://sivannavis.github.io/ |
Summary | I'm a researcher by day and musician by night. |
Work
-
2024.05 - 2024.08 Research Intern
Dolby Laboratories, Inc.
Video to Spatial Audio (FOA) Generation via Latent Diffusion
- Spatial Audio
- Multi-modal
- Audio Generation
- Diffusion
-
2023.01 - 2023.05 Acoustic Mapping Intern
Dolby Laboratories, Inc.
Robust User Localization in Acoustic Mapping via Speech Enhancement
- Spatial Audio
- Localization
Education
-
2023.09 - Present Brooklyn, NY, USA
Doctor of Philosophy
New York University
Computer Science
- Machine Learning
- Computer Vision
- Music Information Retrieval
- Information Visualization
- 3D Audio
- Digital Signal Processing
-
2021.09 - 2022.12 New York, NY, USA
Master of Science
Columbia University
Data Science
- Machine Learning
- Deep Learning and Neural Networks
- Probability and Statistics
- Reinforcement Learning
- Statistical Inference and Modeling
- Algorithms for Data Science
- Computer Systems
- Sonic and Visual Representations of Data
- Sound: Advanced
-
2017.09 - 2021.06 Wuhan, Hubei, CN
Bachelor of Engineering
Wuhan University
Energy Engineering (Thermodynamics)
- Maths: Advanced Mathematics, Linear Algebra, Probability
- Mechanics: Theoretical Mechanics, Mechanical Design, Material Mechanics, Fluid Mechanics, Quantum Mechanics
- Electronics: Electrical Engineering and Electronics Techniques, Principle of Automatic Control
- Dynamics: Thermodynamics, Heat Transfer, Multi-scale Modeling and Simulation
- Programming: C, C++, Java
Publications
-
2024.01.19 Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We present SpatialScaper, a library for SELD data simulation and augmentation.
-
2022.11.04 SAMO: Speaker Attractor Multi-Center One-Class Learning For Voice Anti-Spoofing
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes away spoofing attacks from all the attractors in a high-dimensional embedding space.
Projects
- 2022.01 - 2022.03
SoniZen
Mindful Meditation Experience via Data Sonification
- Designed a multi-modal real-time plug-in with OSC and Max for Live, using gestural, visual, and auditory input as sensor signals
- Mapped signals from wearables (watches and earphones) to control parameters with neural networks for live music performance
- 2022.05 - 2022.11
Voice Anti-Spoofing and Audio Deepfake Detection
2022 Summer Internship @AIR, UoR
- Designed a novel loss function with an algorithm for speaker attractor multi-center one-class supervised learning with 120K voice data
- Improved the generalizability of audio spoofing detection, achieving a SOTA EER with a 38% relative improvement
- Illustrated model behavior in cluster representation learning and classification through ablation studies and UMAP and t-SNE embedding visualizations
- Leveraged cyclic learning rates and hyper-parameter tuning to improve training convergence
- 2022.05 - 2022.10
Embodied Multi-Modal Machine Listening in Audio-Visual Navigation
2022 Summer Internship @MARL, NYU
- Instrumented audio/image CNN and Transformer models with reinforcement learning on HPC clusters for semantic audio-visual navigation
- Developed an audio feature extraction API that lets baseline models perform transfer learning in holistic downstream evaluation
- 2023.08 - 2024.01
Soundscape Simulation, Augmentation and Visualization
A Python Library for Soundscape Generation
- Developed a Python library for data simulation, augmentation, spatialization, and visualization of spatial audio
- Conducted ablation studies with a DCASE SELD challenge model, demonstrating a 37% improvement from augmentation over the baseline
- 2024.01 - 2024.03
Acoustic Spatial Visualizer
A Python Library for Soundscape Visualization
- Generated spatial audio with SpatialScaper and visualized its moving track with DeepWave
- The visualizer uses the APGD algorithm to take in 32-channel spatialized audio and output a 2D/3D energy map
- 2024.03 - 2024.05
Neuro-Harmonilizer
A Python Library for Soundscape Visualization
- Maps any chord to polar coordinates (ϕ, ρ), where ϕ is the color orientation and ρ is the tension class, out of 31 classes in total
Skills
Computer Science and Data Science | |
Machine Learning | |
Deep Learning | |
Statistical Inference | |
Data Analysis |
Audio and Music Technology | |
Spatial Audio | |
Audio Representation Learning | |
Sound Art | |
Music Information Retrieval | |
Music Production |
Programming and Tools | |
Python | |
R | |
SQL | |
PyTorch | |
TensorFlow |
Engineering | |
Digital Signal Processing | |
Thermodynamics | |
CFD |
Awards
- 2017~2020
Merit Student Scholarship
Wuhan University
- 2017~2018
Hualin Machinery First-Class Scholarship
Wuhan University
- 2019
Relabo Innovation Scholarship
Wuhan University
- 2017~2019
Outstanding Student
Wuhan University
Interests
Music | |
Funk & Fusion | |
Jazz | |
RnB |
Sports | |
Climbing | |
Surfing | |
Wakeboarding | |
Skateboarding |
Instruments | |
Electric guitar | |
Electric bass | |
Synths | |
Guzheng |
Other Topics | |
Cinematography | |
Philosophy | |
Cognitive neuroscience | |
Poems |
Languages
Mandarin | |
Native |
English | |
Fluent |