CV

For full details, please see my resume in this PDF ->

Basics

Name Siwen (Sivan) Ding
Title Ph.D. Student
Email sivan.d@nyu.edu
Phone (646)-683-8105
Url https://sivannavis.github.io/
Summary I'm a researcher by day and musician by night.

Work

  • 2024.05 - 2024.08
    Research Intern
    Dolby Laboratories, Inc.
    Video to Spatial Audio (FOA) Generation via Latent Diffusion
    • Spatial Audio
    • Multi-modal
    • Audio Generation
    • Diffusion
  • 2023.01 - 2023.05
    Acoustic Mapping Intern
    Dolby Laboratories, Inc.
    Robust User Localization in Acoustic Mapping via Speech Enhancement
    • Spatial Audio
    • Localization

Education

  • 2023.09 - Present

    Brooklyn, NY, USA

    Doctor of Philosophy
    New York University
    Computer Science
    • Machine Learning
    • Computer Vision
    • Music Information Retrieval
    • Information Visualization
    • 3D Audio
    • Digital Signal Processing
  • 2021.09 - 2022.12

    New York, NY, USA

    Master of Science
    Columbia University
    Data Science
    • Machine Learning
    • Deep Learning and Neural Networks
    • Probability and Statistics
    • Reinforcement Learning
    • Statistical Inference and Modeling
    • Algorithms for Data Science
    • Computer Systems
    • Sonic and Visual Representations of Data
    • Sound: Advanced
  • 2017.09 - 2021.06

    Wuhan, Hubei, CN

    Bachelor of Engineering
    Wuhan University
    Energy Engineering (Thermodynamics)
    • Maths: Advanced Mathematics, Linear Algebra, Probability
    • Mechanics: Theoretical Mechanics, Mechanical Design, Material Mechanics, Fluid Mechanics, Quantum Mechanics
    • Electronics: Electrical Engineering and Electronics Techniques, Principle of Automatic Control
    • Dynamics: Thermodynamics, Heat Transfer, Multi-scale Modeling and Simulation
    • Programming: C, C++, Java

Publications

Projects

  • 2022.01 - 2022.03
    SoniZen
    Mindful Meditation Experience via Data Sonification
    • Designed a multi-modal real-time plug-in with OSC and Max for Live with gestural, visual, and auditory input as sensor signals
    • Mapped signals from wearables (watches and earphones) to controlled parameters with neural networks for live music performance
  • 2022.05 - 2022.11
    Voice Anti-Spoofing and Audio Deepfake Detection
    2022 Summer Internship @AIR, UoR
    • Designed a novel loss function with an algorithm for speaker attractor multi-center one-class supervised learning with 120K voice data
    • Improved generalizability of audio spoofing detection, achieving SOTA EER with a 38% relative improvement
    • Illustrated model behaviors in cluster representation learning and classification through ablation studies and UMAP and t-SNE embedding visualizations
    • Leveraged cyclic learning rate and hyper-parameter tuning techniques to improve convergence of the training process
  • 2022.05 - 2022.10
    Embodied Multi-Modal Machine Listening in Audio-Visual Navigation
    2022 Summer Internship @MARL, NYU
    • Instrumented audio/image CNN and Transformer models with reinforcement learning on HPC clusters for semantic audio-visual navigation
    • Developed an API for audio feature extraction for baseline models to perform transfer learning in holistic downstream evaluation
  • 2023.08 - 2024.01
    Soundscape Simulation, Augmentation and Visualization
    A Python Library for Soundscape Generation
    • Developed a Python library for data simulation, augmentation, spatialization, and visualization of spatial audio
    • Conducted ablation studies with a DCASE SELD challenge model, demonstrating a 37% improvement from augmentation over the baseline
  • 2024.01 - 2024.03
    Acoustic Spatial Visualizer
    A Python Library for Soundscape Visualization
    • Generated spatial audio with SpatialScaper and visualized its moving track with DeepWave
    • The visualizer uses the APGD algorithm to take in 32-channel spatialized audio and output a 2D/3D energy map
  • 2024.03 - 2024.05
    Neuro-Harmonilizer
    A Python Library for Soundscape Visualization
    • Maps any chord to polar coordinates (ϕ, ρ), where ϕ is the color orientation and ρ is the tension class among 31 total classes

Skills

Computer Science and Data Science
Machine Learning
Deep Learning
Statistical Inference
Data Analysis
Audio and Music Technology
Spatial Audio
Audio Representation Learning
Sound Art
Music Information Retrieval
Music Production
Programming and Tools
Python
R
SQL
PyTorch
TensorFlow
Engineering
Digital Signal Processing
Thermodynamics
CFD

Awards

Interests

Music
Funk & Fusion
Jazz
RnB
Sports
Climbing
Surfing
Wakeboarding
Skateboarding
Instruments
Electric guitar
Electric bass
Synths
Guzheng
Other Topics
Cinematography
Philosophy
Cognitive neuroscience
Poems

Languages

Mandarin
Native
English
Fluent