MPATE-GE 2039: Deep Learning for Media

Instructor: Magdalena Fuentes
Email: mfuentes@nyu.edu
Spring 2023, 3 credits
Class meetings: 2:00 PM – 4:30 PM W
Classroom: 35 West 4th Street, EDUC building, room 610
Office hours: https://bit.ly/mf_office_hours

 

Course Description

Deep learning, a sub-field of machine learning and artificial intelligence, has driven breakthroughs in managing and creating media content, and continues to shape the future of the multimedia landscape. This course provides a hands-on, project-oriented introduction to deep learning for the classification, retrieval, and creation of media content, with an emphasis on audio-visual content. Students will create and work with existing deep learning models using Python libraries, and think critically about the application of these models to media. This course will provide students with an understanding of how these tools work and how to use them in the context of their own work, so they can participate in the ongoing revolution at the intersection of machine learning and media content.

Course Objectives

The goal of this course is to provide students with:

  • Basic knowledge of deep learning, including definitions, pipeline development, and evaluation.
  • Practical skills for working with open source deep learning resources in Python.
  • Awareness of current state-of-the-art practices in the retrieval, classification and creation of audio-visual content.
  • The ability to critically analyze the behavior of deep learning systems in the context of media.

Student Learning Outcomes

By the end of the course, students will be able to:

  • Explain and describe basic concepts of deep learning, including definitions, pipeline development, and evaluation.
  • Code deep learning pipelines using open source Python libraries.
  • Assess deep learning systems on new audio-visual data.
  • Explain current trends in deep learning for the classification and generation of audio and images.

Required Skills

Basic knowledge of Python is required. Basic knowledge of linear algebra is recommended.

Required Course Materials

A computer with Internet access.

Course Requirements

Class participation

This course will be highly participatory and interactive. We will have conversational lectures, live-coding demonstrations and group readings. It is expected and essential that students participate actively in class. For the class readings, we are going to follow the role-playing paper reading format of Jacobson and Raffel. In this format, students read the same paper, but each student takes one specific role for that week. We will work with three different roles, to be determined at the beginning of the course. Each student will participate by choosing one role in the semester. Papers will be assigned to students during the first week of classes. See a short guide on how to read a paper.

Homework assignments

There will be 3 homework assignments. Each assignment will consist of Python code snippets and written components to be completed by you. These assignments are individual. You will have two weeks to complete each assignment. For the homework assignments, your lowest score will be dropped. If you can’t finish one assignment for any reason, you should make the time to cover the material, but no need to panic about late assignments.

Final Project

The final project can be done in groups of 2-3, and will have a code and presentation component. Projects will be presented during the last class. A project can cover the assessment of an existing deep learning system on new data, or a new system proposed by the students. The project will have four assignments through the semester as follows:

Topic, group definition and data report. Each group will propose a project topic and group composition and will submit a short report (max. 2 pages) describing the scope of the work and the data resources they will use. The project’s final scope and composition will be determined jointly with the instructor. Examples of feasible projects will be provided by the instructor in advance to guide the students in the definition of their project. The groups will submit a short report describing the nature and origin of the data they are planning to use for their project, along with some examples. For example, if the students are using an existing dataset, they should explain why they chose that dataset, include a description of the dataset, and explain how they are planning to use it. If the students are planning to collect data or create a new dataset, they should detail their plan to do so and describe the resulting dataset as above.

Documented code and data. Students will submit their working (and well documented!) code along with the data they investigated. The code should run out-of-the-box, and should include instructions on how to run it if needed (e.g., where the data should be placed). The code should be well organized and well documented, explaining the decisions made and presenting a rigorous analysis of the data. Students are encouraged to follow the structure of the homework as a guide.

Final Presentation. All students from each group should participate in a final presentation. Each group should prepare slides as part of the presentation. Presentations should last between 5 and 10 minutes, and will be followed by a short Q&A from the class.

Your grade for the class will be determined by the following breakdown:

  • 40% assignments (individual)
  • 30% final project (groups)
  • 30% class participation (individual)

Classes will be in person weekly unless indicated by the instructor, and materials will be available on NYU Brightspace.

Recommended Readings

Main textbook:

Other recommended readings:

  • Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. 2016. MIT Press.
  • Abu-Mostafa, Yaser S. Learning from data: a short course. 2012. AMLBook.
  • David Foster. Generative Deep Learning. 2019. O’Reilly Media, Inc.
  • Tuomas Virtanen, Mark D. Plumbley, Dan Ellis. Computational Analysis of Sound Scenes and Events, Chapter 4 (Brian McFee).

Class Environment

Please do your part by seeking to promote the success of others, and by treating each other in ways that respect and celebrate the diversity of talent. Here are a few specific things that you should know about my policies on creating an inclusive and equitable class environment (both in the classroom and on the course website/forum):

  • Preparation: Students come to this class from a wide range of backgrounds, and greatly varying previous exposure to mathematics, media and programming. I want to assure students who may feel out of place here that you are indeed prepared to succeed in this class! If you feel that there are gaps in your knowledge, please speak to me and I will help you find additional materials as needed.
  • Classroom environment: The classroom is an open forum for discussion, and I encourage all students to feel free to ask questions in class. Please do not be afraid to ask any question, no matter how basic it may seem. What is basic to some of the class may be new to the rest.
  • Accessibility: If you have any accessibility requirements, please present a letter from the Moses Center to me at your earliest convenience, so that I can ensure that materials comply with your needs. I am always willing to do what it takes to support you.
  • Mental Health & Wellness: If you are experiencing undue personal and/or academic stress during the semester that may be interfering with your ability to perform academically, the NYU Wellness Exchange (212-443-9999) offers a range of services to assist and support you. I am available to speak with you about stresses related to your work in my course, and I can assist you in connecting with the Wellness Exchange. The Wellness Exchange offers drop-in services on campus on a regular basis. Additionally, if you anticipate any challenges with completing the assignments, readings, exams and other work required in this course, I encourage you to register with the Moses Center (212-998-4980 or mosescsd@nyu.edu) in advance so that you may be granted the proper academic accommodations.
  • Preferred name and pronouns: You are always welcome to write your preferred name on all class assignments, exams, etc. If you have a name and/or pronoun that doesn’t match the class roster delivered from the registrar, please let me know and I will ensure that you are addressed correctly in our class.
  • Class expenses (textbook, devices, etc.): If obtaining any material for use in our class presents a financial hardship for you, please let me know and I will do my best to arrange for loaner materials.
  • Feedback: I will solicit (anonymous) feedback from students throughout the course, but if you have pressing or specific issues, please do not hesitate to let me know if any aspect of our course or class community can be improved.
  • Academic Integrity and Honesty: All students are expected to do their own work on homework. Students may discuss homework with each other, as well as with the course instructor. Each student must turn in their own write-up of the homework. Excessive collaboration (i.e., beyond discussing problem set questions) can result in honor code violations. Questions regarding acceptable collaboration should be directed to the class instructor prior to the collaboration. It is a violation of the honor code to copy or derive homework solutions from other students (or anyone at all), textbooks, previous instances of this course, or other courses covering the same topics. Finally, a good point to keep in mind is that you must be able to explain and/or re-derive anything that you submit. Please also refer to the general NYU academic integrity statement.
  • Diversity: NYU values an inclusive and equitable environment for all students. I hope to foster a sense of community in this class and consider it a place where individuals of all backgrounds, beliefs, ethnicities, national origins, gender identities, sexual orientations, religious and political affiliations, and abilities will be treated with respect. It is my intent that all students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. If this standard is not being upheld, please feel free to speak with me.

Resources at NYU

Access your course materials: NYU Brightspace (brightspace.nyu.edu)
Databases, journal articles, and more: Bobst Library (library.nyu.edu)
Assistance with strengthening your writing: NYU Writing Center (nyu.mywconline.com)
Obtain 24/7 technology assistance: IT Service Desk (nyu.edu/it/servicedesk)
University Policy on Academic Integrity
Moses Center for Students with Disabilities

Weekly Schedule

Week 0: Supplemental materials 

If you want to get some practice before we start with the homework assignments, take a look at:

Week 1, 01/25: Introduction to Machine Learning

  • Course structure and schedule
  • Why deep learning? Why now?
  • Frame a machine learning problem
  • Collect a dataset
  • Understand your data
  • Choose a measure of success
  • Recommended reading: DLwP Chapter 1, Chapter 6
  • Practice exercise:
    • GitHub basics
    • Submitting homework

Week 2, 02/01: The building blocks of neural networks

  • Data representations
    • Scalars
    • Vectors
    • Matrices
    • Tensors
    • Sequential data
  • Data operations
  • Practice exercise:
    • Train a first neural network for handwritten digit classification
    • Adapt the network for audio instrument classification
  • Recommended reading: DLwP Chapter 2.1-2.3

Week 3, 02/08: The engine of neural networks

  • Gradient-based optimization
    • Derivative
    • Derivative of tensors
    • Stochastic gradient descent
    • Backpropagation
  • Practice exercise:
    • Gradient descent
  • Recommended reading: DLwP Chapter 2.4
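
As a taste of this week's topic, here is a minimal sketch of gradient descent on a one-dimensional function. The objective, starting point, learning rate, and number of steps are all illustrative assumptions and are not taken from the course's actual practice exercise.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Hypothetical objective: f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
def grad_f(x):
    return 2.0 * (x - 3.0)


# The minimum of f is at x = 3, so the result should land very close to 3.
x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)
```

With this learning rate the distance to the minimum shrinks by a constant factor at every step. A much larger learning rate would make the updates overshoot and diverge.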

Week 4, 02/15: Setting up a deep learning project

  • Keras and TensorFlow
  • Layers and models
  • Loss function
  • Metrics
  • Inference
  • Practice exercise:
  • Recommended reading: DLwP Chapter 3
  • ASSIGN: paper readings for the semester

Week 5, 02/22: Classification, model generalization and evaluation

  • Overview of deep learning for classification
  • Underfitting and overfitting
  • Improving model fit
  • Model biases and ethical implications
  • Interpretability
  • Practical example: fixing the models
    • A biased classifier
    • An ethically questionable classifier
    • A classifier that doesn’t generalize
  • Recommended reading: DLwP Chapters 4, 5
  • ASSIGN: homework 1
  • Role-playing paper reading 1

Week 6, 03/01: Convolutional Neural Networks

  • An introduction to Computer Vision
  • Convolutional networks
  • Max-pooling
  • Practical example: Image classification
    • Training from scratch
  • Recommended reading: DLwP Chapter 8

Week 7, 03/08: Transfer learning and Data augmentation

  • Image data augmentation
  • Practical example: Manipulating and augmenting images
    • Data pre-processing
    • Feature extraction with a pre-trained model
  • Recommended reading: DLwP Chapter 8
  • DUE: homework 1
  • ASSIGN: homework 2
  • Role-playing paper reading 2

Week 8, 03/15: Spring Break

  • No class meeting

Week 9, 03/22: Audio understanding

  • An introduction to Machine Listening
  • Audio manipulation
  • Spectrograms
  • Audio data augmentation
  • Practical example: Manipulating and augmenting audio
    • Data pre-processing
  • Recommended reading: Computational Analysis of Sound Scenes and Events, Chapter 4
  • DUE: homework 2
  • ASSIGN: homework 3

Week 10, 03/29: Audio classification

  • Temporal convolutional networks for audio
  • Practical example: Sound classification
    • Training from scratch
    • Feature extraction with a pre-trained model
  • Recommended reading: Computational Analysis of Sound Scenes and Events, Chapter 4
  • Role-playing paper reading 3

Week 11, 04/05: Cross-modal retrieval

  • Overview and applications of cross-modal retrieval
  • Building a cross-modal retrieval model from audio and video architectures
  • Embedding space
  • Contrastive Loss
  • Practical example: Audio-visual cross-modal retrieval
    • Audio and vision encoders
    • Loss implementation
    • Bring everything together
  • Recommended reading: N/A
  • DUE: homework 3
  • ASSIGN: project 1
  • Role-playing paper reading 4
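
To make the idea of a contrastive loss over an embedding space concrete, here is a minimal sketch of one common pairwise, margin-based formulation. The choice of this particular formulation, the function names, the margin value, and the toy embeddings are assumptions for illustration and may differ from what is used in class.

```python
import math


def euclidean_distance(a, b):
    """Distance between two embeddings given as equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(audio_emb, video_emb, is_match, margin=1.0):
    """Pairwise contrastive loss for one (audio, video) pair.

    Matching pairs are penalized by their squared distance, so they are
    pulled together. Non-matching pairs are penalized only when they are
    closer than the margin, so they are pushed apart.
    """
    d = euclidean_distance(audio_emb, video_emb)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical two-dimensional embeddings.
matched = contrastive_loss([0.1, 0.9], [0.1, 0.9], is_match=True)
mismatched_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=False)
mismatched_near = contrastive_loss([0.0, 0.0], [0.3, 0.4], is_match=False)
print(matched, mismatched_far, mismatched_near)
```

Identical matching embeddings and well-separated non-matching embeddings both give zero loss. Only the non-matching pair that sits inside the margin is penalized.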

Week 12, 04/12: Neural style transfer

  • Overview of style transfer in image and audio
  • Practical example: image style transfer
    • Style loss
    • Content loss
    • Bring everything together
  • Recommended reading: DLwP Chapter 12.3
  • DUE: project 1 (04/10)
  • ASSIGN: project 2
  • Role-playing paper reading 5

Week 13, 04/19: Image generation

  • Generative Adversarial Networks (GANs) for image generation
  • Challenges and tricks for training a GAN
  • Practical example: GAN for image generation
    • Discriminator
    • Generator
    • Adversarial network
    • Bring everything together
  • Diffusion techniques overview
  • Recommended reading: DLwP Chapter 12.5
  • Role-playing paper reading 6

Week 14, 04/26: Audio generation

  • Variational Autoencoders for audio generation
  • Sampling from audio latent spaces
  • Practical example: Variational autoencoders for audio generation
    • Encoder/decoder network
    • Sampling
    • Bring everything together
  • Diffusion techniques overview
  • Recommended reading: DLwP Chapter 12.4
  • DUE: project 2
  • ASSIGN: project 3
  • Role-playing paper reading 7

Week 15, 05/03: Final project presentations

  • Final project presentations. See the “Final Project” section for details.
  • DUE: project 3