COGS108: Data Science in Practice

Instructor: C. Alex Simpkins, Ph.D.

UCSD Summer Session 1, 2023

 

Please check site frequently for changes


Frequently Asked Questions
Final Project

 

Announcements

  • Sun July 2, 2023, 2:45pm: This web page is going to be updated quite a bit today.
  • Sat July 1, 2023, 1am: Important last-minute information will be placed here, usually with a time/date stamp so you know when it appeared.

 

Syllabus

Course Description

This course has a variety of tools and resources through which we will help you learn and improve your knowledge about this topic as well as gain practical experience. You will see an introduction below to guide you as to where to look for information. We are very excited to have you in the course, and look forward to an excellent summer session!

Please note: Discussion sections will start in Week 1, since we have such a short time frame. Week 1 discussion section time will go to group formation and orienting to the class, getting you all on datahub, Jupyter and Python practice, and the first workshop/workbook lab (D1). I'll post the schedule on the github page and website.

 

Can it be taken remotely?

Yes, it is absolutely possible to take the course remotely. Lectures are podcast, section is podcast, there are no in-class exams, office hours are on Zoom (and by appointment), assignments are on datahub, canvas or google forms, and the project is on github. HOWEVER, this does not mean it can be taken passively by the old osmosis methodology. This course requires active participation - ask questions, do the assignments, watch the lectures (I'll not just be reading the slides but will add a lot of practical insight, and there will be details you will miss if you skip them), watch or attend the sections, and interact with each other and us. It has also been established repeatedly in research (with data science of course!) that learning requires active participation to be effective. I'll be interacting with the class during lecture and I'll try to make Zoom available for live questions and discussion, but it has reliability issues and should not be assumed. If Zoom is a problem we will just drop it and rely on office hours and section for live interaction, as well as piazza and emails.

 

If you are sick

Please do not come if you are sick, since it is very doable to take part remotely until you are better. I think the best experience will be in person with the support of all the online resources, but you can definitely have a great experience remotely as well.

 

Piazza

Automatic roster sync is initially on, so most have been added to piazza, but we will switch to manual add for those who were brought in from the waitlist. If you can't access it now please let me know via email or canvas messages. The class piazza code will be posted on the canvas page. If you can't access the canvas page please send me an email and I'll send it to you.

 

Important links

 

Course objectives

  • Formulate a plan for and complete a data science project from start (question) to finish (communication)
  • Explain and carry out descriptive, exploratory, inferential, and predictive analyses in Python
  • Communicate results concisely and effectively in reports and presentations
  • Identify and explain how to approach an unfamiliar data science task

 

Prerequisites

  • Functional Brain
  • COGS 14B, MATH 18 or MATH 31AH
  • COGS 18 or CSE 7 or CSE 8A or CSE 11

Course resources

This course is hosted in a few areas listed below. You can start at the web site and be taken to the appropriate endpoint of github, canvas, piazza, or the website home page itself. That way you don't have to think about where to look for particular elements. The links to the main course elements are here as well.

The information for this quarter is still being updated, so please bear with me. We will use the website as the jumping off point and for lecture slides, assignment deadlines and links, piazza for discussion, datahub for assignments and turn-ins, canvas for announcements and quizzes mostly (unless there is an issue and in that case I'll send an email), and occasionally last minute files or lecture slide postings. You will be provided a github repository for your group project and will gain experience interacting with git as well as learning the ins and outs of version control.

Assignments

  • Homework assignments will generally be turned in on datahub, using jupyter notebooks, structured to provide practical experience with the course material. They will in general increase in difficulty as your base knowledge expands throughout the course
  • There will be weekly assignments (4) and workbooks (~8) designed to prepare you for the assignments. The workbooks are worth points, and will be completed in section, or you can do them on your own with the podcast. There are weekly lecture quizzes and a large group project where you will work with real data, apply the data science you have learned, come to a conclusion about the topic, then communicate the results in a scientific paper format. We will also have surveys as part of participation exercises. The assignments page is the place to go to understand what is due and when, as well as where to go to complete each task
  • Click here to go to the assignments page
  • See the grading section for information about grading and %'s

 

Group/Final Project

  • There will be a large group project starting in Week 2 (groups formed in Week 1). This will give you the opportunity to actually formulate and ask a scientific question, perform analysis and modeling of real-world data and come to a conclusion, then communicate your results in a scientific report format
  • Information can be found on the projects page here (information is still being posted)
  • See the grading section for information about grading and %'s

 

Course overview (from the github page)

This hands-on, practical course is intended to get you experience working on data science projects. This class goes beyond the appreciation for data and data science you (may have) learned in COGS 9 by doing the same things we talked about theoretically in that introductory course. Doing is rarely so simple. You will likely attempt to do something, do it wrong, learn from your mistakes, and with a bit of luck and skill, eventually succeed. That’s just part of the scientific process, and data science is no exception. This course is all about the practice of data science.

In focusing on the practice, there is theory that won’t be discussed and mathematical proofs that won’t be done. That is by design. In particular:

  1. There are entire courses dedicated to each of the topics we’ll cover. To have time to do anything, we can’t teach all the details in a single course.
  2. Experts in each of these domains are out there and excited to teach you the nitty gritty about each topic.
  3. We’re promoting data literacy. We believe that everyone who is data literate is at an advantage as they go out into the modern world. Data literacy is not limited to those who are computational gurus or math prodigies. You do not have to be either of those to excel at this course.

In this course, you will try many methods. You’ll even be asked to implement a technique that has not been explicitly taught. Again, this is by design. As a data scientist, you’ll regularly be asked to step outside of your comfort zone and into something new. Our goal is to get you as comfortable as possible in that space now. We want to provide you with a technical and a data science mindset that will allow you to ask the right questions for the problem at hand and set off alarm bells when something in your dataset or analysis is “off.”

 

Succeeding in the course

The course consists of (grade breakdown on home page):

  • Lectures 2x per week
  • Discussions 2x per week
  • 4 Weekly Jupyter notebook Assignments (Datahub)
  • 8 (2x Weekly) Jupyter notebook Workbooks (completed in section, turned in to Datahub, gives the tools for the assignments)
  • ~5 Weekly Lecture quizzes (Canvas)
  • 1 Group project starting Week 2, group formation week 1
  • SONA extra credit (2h max)
  • Other extra credit such as participation exercises turned in to canvas or google forms

The way to succeed in this course is

  • First to take on the attitude that learning the material, making it your own should be top priority. This is the most important thing to me - I am here to help you learn, put that as your focus and you'll do well. Also know that in my classes the only foolish questions are the unasked ones!
  • Next be sure to attend lectures or watch podcasts (and focus on them). Slides will be posted on the website with a link to the associated podcast.
  • Participate in the sections to complete the workbooks before the assignments. They are designed to make sure you have the tools to get through the assignments
  • Regularly check the web page for assignments and deadlines, handouts and links. All the various things assigned will be there so you can always orient to what is due when by looking at the assignments page.
  • If unsure about something, first carefully read the information we provide (web site, assignment text, whatever your question relates to) or UCSD's information (for datahub questions, for example). Then post on piazza or send an email, and we will get back to you. You can also pop into office hours or request an appointment
  • Conflict sometimes arises with groups - group dynamics can be complex and as a scientist or engineer you'll be working with teams often, so pay attention to what happened and think about how you could educate yourself to succeed in your team's goals and as individuals. We will also be lecturing on this. We will do our best to assist in resolving group issues. Try to work things out as early as possible, not just before the project is due.

 

How to behave in the course and expectations

We are here to help you. My goal is for you to walk out of the course having learned, regardless of your starting point. We are not here to prove anything, but rather to inspire you and open doors for you all. Let's work together! This course requires a great deal of effort, but is very rewarding. I expect everyone to act in a polite and professional manner, and try hard. Hostility, judgmental or overly critical attitudes or mockery in any form have no place in this course and will affect your grade negatively. I make myself very approachable and open to questions and discussion, but this only works if we can have mutual respect. So let's respect each other.

We will make this an inclusive, friendly and positive learning environment for everyone, including the instructional team, in line with university (and my) policy. This means old, young, male/female/other, no matter where you are from, what you know or don't know, political affiliation, religion, etc. As long as you treat everyone well and are here to learn, you are welcome, and know you are accepted by me. Please come with an open mind and plan to actively participate, work with each other and ask questions in a collaborative fashion. Students should not copy each other, but it is great to help each other succeed. It is critical to understand the answers and to actively participate in the course in order to 1) get a good grade, and 2) walk out having gained something for your life. Explaining to your peers something you understand and they might not can be a great way to deepen your understanding, so it is encouraged.

Some students will be already advanced in their knowledge. This is great, but realize no matter how much anyone knows, we are all human and thus have finite minds. The world is vast and so there is always more to learn about any topic. If you already have experience with these topics, you can gain in several ways:

  • You can, as long as you are polite, assist other students. By answering each other's questions you will at times be discussing topics from perspectives you might not have considered.
  • Come to us and we can provide you with some reading and topics to consider, as well as to relate course content to further topics, examples, discuss your background and future goals, and discuss how to get more out of the course.
  • Experience always expands us. If you already have experience with a topic, look for the hole in your knowledge. Similar to our vision there is always a blind spot and our brains are great at filling in. There is an old saying "Big trees must have deep roots."

 

About facing challenge

Sometimes it can be uncomfortable to be faced with an unknown but by doing so you will learn. All knowledge was once a mystery, and began with the statement "I don't know."

Let's find out together. Welcome to COGS108 with Dr. Simpkins!

References/Textbooks

Regular readings will be provided in pdf format on the handouts page. These will include book chapters and scientific papers which elaborate on (or demonstrate applications of) the concepts presented in class, section, and on assignments.

There are no required readings, but to deeply understand the topics it is important to read about as well as practice these concepts.

References:

  • Grus, J (2019, 2nd ed) Data Science from Scratch (free with your UCSD login, more challenging but deeper)
  • VanderPlas, J (2018) Python Data Science Handbook (free, shorter and more direct)

Software/resources

You'll be using Python (>3.7) and Jupyter Notebooks. The main ways to use these are one or more of the following:

  • UCSD DataHub (https://datahub.ucsd.edu) (nothing to install, and you must use this to turn in assignments)
  • Local (free) copy of Python, Jupyter Notebooks and the Anaconda distribution (a tutorial on installation will be posted)
  • Some cloud service that provides access to these resources (Google Colab, Microsoft Azure, or other)
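
If you work on a local install or a cloud service rather than DataHub, it can help to confirm that your Python version is recent enough before starting an assignment. The snippet below is only a minimal sketch: it assumes the ">3.7" requirement above means "3.7 or newer", and the helper name meets_minimum is made up for this illustration.

```python
import sys

# Assumption: the syllabus's "(>3.7)" is read here as "3.7 or newer".
MINIMUM_VERSION = (3, 7)


def meets_minimum(version=None, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) part of `version` is at least `minimum`.

    `version` defaults to the interpreter running this code.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for this course"
    print(f"Python {current}: {status}")
```

Running this in a notebook cell or a terminal prints the interpreter version together with a short status message.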

 

Grading

Grading will follow the fill-the-bucket principle. For each assignment, workbook, quiz and the final project you will earn points. These will be added together. Your grade will be based on your score relative to the maximum achievable score. The course average will be scaled if need be (only up, not down!).

Tentatively:

  • 4 Assignments: 32%
  • 8 Discussion workbooks: 20%
  • 1 Final project: 30%
  • 4 quizzes: 30%

*Behavioral multiplier (defaults to 1): M={0.7, 1.0}

Total possible will be 1000 pts, plus bonus: (100% + bonus)*M

*To be explained in class. If you do not follow the course and university policies regarding treating each other (including the instructional team) respectfully, it will have a serious negative impact on your grade. We want this course to be a positive learning experience for everyone and a place for learning. We all have our emotions (this is natural) but please do your best to behave in a welcoming, positive and helpful manner. All communications with the instructional team and each other should avoid overly negative, judgmental or critical tones. This is a professional environment. If there is a problem, please come to us and we will help you work it out. It is not your role to judge each other, yourselves, us or the course, but rather to open your mind and learn as much as possible.
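
The point total and multiplier described above can be sketched in a few lines of Python. This is only one possible reading of "(100% + bonus)*M", not the official grade calculation: the function name, its parameters, and the assumption that M can take any value between 0.7 and 1.0 are illustrative choices.

```python
def final_percentage(earned_points, bonus_points=0.0, multiplier=1.0,
                     total_possible=1000.0):
    """Sketch of one reading of "(100% + bonus) * M".

    Earned and bonus points are converted to a percentage of the total
    possible points, then scaled by the behavioral multiplier M.
    """
    # Assumption: M lies between 0.7 and 1.0 inclusive, per "M={0.7, 1.0}".
    if not 0.7 <= multiplier <= 1.0:
        raise ValueError("multiplier is assumed to lie between 0.7 and 1.0")
    base_and_bonus = 100.0 * (earned_points + bonus_points) / total_possible
    return base_and_bonus * multiplier


# Example: 850 of 1000 points plus 20 bonus points, default multiplier.
print(final_percentage(850, bonus_points=20))  # 87.0
```

With the same points and a multiplier of 0.7, the result drops to about 60.9, which shows how strongly the multiplier can affect the final grade.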

 Cheating and Academic Honesty Policies

First of all, please DON'T CHEAT!!! It detracts from your learning in this class. When you go into the world you won't have the skills you should have gained here. Our goal is to help you learn, so if you have any problems, please come speak with us and we will help you resolve them to the best of our ability. That being said, cheating must be defined clearly:

Cheating on exams involves any form of copying from another student, giving answers to or getting answers from another student, acquiring information in any way from an external source during the exam, or giving information to or receiving information from another individual which you should not receive during an exam (i.e. theories, data, answers, etc). You may ask questions of the instructor or TAs at any time during an exam. The TAs are not to give answers directly, but may provide hints.

Cheating on homeworks involves duplicating another person's code. You are to write your own code, unless the instructional team provides a starter code, or sample code for you to use. You may not use code from sources other than this course. You may not copy another student's code. However, you ARE encouraged to help each other and discuss the homeworks and material from the course. It is often through explaining something that one learns that concept even better than before. But when it comes to writing the code, you must do the actual writing of your own code. Programming is very much something you must do as well as study to learn it well, very similar to driving.

The Standard academic honesty policies of the university apply during this course as well. Click here for details

Instructional Team

Office hours:

Instructor:

C. Alex Simpkins, Ph.D.
Email: rdprobotics "at" gmail "dot" com
Office hours:
  • Friday 12-1pm (Zoom link on canvas)
  • Saturday 12-1pm (Zoom link on canvas)

 

TAs:

Abhishek Tanpure (TA)
Email: atanpure@ucsd.edu
Office hours:
  • We 3-4pm (offline)
  • Th 11am-12pm (Zoom link on canvas)

Rounak Sen (TA)
Email: r2sen@ucsd.edu
Office hours:
  • Mon 3-4pm (Zoom link on canvas)
  • Fri 11am-12pm (Zoom link on canvas)

Hari Yadavalli (TA)
Email: hyadavalli@ucsd.edu
Office hours:
  • Tu 11am-12pm (Zoom link on canvas)
  • Fri 1-2pm (Zoom link on canvas)

Antara Sengupta (IA)
Email: asengupt@ucsd.edu
Office hours:
  • Tu 1-2pm (Zoom link on canvas)
  • Th 2-3pm (Zoom link on canvas)

 

Lectures, sections and related information

 

TA/Instructor   Section   Type                     Time/day         Location
Dr. Simpkins    Lecture   Lecture                  MW 11am-1:50pm   SOLIS 107
All             A01       Discussion section A01   MW 2-2:50pm      SOLIS 107

 

Final Exam: No final exam, only the group project, due Week 5