Assignments – CS 7450 – Information Visualization

Homework Assignments (HW)

These individual assignments will help you develop your knowledge of design principles for information visualization. Each one is due by the start of class on its due date. Unless otherwise stated, submit your work on t-square. Note that some assignments also require you to bring hardcopies to class in addition to submitting on t-square.

Together, your homework assignments count for 30% of your total grade, distributed as follows:

HW1: 0 points
HW2: 6 points
HW3: 8 points
HW4: 9 points
HW5: 7 points

Homework 1: Survey

Complete this background survey. There is nothing to submit on t-square. Please submit only one response per person.

Although this assignment carries no points, you should definitely complete it. It takes only a few minutes and helps the instructors and TAs plan a semester that is educational, fun, and interesting.

Homework 2: Data Exploration and Analysis

The purpose of this assignment is to give you some experience exploring and analyzing data without using an information visualization system. Below is a dataset about cereals that can be imported into Excel. Explore and analyze this data using Excel or simply by hand (drawing pictures is fine), but do not use any visualization tools. Your goal is to perform an exploratory analysis of the data set, to better understand the data and its characteristics, and to develop insights about the cereal data.

What to turn in: Your submission should consist of three parts:

First, give a bulleted list of five “insights”, chunks of knowledge, or deeper questions that you encountered or gained while exploring the data. An insight is some understanding of the data and its characteristics that is not obvious or intuitive, something most people would not realize at first. An insight or knowledge chunk may also simply be a deeper question that arose while you explored the data, even one that your analysis was not sufficient to answer.

Second, write one paragraph about the process you used to do the exploration and analysis. Did you load the data into Excel, work manually, or do both? What did you do in Excel? Did you draw pictures? Just tell me (briefly) what you did.

Third, write one paragraph about challenges or problems you encountered in analyzing the data this way. Did anything limit or frustrate you? If not, perhaps something was more difficult than you thought it should be. No approach is perfect, so you should be able to identify some potential issues here. To sum up, your submission should have a bulleted list of five items followed by two paragraphs.

Submit a pdf to the t-square assignment. If you drew anything by hand, photograph the drawings and include them.

Grading: We will evaluate the quality of the insights you listed. We are looking for things that we find interesting or perhaps unexpected; this is necessarily subjective. For the second and third parts, we will evaluate whether you did what the assignment asked.

Cereals data (xls format)
The data set should be fairly self-explanatory. Manufacturer is a one-letter code with the expected mapping (Q-Quaker Oats, P-Post, G-General Mills, K-Kellogg’s, R-Ralston Purina, N-Nabisco), and Type is C (cold) or H (hot). Units and other details about the dataset can be found at http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html.

Homework 3: Multivariate Data Visualization

This HW has two options, described below. You only need to complete one of them. Both are worth the same number of points and are designed to be of equal difficulty and time commitment. The choice of which to do is entirely up to you.

Option A: Build a vis using d3.js

The purpose of this assignment is to give you hands-on experience building interactive visualizations for the web. The visualization toolkit you will use is D3. Linked below is a csv file containing data about coffee product sales at a fictional company. You are required to create an interactive barchart visualization using this data.

We understand that some students in the class do not have a programming background, so we will provide tutorials to help with the assignment, both in class and outside of class. In addition, the list of D3 resources at the end of this option contains links that may be helpful.

To get you started, we have created a skeleton template of the assignment, available at: runnable template and code for template. The visualization shown in the template consists of two drop-down menus and an update button. Once the user chooses an attribute for the x-axis (say, profit) and one for the y-axis (say, region), clicking the update button should show a barchart of the summed profits for each region.
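Once the totals for each region or category are available, drawing the bars comes down to mapping each total onto a pixel length. The following is a minimal plain-JavaScript sketch of that linear mapping, which is the job a D3 linear scale performs. The helper name barLengths, its {key, value} input shape, the 400-pixel width, and the sample totals are all hypothetical and are not taken from the provided template.

```javascript
// Hypothetical helper (not part of the provided template): map each group's
// total onto a bar length in pixels using a linear scale whose domain is
// [0, largest total] and whose range is [0, maxWidth].
function barLengths(totals, maxWidth) {
  // Including 0 keeps Math.max finite when the list is empty.
  const maxTotal = Math.max(0, ...totals.map((t) => t.value));
  return totals.map((t) => ({
    key: t.key,
    // Guard against division by zero when every total is 0 (or there are none).
    length: maxTotal > 0 ? (t.value / maxTotal) * maxWidth : 0,
  }));
}

// Made-up totals, for illustration only.
const bars = barLengths(
  [
    { key: "East", value: 50 },
    { key: "West", value: 100 },
  ],
  400
);
// bars[0].length is 200 and bars[1].length is 400.
```

In D3 itself, the same mapping is d3.scale.linear().domain([0, max]).range([0, width]) in v3, or d3.scaleLinear() in v4 and later. If a total can be negative (for example, a region that loses money), the domain would need to start at the minimum value rather than 0.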

The following video shows what a correct final version of the assignment should look like, and it will help you understand what you need to implement. (Solution demo video).

Grading: The assignment is graded out of a total of 8 points. There are two ways to handle aggregation of the individual data values, which is one of the trickier aspects of D3 programming:

You can generate the four sets of summary statistics in Excel: sum of sales by region, sum of profits by region, sum of sales by category, and sum of profits by category. Save these as four csv files and use them as input for the different views. With this approach, you can still earn the full 8 points if your program works correctly.
You can compute the four sets of summary statistics directly in JavaScript using D3. A correct implementation of this approach earns 1 bonus point of extra credit.
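The second approach means computing the four summaries in the browser after the csv is loaded. At its core this is a group-by-and-sum. D3 has its own grouping helpers for this (d3.nest() in D3 v3 through v5, d3.rollup() in D3 v6 and later), but the following minimal plain-JavaScript sketch shows the underlying operation. It assumes the parsed rows have columns named region, category, sales, and profit. These names are hypothetical, so check the actual headers in the coffee csv. Keep in mind that d3.csv delivers every field as a string, so numeric columns must be converted before summing.

```javascript
// Hypothetical helper: sum one numeric column for each distinct value of a
// grouping column. The field names used below ("region", "category", "sales",
// "profit") are assumptions; substitute the actual headers from the coffee csv.
function sumBy(rows, groupField, valueField) {
  const totals = new Map(); // a Map keeps keys in first-seen order
  for (const row of rows) {
    const key = row[groupField];
    // d3.csv yields every field as a string, so coerce with unary + before adding.
    const value = +row[valueField];
    totals.set(key, (totals.get(key) || 0) + value);
  }
  // Convert to an array of {key, value} objects, a convenient shape for a D3 data join.
  return Array.from(totals, ([key, value]) => ({ key, value }));
}

// Made-up rows in the shape d3.csv would produce (all values are strings).
const rows = [
  { region: "East", category: "Coffee", sales: "100", profit: "30" },
  { region: "West", category: "Tea", sales: "80", profit: "15" },
  { region: "East", category: "Tea", sales: "50", profit: "10" },
];

// The four summaries the assignment describes.
const salesByRegion = sumBy(rows, "region", "sales"); // East: 150, West: 80
const profitByRegion = sumBy(rows, "region", "profit"); // East: 40, West: 15
const salesByCategory = sumBy(rows, "category", "sales"); // Coffee: 100, Tea: 130
const profitByCategory = sumBy(rows, "category", "profit"); // Coffee: 30, Tea: 25
```

When the update button is clicked, the drop-down selections would supply groupField and valueField, and the resulting array can be bound to the bars with a data join.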

What to turn in: Submit your code on t-square by the start of class on the due date. Create a .zip of your directory containing all code, CSS styles, data, and anything else we need to host your code and run your visualization.

Coffee sales data (csv format)
The csv file contains data items with eight attributes. It has column headers, so the data should be relatively self-explanatory. Do not treat date as an ordinal attribute for the height of the bars.

D3 resources
http://www.youtube.com/watch?v=8jvoTV54nXw – nice overview and run-through video/talk
http://alignedleft.com/tutorials/d3/ – thorough d3 tutorials from an academic instructor and the author of the open O’Reilly book, “Interactive Data Visualization for the Web” (look for the free preview link for the actual book draft)
http://sightlinevis.com/ – many d3 examples
https://www.youtube.com/user/d3Vienno/videos?view=0&flow=grid – many tutorial videos by d3Vienno
http://www.cs171.org/2015/resources/ – list of d3 resources from Harvard CS 171 class
https://github.com/mbostock/d3/wiki/Tutorials – big list of resources from the author of D3
https://github.com/mbostock/d3/wiki/API-Reference – well-done D3 documentation
http://www.d3noob.org – free ebook with lots of tips and tricks, actively updated
http://www.jeromecukier.net/wp-content/uploads/2012/10/d3-cheat-sheet.pdf – cheat sheet for D3, also see parent site for blog posts
https://groups.google.com/forum/?fromgroups=#!forum/d3-js – D3 Google group
http://bost.ocks.org/mike/selection/ – Guide to understanding selections, a key part of D3.
http://benclinkinbeard.com/talks/2012/NCDevCon/ – A talk, with interactive examples and code snippets, explaining d3
http://www.udacity.com/course/data-visualization-and-d3js--ud507 – d3.js Udacity Course
http://bl.ocks.org/curran/3a68b0c81991e2e94b19 – Responsive Visualizations (Resizing)
http://bl.ocks.org/hubgit/raw/9133448/ – Nesting CSV Data
http://bost.ocks.org/mike/nest/ – Nesting Visualization Elements
http://www.visualcinnamon.com/blog – Creative Tutorials from Nadieh Bremer

Option B: Visual Design

The purpose of this assignment is to give you further practice in designing visualizations of data. In this assignment, the data set is more complex, with many variables: it consists of information about whiskeys. (The data set is available via t-square.) This option is intended specifically for students who have little or no programming background but have more expertise in design-related disciplines.

Your objective in the assignment is to design a static visualization of this data set that will convey its key characteristics to the viewer. This assignment is somewhat analogous to the visualization of a year’s worth of weather data from the NY Times. You should represent your visualization on the front of a single piece of paper. You don’t need to faithfully and accurately map each variable, but you should provide enough detail so that the idea of your visualization is clear. A key to this assignment is to figure out what you want to represent (and what to omit) and how to represent that. You may have to abstract and summarize different aspects of the data — think carefully about this! On the back of your paper, or on another sheet, briefly explain your visualization well enough so that we can understand what you have done.

What to turn in: Submit your one-page visualization, together with an explanation of its design, as a pdf on t-square.

Grading: We will evaluate how effectively your visualization communicates the fundamental aspects of the data set. Does it give the viewer a good understanding of the different characteristics of the data? We are looking for both effectiveness and creativity. (We do realize that people have differing levels of design ability and experience. We are looking for a good effort, not necessarily a new idea worthy of an InfoVis conference paper. Perhaps you can apply some of the ideas you have learned in class so far.) The purpose of this assignment is to give you experience in analyzing data like this and designing visualizations to present it.

Homework 4: Try and Critique Commercial InfoVis Systems

This assignment will familiarize you with a number of systems that have been built for analyzing multivariate data sets. You will be working with Qlik Sense, TIBCO Spotfire, and Tableau.

The goals of the assignment are for you to learn the capabilities these types of systems provide, learn the visualization methods they offer, and assess their utility in analyzing information repositories. You will work with some provided data sets in the assignment. Think about the kinds of questions an analyst would ask about these data sets. IMPORTANT: You only need to work with two of the three commercial systems. The choice of which two is up to you. (Feel free to work with all three as well!)

The assignment has four parts:

1. Gain familiarity with the systems
Familiarize yourself with the visualization techniques and user interfaces of the different systems. Each one offers either a tutorial to try out with a sample data set or a tutorial video to watch. Work your way through these materials and become familiar with each system, its interface, and its capabilities.

2. Examine the sample data sets
Each tool includes a few sample data sets, but it is often best to learn with something new. Five data sets are supplied on the Resources page of t-square for you to consider: food nutrition data (5976 items, 32 vars.), stocks (500 items, 30 vars.), baseball statistics (322 items, 24 vars.), college information (51 items, 22 vars.), and whiskeys (284 items, 8 vars.). You must work with the food nutrition data set and are free to pick one other set that is most interesting to you.

UPDATE Oct. 10: It appears that Tableau no longer includes these datasets in its trial versions, so I’ve uploaded a .zip to the t-square resources folder for HW4. It contains 5 datasets (baseball, coffee, food, collegeProfessors, and whiskeys). You will need to work with two datasets: the coffee dataset and one of the other 4 (your choice).

Briefly scan the files and familiarize yourself with the variables. Generate and write down at least 3 hypotheses to be considered, tasks to be performed, or questions to be asked about the data elements (you will need to turn these in). Think about all the different kinds of analysis tasks a person might want to perform with data sets such as these. For instance, someone working with a data set about breakfast cereals might have tasks like:

Identify the cereals with the most salt.
Do the different companies producing cereals have different styles of cereal that they favor?
Does high fat mean high calories?
What cereals would you recommend to someone on a diet who still wants some good taste?
Does the nutritional value of cereals vary a lot? If so, how?
etc.

Try not to make all of your questions about correlations, which is a common tendency.

3. Load the data sets into the systems and examine them
Load the coffee data set and the other data set you selected into each of the two visualization tools you chose, then consider your hypotheses, tasks, and questions. Also use the systems to explore the data sets and see whether you can discover other interesting or unexpected findings. Put yourself in the shoes of a data analyst, and consider the questions such a person would confront.

4. Write a report on your findings
Write up a summary of your exploration process, findings, and impressions of the systems. Include your hypotheses/tasks/questions and what you found. Furthermore, critique the different tools in a general sense. (Feel free to include screenshots to help explain your analyses and critiques.) What are the systems’ strengths and weaknesses? How do their visualization capabilities differ? For what kinds of user tasks is each tool suited? Focus more here on the visualization techniques as opposed to the particular user interface quirks, though you should feel free to comment on UI aspects when they are particularly good or bad. Additionally, for each tool, list one unexpected finding, insight, or discovery made while exploring one of the datasets with that tool. Explain how the system helped to facilitate the finding.

We recommend that you not walk through each question or task one by one for each of the two systems you used. (There simply won’t be space to do so.) You might, however, want to include specific examples of how the systems assisted or did not assist work on specific tasks. Point out interesting, insightful observations; you don’t need to tell us how a system works, since we already know that. Think of this as a report to your manager, who wants to know what each system can provide and its pros and cons. Focus specifically on how each system’s visualizations help or hinder analysis. How did the systems compare? Finally, if you had to recommend one system for your company to use, which one would you suggest?

What to turn in: Your document is limited to a maximum of 10 single-spaced pages in a reasonable font size, including embedded screenshots. Please bring two hardcopies to class on the due date, and also submit a pdf to t-square.

Acknowledgments: Tableau’s data visualization software is provided through the Tableau for Teaching program. We thank Tableau for making the system available to students in class.

Homework 5: Text Visualization Design

The purpose of this assignment is to give you further experience in analyzing and understanding multivariate data sets. This assignment focuses on a data set that is rich in textual data: a document collection consisting of product reviews of a Samsung TV from amazon.com. The data set is in XML format and is available on t-square.

First, think about the kind of information that you would want to learn from this data. Put yourself in the shoes of a consumer. What things about the TV sets would you want to know? Push past the simple “Is it a good TV?” question.

Next, design a visualization of the data set that you feel would help a person learn about the television and understand the issues you identified earlier. Think especially hard about visualizing aspects of the data set that would be difficult to extract from simple search queries against it. Sketch your visualization on paper. I’m not looking for a working system here, or even a detailed representation where all the actual data values are shown. Just do a conceptual design that shows what your visualization would look like when applied to the data set in general. Try not to simply replicate some well-known infovis technique, however. Be effective, but also try to be creative; I will reward creative ideas. Also, don’t forget the interaction! A static piece of paper does not do justice to an interactive visualization, so you will likely need to explain how the viewer would interact with and update the display.

What to turn in: Draw/sketch/show your design on a piece of paper or a few pages (don’t go overboard). Feel free to annotate the sketch with small comments or captions to explain what it is and how it would work. On a separate page, explain your visualization design in a paragraph or two, how it would start, what the interaction would be, etc.

Grading: We will primarily rate the quality of your design. We are looking especially for utility, and we’ll add in a bit of creativity assessment too. Would the visualization effectively help people explore the different televisions in order to make a purchase decision? We also will examine how clearly and effectively you explain your design. Remember that communicating one’s work is almost as important as the work itself. (We do realize that people have differing levels of design ability and experience. Do your best, and try to apply the design principles that we have been learning in class.) The purpose of this assignment is to give you experience in analyzing text-centric data like this and designing visualizations to present it.