Wednesday, July 30, 2014

Processing, the Kinect and OpenNI

The Kinect was an easy decision for our project because it is not sensitive to the lighting conditions in the room at the time of capture, so using it in a dark room will not be an issue. The Kinect works by creating a depth image: it uses infrared light to capture where objects are in space. The Kinect camera's resolution is 640x480. You can bump the camera up to 1280x1024, but the data will then arrive at 10 frames per second rather than 20 frames per second.

The next decision we had to make was which Processing library would work best for what we are trying to accomplish with the Kinect. It boiled down to OpenKinect and OpenNI.
Dan Shiffman built on the work of the OpenKinect project to create a library for working with the Kinect in Processing. The OpenKinect drivers provide access to the Kinect's servo motors and have a very simple software license: the contributors to the OpenKinect project released their drivers under a fully open source license. In short, this means that you can use OpenKinect code in your own commercial and open source projects without having to pay a license fee to anyone.

In response to OpenKinect, PrimeSense released its own software for working with the Kinect, including more sophisticated software that processes the raw depth image to detect users and locate the position of their joints in three dimensions. They called this software OpenNI, NI standing for "Natural Interaction". OpenNI provides two key pieces of software that are useful to our goals. The first is the OpenNI framework, which includes the drivers for accessing the basic depth data from the Kinect; its licensing situation is similar to OpenKinect's. The other feature OpenNI provides, user tracking, does not have such a simple license. User tracking comes from an external module called NITE, which is not available under an open source license; it is a commercial product that belongs to PrimeSense. PrimeSense does, however, provide a royalty-free license that you can use to make projects that use NITE with OpenNI.
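To make that distinction concrete, here is a rough sketch of what NITE's user tracking looks like from Processing via SimpleOpenNI, the Processing wrapper for OpenNI we ended up using (more on that choice below). This is modeled on the skeleton examples in Making Things See and is only a sketch, not a drop-in: older SimpleOpenNI releases need the "Psi" calibration pose handled in the callbacks at the bottom, while newer ones calibrate automatically and use slightly different callback signatures.

import SimpleOpenNI.*;

SimpleOpenNI kinect;
int trackedUser = -1;  // id of the most recently detected user (-1 = nobody yet)

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
  // this is the NITE part: ask for full skeleton data
  kinect.enableUser(SimpleOpenNI.SKEL_PROFILE_ALL);
}

void draw() {
  kinect.update();
  image(kinect.depthImage(), 0, 0);

  if (trackedUser != -1 && kinect.isTrackingSkeleton(trackedUser)) {
    // 3D position of the user's head, in millimeters
    PVector head = new PVector();
    kinect.getJointPositionSkeleton(trackedUser, SimpleOpenNI.SKEL_HEAD, head);

    // project it back into 2D so we can draw it over the depth image
    PVector headOnScreen = new PVector();
    kinect.convertRealWorldToProjective(head, headOnScreen);
    fill(255, 0, 0);
    ellipse(headOnScreen.x, headOnScreen.y, 20, 20);
  }
}

// NITE calibration plumbing (older SimpleOpenNI versions need the "Psi" pose;
// newer ones calibrate automatically and use different callback signatures)
void onNewUser(int userId) {
  trackedUser = userId;
  kinect.startPoseDetection("Psi", userId);
}

void onStartPose(String pose, int userId) {
  kinect.stopPoseDetection(userId);
  kinect.requestCalibrationSkeleton(userId, true);
}

void onEndCalibration(int userId, boolean successful) {
  if (successful) {
    kinect.startTrackingSkeleton(userId);
  } else {
    kinect.startPoseDetection("Psi", userId);
  }
}

Everything above the callbacks is ordinary depth access; the enableUser() call, the joint lookup, and the calibration plumbing are the pieces NITE provides.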

We chose to use OpenNI because it gives us the option to use the user tracking, and there is a good amount of reading material that explains and uses the OpenNI library in Processing. OpenNI also has the advantage of being designed to work not just with the Kinect but with other depth cameras as well. This means that code we write using OpenNI to work with the Kinect should continue to work with newer depth cameras as they are released, saving us from needing to rewrite our applications depending on what camera we want to use.
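Getting started on the plain OpenNI side (no NITE involved) takes very little code. A minimal SimpleOpenNI sketch that just displays the depth stream, hedged against small differences between library versions, looks roughly like this:

import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup() {
  size(640, 480);                     // matches the Kinect's depth resolution
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();               // ask the driver for the depth stream
}

void draw() {
  kinect.update();                    // grab the latest frame
  image(kinect.depthImage(), 0, 0);   // grayscale image where brightness encodes distance
}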

OpenNI recognizes heads, shoulders, elbows, wrists, chests, hips, knees, ankles, and feet. This is going to be an important part of tracking the crowd in our installation. One of the techniques we are considering uses OpenNI's 3D features and point cloud system. We created a test sketch that creates the illusion of rotating completely around what the Kinect is seeing; a rough outline of the code and a couple of screenshots are below.
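The sketch follows the rotating point cloud approach from Making Things See: ask SimpleOpenNI for every depth pixel as a real-world 3D point, then spin the whole set around the Y axis each frame. A trimmed-down sketch along those lines (ours differs in the details) looks roughly like this:

import processing.opengl.*;
import SimpleOpenNI.*;

SimpleOpenNI kinect;
float rotation = 0;

void setup() {
  size(1024, 768, OPENGL);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
}

void draw() {
  background(0);
  kinect.update();

  // move the origin to the middle of the window and push it back
  // so the scene sits in front of the virtual camera
  translate(width / 2, height / 2, -1000);
  rotateX(radians(180));  // the Kinect's Y axis points up, Processing's points down
  rotateY(rotation);
  rotation += 0.01;       // slowly orbit around whatever the Kinect sees

  // one 3D point (in millimeters) for every depth pixel
  PVector[] depthPoints = kinect.depthMapRealWorld();
  stroke(255);
  for (int i = 0; i < depthPoints.length; i += 10) {  // skip points to keep the frame rate up
    PVector p = depthPoints[i];
    point(p.x, p.y, p.z);
  }
}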

Rotating Point Cloud Sketch Test Image (1)


Rotating Point Cloud Sketch Test Image (2)


Research and tutorials referenced from Making Things See by Greg Borenstein

Installation size research

Over how much distance can we effectively use our tech?
Finding a balance between comfort for a group of users experiencing our interactive floor together, a size we can realistically fabricate, and the number of participants is a tough task. We want to let as many people as possible experience and play together, but we have to be mindful of how much room people need to stand around and move their bodies. On the other hand, we also want people to feel like they are having fun together, and we don't want to leave too much cushion between bodies. Finally, we have to realize that what we are building has to conform to the size of our lab and our budget, so we cannot build a studio with a stage.
Imagination can do so much, let's really see it!
As researched in our previous blog post, Floor Size and Negative Space Research, we have a good point of reference thanks to Dance Deck of Signature Systems Group, LLC. They are the world's largest center for renting dance floors and recommend a minimum of 4.5 square feet per person to have enough room on a dance floor. Combining this per-person figure with the maximum effective distance of the Xbox Kinect we will be using in our project, we can make a good decision on the size of our floor plan.

These constraints have led us to choose a 100-square-foot area (10 ft x 10 ft) for the time being. Ideally this could hold 20 people. Unlike a traditional dance floor, our project will include display walls and a place for our performer (at this point, a DJ) to set up their materials. We have reserved 10 square feet for this, leaving 90 square feet for the participants, which at 4.5 square feet per person still equates to 20 people (90 ÷ 4.5 = 20). For testing purposes we are choosing not to work at the ideal scale as we tread new waters. We are setting our maximum at 10 users until further notice and aiming for a minimum of 5 users for ideal conditions.

We want to be able to let users really go wild in our environment. Furthermore, we understand that there are limitations to the technologies we will be using during this project due to our budget restraints. All hope isn't lost, friends, so don't fret. Hopefully, after user testing and encountering the many unforeseen variables of this project, we can involve more participants within this floor space, or expand our floor. Stay tuned fans! There are no instruction manuals here, just discoveries being made! #FBA!
Seeing is believing. It was crucial for us to see our floor plan in person before moving forward.

Monday, July 28, 2014

Thermistor and Installation Size



The first picture above is the thermistor hooked into the Arduino UNO.  A thermistor gives readings about temperature change because temperature affects the thermistor's resistance (Temperature Sensor + Resistor = Thermistor).
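For reference, and assuming the usual voltage-divider hookup (thermistor on the 5V side, a fixed resistor to ground, with the analog pin reading the midpoint; we would have to double-check exactly how this one is wired), the voltage the Arduino reads follows

Vout = Vin x Rfixed / (Rfixed + Rthermistor)

so as temperature changes the thermistor's resistance, the analogRead() value shifts with it and can be mapped back to a temperature.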

The second picture is Nate and Miguel's first attempt at the installation size.  Each box marks a corner of the installation area.  This is a very important aspect of our project, as it will affect the users' experience.

Thursday, July 24, 2014




My turntables and speaker box working in harmony :)  Nate has his two turntables and mixer hooked up and connected to the speaker inside of his music box.  The music box was his project at CSU Summer Arts Inventor's Workshop 2014.

What Are Virtual Environments?

The Use of Immersive Virtual Reality in the Learning Sciences: Digital Transformations of Teachers, Students, and Social Context


Jeremy N. Bailenson and Nick Yee
Department of Communication
Stanford University


Jim Blascovich and Andrew C. Beall
Department of Psychology
Stanford University


Nicole Lundblad
Department of Symbolic Systems
Stanford University


Michael Jin
Department of Computer Science
Stanford University


The article primarily shows how virtual environments can help teachers engage students more, by giving teachers visual cues about which students aren't getting enough 'eye gaze' and by placing students at the virtual center and front of the classroom via virtual headset. More importantly, this article provides us with definitions for the various types of environments we will be considering for our project. Below is an excerpt from the article; the portion I find most important appears later in the excerpt, but feel free to read the whole thing:
Virtual Environments (VEs) are distinct from other types of multimedia learning environments (e.g., Mayer, 2001). In this article, we define VEs as “synthetic sensory information that leads to perceptions of environments and their contents as if they were not synthetic” (Blascovich et al., 2002, p. 105). Typically, digital computers are used to generate these images and to enable real-time interaction between users and VEs. In principle, people can interact with a VE by using any perceptual channel, including visual (e.g., by wearing a head-mounted display [HMD] with digital displays that project VEs), auditory (e.g., by wearing earphones that help localize sound in VEs), haptic (e.g., by wearing gloves that use mechanical feedback or air blast systems that simulate contact with object VEs), or olfactory (e.g., by wearing a nose piece or collar that releases different smells when a person approaches different objects in VEs).


An immersive virtual environment (IVE) is one that perceptually surrounds the user, increasing his or her sense of presence or actually being within it. Consider a child’s video game; playing that game using a joystick and a television set is a VE. However, if the child were to have special equipment that allowed him or her to take on the actual point of view of the main character of the video game, that is, to control that character’s movements with his or her own movements such that the child were actually inside the video game, then the child would be in an IVE. In other words, in an IVE, the sensory information of the VE is more psychologically prominent and engaging than the sensory information of the outside physical world. For this to occur, IVEs typically include two characteristic systems. First, the users are unobtrusively tracked physically as they interact with the IVE. User actions such as head orientation and body position (e.g., the direction of the gaze) are automatically and continually recorded, and the IVE, in turn, is updated to reflect the changes resulting from these actions. In this way, as a person in the IVE moves, the tracking technology senses this movement and renders the virtual scene to match the user’s position and orientation. Second, sensory information from the physical world is kept to a minimum. For example, in an IVE that relies on visual images, the user wears an HMD or sits in a dedicated projection room. By doing so, the user cannot see objects from the physical world, and consequently it is easier for him or her to become enveloped by the synthetic information.


There are two important features of IVEs that will continually surface in later discussions. The first is that IVEs necessarily track a user’s movements, including body position, head direction, as well as facial expressions and gestures, thereby providing a wealth of information about where in the IVE the user is focusing his or her attention, what he or she observes from that specific vantage point, and what are his or her reactions to the environment. The second is that the designer of an IVE has tremendous control over the user’s experience and can alter the appearance and design of the virtual world to fit experimental goals, providing a wealth of real-time adjustments to specific user actions.


Collaborative virtual environments (CVEs) involve more than a single user. CVE users interact via avatars. For example, while in a CVE, as Person A communicates verbally and nonverbally in one location, the CVE technology can nearly instantaneously track his or her movements, gestures, expressions, and sounds. Person B, in another location, sees and hears Person A’s avatar exhibiting these behaviors in his or her own version of the CVE when it is networked to Person A’s CVE.


An affordance of virtual environments in learning is that they can be catered to the audience to encourage learning. One example featured a dollhouse used to encourage children to tell stories, where a same-age virtual avatar was the teaching agent rather than an authoritative teacher figure. This opens up the potential in our virtual environment to encourage users to participate with others via dancing, jumping, chanting, etc. Imagine a virtual item that encourages jumping: sensing the jumping moves a fireplace bellows visual up and down, pumping air into a balloon that explodes when full and creating particles that float to the beat of the music. This could encourage participation more than traditional 'dancing' and get people moving who wouldn't otherwise.


According to this article, co-learners can only enhance learning thanks to dialogue and shared experience.

VEs offer the ability to provide multiple perspectives of the same scene. The visualizations in VEs can also act as visual cues with the integration of other sensory technology. An advantage is that, with user testing, behavioral profiles can be made once the user type is analyzed to enhance usability.

Wednesday, July 23, 2014

Floor Size and Negative Space Research

Websites for estimating size of dance floor:

The first article is about a dance floor that is used to evaluate dancers, or could potentially be used for compositional purposes.  Earlier examples of dance floors with sensors used electrical contact or pressure sensors; this particular dance floor uses proximity sensors.  Although it is the most accurate to date, it seems as though we don't need something this fine-tuned for our project, especially since we will be using a matrix of sensors to get the data we need.  The second article is about a project that uses motion tracking and pressure-sensing floors to generate visuals.  This is similar to our project, but is used in a structured dance performance setting.  It is definitely a project whose shoulders ours stands on in terms of application.  The third article is definitely my favorite and has led me to an awesome resource for future research: EDMC.  It goes in depth on various aspects of EDM culture and talks about the balance between light and dark, individual and community, as well as the types of experiences people have and are looking for at these events.  One of the things that I think will make this project a success is the sense of collaboration between the DJ, dancers, and non-dancers.  The fourth article talks about how the way a space is set up affects the communication and interactions within it, which is something we should keep in mind for how we set up our space.  Traffic flow and setup will have a lot to do with the kind of experience our users are going to have.  Form and function are ultimately inseparable.

Tuesday, July 22, 2014

Virtual Reality, Avatars and the Chameleon Effect

Virtual Reality and Social Networks Will Be a Powerful Combination
Avatars will make social networks seductive.
By Jeremy N. Bailenson, Jim Blascovich


Half a billion people spend more than 20 hours a week ‘wearing’ avatars. A virtual representation of a politician could be more convincing than the person it represents, and your heart beats just as fast when your girlfriend winks at you from your computer screen as it does in person.
The Vice President of IBM predicts that by 2015 avatars will be used in meetings and eventually will play a larger role in communication.

The advantage of avatars is that technology can be used to detect only specific motions and translate them to the avatar, saving the bandwidth that HD video would otherwise use and avoiding lag.
These avatars can be filtered and set up to do things that no normal person can do, which research has shown can be very influential. The brain has trouble telling the difference between real and virtual experiences. A study at Stanford University by Clifford Nass and Byron Reeves in the 1990s showed that users who worked on a slow computer and were later asked to rate that machine's performance on the same machine rated it more positively than those who were asked to do the rating on a different station. Perhaps they didn't want to hurt the computer's feelings? So how will we treat virtual avatars of our friends, heroes, or business partners?

People want to be more likable, which is why this virtual world is often preferred: weight loss is instant and a bad hair day never happens. Tanya Chartrand, a social psychologist, did a study at Duke where interviewers' gestures were mimicked by the people they interviewed, without the interviewers knowing. The interviewers often preferred those who mimicked them, a phenomenon labeled "the chameleon effect." This was later tried with avatars that recited policies at a university while mimicking the head movements of the listener with a four-second delay. The avatars that did this mimicry were rated higher by participants than those that did not.

In a different study, participants were notably more open to liking avatars that were generated from the user's own looks, sharing about 20% of their facial characteristics. The changes were subtle, but enough to show how this could be used in the future to persuade users in various situations.

These examples suggest a synergy between the mind and immersive virtual technology. On the one hand, the brain treats real and virtual experiences as the same. On the other hand, in virtual reality, the rules of grounded reality are suspended.

In virtual reality, avatars can age, grow, or become supermodels at the touch of a button. They can use conversational superpowers, including large-scale mimicry and gaze, or wear other people's faces and bodies. Rationally, people bring specific expectations, perhaps guarded ones, when interacting with virtual others. However, the brain cannot consistently keep this guard up for long stretches of social interaction, and these super avatars tend to elicit very human responses.

http://spectrum.ieee.org/telecom/internet/virtual-reality-and-social-networks-will-be-a-powerful-combination/0

Kinect Research and Applications

My research into the Microsoft Kinect has shown me a lot about how the Kinect can be used, the potential issues, and how many existing projects and pieces of research we can draw on for our project.  Most of the articles talk about how important it is to calibrate the Kinect, and many give pointers on how to leverage both the infrared and RGB cameras for more accurate results.  A number of languages can be used with the Kinect, although we have already decided on Processing.  There is also a lot of work on how to get optimal readings in an indoor setting, which I am sure we can use for our project.  Something else that came up in my research is the potential of using multiple Kinects to do 3D scanning of people and environments.  This could also be used to evaluate someone's performance if we had a pre-determined routine for them to dance to.  Ultimately, I feel as though the main things we need to worry about are calibration and how we plan to use the data we collect.
The XBOX 360 Kinect

Nonverbal Behavior and Collaboration Detection

Automatically Detected Nonverbal Behavior Predicts
Creativity in Collaborating Dyads


Andrea Stevenson Won • Jeremy N. Bailenson •
Suzanne C. Stathatos • Wenqing Dai


Key Word: Rapport occurs when two or more people feel that they are in sync or on the same wavelength because they feel similar or relate well to each other. Rapport is theorized to include three behavioral components: mutual attention, mutual positivity, and coordination.


Some research indicates that pairs can be more creative than individuals working alone, but what about nonverbal behavior and creativity? Rapport, or "a state of mutual positivity and interest that arises through the convergence of nonverbal expressive behavior in an interaction" (Drolet and Morris 2000, p. 27), has been linked to success in a number of interpersonal interactions. Rapport is also important in judging the success of a virtual agent.


The concept of synchronous nonverbal behavior was first introduced by Condon and Ogston (1966). Synchrony, however, is very difficult and time-consuming to rate. Many people who were asked to determine the level of synchronization in videos would revert to rating it based on similarities between the people (skin color, wardrobe, etc.) rather than their actual actions; eventually the researchers had to remove the audio from the videos being rated and blur the faces. Predicting movements can also increase the amount of data that has to be processed and interpreted, so researchers have since turned to more generic ways to predict and interpret body movement.


Methods include placing sensors on participants' joints and summing pixel changes in video (Schmidt et al. 2012; Ramseyer and Tschacher 2011). Joint markers are accurate but can be expensive and cumbersome. Video-based techniques are inexpensive, but bad lighting and bad camera angles can lessen their value. The Microsoft Kinect, however, with its infrared emitter and sensor, is an inexpensive method that does not require joint markers.


The Kinect was used to analyze the interaction between a teacher and a learner. Two Kinects were used (one to record person A, one for person B), and the pairs were told to come up with as many water conservation ideas as possible. Good ideas, those showing "appropriate novelty" as determined by Oppezzo and Schwartz (2014), were marked with a 1 and bad ideas with a 0 toward the final score. Overall the study predicted collaborative behavior at a rate of 87%.

Although this study isn't completely related to our project, it does mention the troubles of using a Kinect versus other methods of motion detection and their solutions, as well as providing insight into collaboration where communication is limited, whether in a silent environment or, in our case, a noisy one.

http://vhil.stanford.edu/pubs/2014/won-jnb-nonverbal-predicts-creativity.pdf

Thursday, July 17, 2014

Enter The Fuzz - Move In Day

As the summer hits its midpoint, we have finally been granted access to move into our graduate lab. One of the many privileges we are lucky to have here at California State University, East Bay is that the multimedia graduate program provides its second-year students with a large space to make their projects a reality. At first we entered what resembled a storage unit, but after moving some furniture around and doing a bit of organizing it became a real work space.
The lab in its messy glory.


Nate and Miguel clearing some space

Hugo's desk for the rest of the year. Will it stay clean?

Some decor, represent!