Infrastructure & Optimization for Data Science: Presentations

October 14, 10:30 AM - 11:30 AM
362 A Level Three GRBCC
TRACK: Data Science
Presentation / Lightning Talk
Optimization and Big Data: An Application from Car Rental Pricing
10:30 AM - 10:50 AM
LEVEL: Advanced

This work addresses the computational challenges that large-scale, multi-dimensional data introduces into pricing optimization. We propose a methodology that decomposes a large-scale network optimization into a multi-step sequential optimization by leveraging the relations between the pricing dimensions considered in the problem. Implemented on a large, US-wide car rental network, the three-step optimization runs in under an hour, which makes daily pricing updates feasible.

High Availability and High Frequency Big Data Analytics
10:50 AM - 11:10 AM
LEVEL: Intermediate

Big Data analytics is becoming the industry standard, but it is not yet ready for finance. Financial applications use “medium data” in the terabyte range, which is smaller than the typical Big Data use case but larger than traditional databases can handle, and they have much tighter reliability and throughput requirements. This talk discusses techniques for tailoring open-source Big Data solutions, specifically HBase, to these requirements.

SEAD: Infrastructure for Managing Research Data in the Long Tail
11:10 AM - 11:30 AM
LEVEL: Intermediate

SEAD serves scientists by providing data infrastructure for the “long tail,” where data is highly heterogeneous and dispersed. SEAD embraces a full-lifecycle perspective that begins with data collection and continues through publishing and depositing data into repositories that provide long-term access and preservation. Our technical focus is on “SEAD Project Spaces,” where scientists upload and organize their research data, and on the rule-based mediation of the “SEAD Matchmaker,” which deposits data into appropriate repositories.