Infrastructure & Optimization for Data Science: Presentations
This work addresses the computational challenges that arise in pricing optimization when large-scale, multi-dimensional data must be consumed. We propose a methodology that decomposes a large-scale network optimization into a multi-step sequential optimization by leveraging the relations between the pricing dimensions considered in the problem. An implementation on a large, US-wide car rental network runs the three-step optimization in under an hour, which makes daily pricing updates feasible.
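The abstract does not spell out how the decomposition works. The code below is a minimal sketch of the general idea only. It assumes a hypothetical toy revenue model over three pricing dimensions, and it compares an exhaustive joint grid search with a single sequential pass that optimizes one dimension at a time while holding the others fixed. The names `GRID`, `BASE_PRICES`, `toy_revenue`, `joint_search` and `sequential_search` are illustrative and do not come from the talk.

```python
import itertools
from typing import Callable, Sequence, Tuple

# Hypothetical candidate price multipliers, shared by every pricing dimension.
GRID = (0.8, 0.9, 1.0, 1.1, 1.2)
# Hypothetical base price and linear-demand parameters for each dimension.
BASE_PRICES = (40.0, 60.0, 80.0)
INTERCEPTS = (150.0, 130.0, 170.0)
SLOPES = (1.5, 1.0, 1.0)

Objective = Callable[[Sequence[float]], float]


def toy_revenue(multipliers: Sequence[float]) -> float:
    """Assumed separable revenue: sum over dimensions of price * linear demand."""
    total = 0.0
    for m, base, a, b in zip(multipliers, BASE_PRICES, INTERCEPTS, SLOPES):
        price = base * m
        total += price * max(0.0, a - b * price)
    return total


def joint_search(objective: Objective, grid, n_dims: int) -> Tuple[tuple, float, int]:
    """Exhaustive search over every combination: len(grid) ** n_dims evaluations."""
    best_x, best_val, evaluations = None, float("-inf"), 0
    for x in itertools.product(grid, repeat=n_dims):
        evaluations += 1
        val = objective(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val, evaluations


def sequential_search(objective: Objective, grid, start) -> Tuple[tuple, float, int]:
    """One pass: optimize each dimension in turn with the others held fixed."""
    x = list(start)
    value, evaluations = float("-inf"), 0
    for i in range(len(x)):
        best_m, best_val = x[i], float("-inf")
        for m in grid:
            evaluations += 1
            val = objective(x[:i] + [m] + x[i + 1:])
            if val > best_val:
                best_m, best_val = m, val
        x[i] = best_m  # freeze this dimension before moving to the next one
        value = best_val
    return tuple(x), value, evaluations


if __name__ == "__main__":
    print(joint_search(toy_revenue, GRID, 3))
    print(sequential_search(toy_revenue, GRID, start=(1.0, 1.0, 1.0)))
```

Because this toy objective is separable, one sequential pass recovers the joint optimum with 15 objective evaluations instead of 125 (three dimensions with five candidates each). When dimensions interact, a single pass is generally only an approximation, which is presumably why the choice and order of steps, guided by the relations between dimensions, matter in the real problem.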
Big Data analytics are becoming the industry standard but are not yet ready for finance. Financial applications use “medium data” in the terabyte range: smaller than the typical Big Data use case, but larger than traditional databases can handle. They also have much tighter reliability and throughput requirements. This talk discusses techniques for tailoring open-source Big Data solutions, specifically HBase, to these needs.
SEAD serves scientists by providing data infrastructure for the “long tail”, where data is highly heterogeneous and dispersed. SEAD embraces a full-lifecycle perspective that begins with data collection and continues through publishing and depositing data into repositories that provide long-term access and preservation. Our technical focus is on “SEAD Project Spaces”, where scientists upload and organize their research data, and on the rule-based mediation of the “SEAD Matchmaker”, which helps deposit data into appropriate repositories.
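The abstract does not describe how the Matchmaker's rules are expressed. The following is a minimal sketch of rule-based matching. It assumes each repository publishes its acceptance policy as named predicates over a dataset's metadata. `Dataset`, `Repository`, `match`, `recommend` and the example repositories are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical metadata for a dataset organized in a project space."""
    discipline: str
    total_size_gb: float
    file_formats: FrozenSet[str]
    has_license: bool


# A rule is a named predicate over dataset metadata (an assumption of this sketch).
Rule = Callable[[Dataset], bool]


@dataclass
class Repository:
    """Hypothetical repository whose acceptance policy is a set of named rules."""
    name: str
    rules: Dict[str, Rule] = field(default_factory=dict)


def match(dataset: Dataset, repositories: List[Repository]) -> List[Tuple[str, List[str]]]:
    """For each repository, list the names of the rules the dataset fails."""
    return [
        (repo.name, [name for name, rule in repo.rules.items() if not rule(dataset)])
        for repo in repositories
    ]


def recommend(dataset: Dataset, repositories: List[Repository]) -> List[str]:
    """Names of repositories whose rules the dataset satisfies in full."""
    return [name for name, failed in match(dataset, repositories) if not failed]


# Illustrative repositories; names and policies are invented for this sketch.
REPOSITORIES = [
    Repository("GeneralRepo", {
        "under_50_gb": lambda d: d.total_size_gb <= 50,
        "licensed": lambda d: d.has_license,
    }),
    Repository("HydroArchive", {
        "hydrology_only": lambda d: d.discipline == "hydrology",
        "open_formats": lambda d: d.file_formats <= {"csv", "netcdf"},
    }),
    Repository("ImageBank", {
        "accepts_tiff": lambda d: "tiff" in d.file_formats,
    }),
]

if __name__ == "__main__":
    hydro = Dataset("hydrology", 12.0, frozenset({"csv", "netcdf"}), True)
    print(recommend(hydro, REPOSITORIES))
    print(match(hydro, REPOSITORIES))
```

Reporting the failed rules, rather than only a yes/no answer, is one plausible design choice. It tells a scientist what would need to change (for example, a file format) before a given repository would accept the data.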