Data
We have a pedantic, obsessive focus on data because we believe complexity is best addressed at the data layer rather than accumulated as technical debt to be paid at the software and process layers.
We also believe that the non-linear methods of machine and deep learning have increased the need for highly normalized, verified, and metadata-augmented data, simply because bad data has far worse consequences when fed to these non-linear methods than it did in the era of classical linear statistics. We agree that highly normalized data models are often inflexible and harder to implement than document or columnar data models. However, the cost of wrong inferences drawn from invalid data is so high that paying the price of rigorous data modeling upfront is often worth it.
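The upfront rigor we advocate can be sketched in miniature: validate and reject bad records at the data layer, before they ever reach a model. The `Measurement` schema and its plausibility thresholds below are hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    # Hypothetical strictly validated record type; invalid rows never
    # survive construction, so downstream code only sees clean data.
    sensor_id: str
    value_celsius: float

    def __post_init__(self):
        if not self.sensor_id:
            raise ValueError("sensor_id must be non-empty")
        if not (-273.15 <= self.value_celsius <= 1000.0):
            raise ValueError(f"implausible temperature: {self.value_celsius}")

def validate_rows(rows):
    """Partition raw rows into validated records and rejects with reasons."""
    accepted, rejected = [], []
    for row in rows:
        try:
            accepted.append(Measurement(**row))
        except (TypeError, ValueError) as exc:
            rejected.append((row, str(exc)))
    return accepted, rejected

raw = [
    {"sensor_id": "s1", "value_celsius": 21.5},
    {"sensor_id": "", "value_celsius": 19.0},      # rejected: empty id
    {"sensor_id": "s2", "value_celsius": -400.0},  # rejected: below absolute zero
]
good, bad = validate_rows(raw)
```

The point is not this particular schema but the placement of the check: rejection happens once, at ingestion, instead of being re-litigated in every consumer of the data.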
We are currently researching how deep learning techniques can be used to construct meta-metadata, or ontological hierarchies, for data domains. We see these techniques as prerequisites for practical approaches to large-scale data fusion applications.