A personal toolkit for financial, economic, and urban research. Two modules: time series econometrics on public Brazilian data sources, and a geospatial pipeline covering 14 dimensions of urban form. Built to question market premises, not to confirm them.
Infrastructure, not product. bquant ingests raw data from BCB, IBGE, FGV, FipeZAP, RAIS, GeoSampa, and IPEA, then cleans, normalizes, and exports parquet files consumed downstream by urban-space (real estate pricing) and solo-inteligente (RAG pipeline for land prospecting). The research posture is heterodox: treat consensus as a prior to stress-test, not a conclusion to accept.
Scrapes and normalizes seven Brazilian macro series from BCB, FGV, IBGE, and B3. Then runs a full battery of time series diagnostics (stationarity, autocorrelation structure, volatility clustering, structural breaks) before any model sees the data.
Orchestrates ingestion, normalization, and parquet export for all economic series. FipeZAP fallback chain resolves cidade/tipo/quartos -> regional -> national average across 300+ cities and 50+ Excel sheets.
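The fallback chain can be pictured as a lookup that tries the most specific level first and widens only when data is missing. The following is a minimal sketch, not bquant's actual code: the function name `resolve_price`, the lookup tables, the city-to-region mapping, and all numeric values are hypothetical placeholders, and the choice of keys at the regional and national levels is an assumption.

```python
from typing import Optional, Tuple

# Hypothetical lookup tables with made-up values (illustration only).
CITY_PRICES = {("sao paulo", "apartamento", 2): 100.0}
REGIONAL_PRICES = {("sudeste", "apartamento"): 80.0}
NATIONAL_PRICES = {"apartamento": 70.0}
CITY_TO_REGION = {"sao paulo": "sudeste", "campinas": "sudeste"}


def resolve_price(cidade: str, tipo: str, quartos: int) -> Tuple[Optional[float], str]:
    """Return (price, level) from the most specific level that has data."""
    # Level 1: exact cidade/tipo/quartos match.
    key = (cidade, tipo, quartos)
    if key in CITY_PRICES:
        return CITY_PRICES[key], "cidade"

    # Level 2: regional value for the same property type (assumed key shape).
    region = CITY_TO_REGION.get(cidade)
    if region is not None and (region, tipo) in REGIONAL_PRICES:
        return REGIONAL_PRICES[(region, tipo)], "regional"

    # Level 3: national average for the property type.
    if tipo in NATIONAL_PRICES:
        return NATIONAL_PRICES[tipo], "national"

    # Nothing available at any level.
    return None, "missing"
```

Returning the level alongside the value keeps the provenance visible downstream, so a model can tell an observed city price from an imputed national average.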
Selic (base rate), IGMI-R (real estate returns by state), IVG-R (collateral value index), IBC-BR (GDP proxy).
IGP-M (rent inflation), INCC (construction costs), IIE-BR (economic uncertainty index). IBGE IPCA as general CPI baseline.
Price per m2 by city, type, and bedroom count. 49 normalized columns: sale/rent × residential/commercial × monthly and 12m variation.
ADF + KPSS stationarity. ACF/PACF for ARIMA structure. ARCH-LM for volatility clustering. Rolling 6m mean/std for structural break detection. Transformations: log1p, first difference.
Eight processors running in sequence via pipeline_geo.py. Spatial joins via cKDTree (O(log N) per query) link CEP coordinates to the nearest UDH centroid. UDHs are sub-municipal human development units from ONU/IPEA. Output: 14 parquet tables with São Paulo and national coverage.
GeoSampa IPTU: 10-year time series. Glob of IPTU_*.csv files. Property-level fiscal value history across the entire city.
2005-2021. Three series: general, industry, services. Score_Edu weighted from ANALF=0 to ESCOMP=8. Age median via age-bracket interpolation.
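Age-bracket interpolation is usually done with the standard grouped-data median formula: find the bracket that contains the middle observation, then interpolate linearly inside it. The sketch below assumes that formula; the function name `grouped_median` and the bracket boundaries and counts in the usage example are hypothetical, not taken from the source data.

```python
from typing import List, Tuple


def grouped_median(brackets: List[Tuple[float, float, int]]) -> float:
    """Median of grouped data.

    brackets: (lower_bound, upper_bound, count) tuples sorted by lower bound.
    Assumes observations are spread uniformly inside each bracket.
    """
    total = sum(count for _, _, count in brackets)
    if total == 0:
        raise ValueError("no observations")

    half = total / 2
    cumulative = 0
    for lower, upper, count in brackets:
        # The median bracket is the first one whose cumulative count reaches N/2.
        if count > 0 and cumulative + count >= half:
            return lower + (half - cumulative) / count * (upper - lower)
        cumulative += count

    raise ValueError("median bracket not found")  # not reachable when total > 0


# Hypothetical brackets: 10 workers aged 18-25, 20 aged 25-30, 10 aged 30-40.
example = [(18, 25, 10), (25, 30, 20), (30, 40, 10)]
median_age = grouped_median(example)  # 25 + (20 - 10) / 20 * 5 = 27.5
```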
Municipal HDI + sub-municipal UDH (ONU/IPEA). Population by district. The spatial anchor for all neighborhood-level joins.
Density and sectoral composition of formal businesses by municipality. Measures economic diversification at neighborhood level.
Geocoding of postal codes. KDTree proximity match to UDH centroids: the join key that connects property data to all urban indicators.
Police district perimeters, crime occurrence rates. Spatial feature for safety-adjusted pricing models in urban-space.
bquant is infrastructure. It does not run models or generate predictions. It produces clean, versioned parquet files that downstream pipelines consume directly: no shared database, no API, just files on disk passed between research projects.
Consumes fipezap + all 14 geo tables. Builds hedonic pricing model for São Paulo residential market. Tests whether location, labor market quality, and IDH predict price per m2 better than proximity to metro alone.
Ingests parquets into a RAG pipeline. Identifies undervalued urban land parcels by cross-referencing fiscal IPTU values, zoning potential, and neighborhood quality scores from geo/ outputs.
bquant implements diagnostics, not black boxes. Every algorithm has a transparent null hypothesis and a clear rejection criterion.
Stationarity tests run in tandem. ADF tests the null of a unit root; KPSS tests the null of stationarity. Disagreement between the two flags structural break candidates.
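Because the two tests have opposite null hypotheses, their p-values combine into four possible verdicts. The sketch below shows one way to encode that decision table. The function name `classify_stationarity`, the labels, and the 5% threshold are assumptions for illustration; in statsmodels the p-values would typically come from `adfuller(series)[1]` and `kpss(series)[1]` in `statsmodels.tsa.stattools`.

```python
def classify_stationarity(adf_pvalue: float, kpss_pvalue: float, alpha: float = 0.05) -> str:
    """Combine ADF and KPSS p-values into a single verdict.

    ADF  H0: the series has a unit root.
    KPSS H0: the series is stationary.
    """
    adf_rejects_unit_root = adf_pvalue < alpha
    kpss_rejects_stationarity = kpss_pvalue < alpha

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"      # both tests point to stationarity
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "unit_root"       # both tests point to a unit root
    if adf_rejects_unit_root and kpss_rejects_stationarity:
        return "conflict"        # both nulls rejected: the tests contradict each other
    return "inconclusive"        # neither null rejected: not enough evidence


def needs_review(verdict: str) -> bool:
    """Flag verdicts where the tests do not agree (assumed review rule)."""
    return verdict in {"conflict", "inconclusive"}
```

Keeping the verdict as a label rather than a boolean makes it easy to log which series were sent for manual inspection and why.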
Autocorrelation structure to identify ARIMA (p,d,q) orders before fitting. Prevents overfitting from model search.
Lagrange Multiplier test for ARCH effects. Identifies series with volatility clustering, which signals the need for GARCH modeling in asset return analysis.
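For reference, the textbook form of the ARCH-LM test regresses squared residuals on their own lags and scales the resulting R-squared. In the notation below, q is the lag order (the source does not state which value bquant uses), T' is the number of observations in the auxiliary regression, and R^2 is that regression's coefficient of determination.

```latex
\hat{\varepsilon}_t^{2} = \alpha_0 + \sum_{i=1}^{q} \alpha_i \, \hat{\varepsilon}_{t-i}^{2} + u_t,
\qquad
H_0:\ \alpha_1 = \dots = \alpha_q = 0,
\qquad
LM = T' R^{2} \ \xrightarrow{d}\ \chi^{2}_{q}
```

Rejecting the null means past squared shocks help predict current squared shocks, which is the statistical signature of volatility clustering.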
scipy.spatial: nearest-neighbor spatial join at O(log N) per query. Replaces brute-force polygon intersection for 300k+ CEP records against UDH centroids.
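A nearest-centroid join with `scipy.spatial.cKDTree` might look like the sketch below. The coordinates and UDH identifiers are toy values, and the example assumes both point sets are already in a projected coordinate system measured in meters, since Euclidean distance on raw longitude/latitude degrees is only an approximation.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical UDH centroids in projected coordinates (meters).
udh_xy = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 1000.0]])
udh_ids = np.array(["UDH_A", "UDH_B", "UDH_C"])

# Hypothetical geocoded CEP points in the same coordinate system.
cep_xy = np.array([[100.0, 50.0], [900.0, 100.0], [200.0, 800.0]])

# Build the tree once over the centroids, then query all CEPs in one call.
tree = cKDTree(udh_xy)
distances, nearest_idx = tree.query(cep_xy, k=1)

# Map each CEP to the identifier of its nearest centroid.
nearest_udh = udh_ids[nearest_idx]
```

The returned `distances` array is worth keeping: a maximum-distance cutoff can filter out CEPs that fall far from any centroid instead of forcing a match.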
6-month rolling mean and standard deviation. Detects regime changes in macro series, such as IGP-M / Selic divergence patterns before and after rate cycles.
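One simple way to turn rolling statistics into a break signal is to compare each new observation with the mean and standard deviation of the window that precedes it. The sketch below uses only the standard library. The function names, the threshold `k=3.0`, and the flagging rule itself are assumptions for illustration, not the rule bquant applies.

```python
from statistics import mean, stdev
from typing import List, Optional, Tuple


def rolling_stats(values: List[float], window: int = 6) -> List[Optional[Tuple[float, float]]]:
    """Trailing (mean, std) at each position; None until the window is full."""
    out: List[Optional[Tuple[float, float]]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append((mean(chunk), stdev(chunk)))
    return out


def flag_shifts(values: List[float], window: int = 6, k: float = 3.0) -> List[bool]:
    """Flag position i when values[i] is more than k stds away from the
    mean of the window ending at i - 1 (assumed rule)."""
    stats = rolling_stats(values, window)
    flags = []
    for i in range(len(values)):
        prev = stats[i - 1] if i >= 1 else None
        if prev is None:
            flags.append(False)
            continue
        window_mean, window_std = prev
        flags.append(window_std > 0 and abs(values[i] - window_mean) > k * window_std)
    return flags


# Toy monthly series: stable around 1.0, then a jump in the last month.
series = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 5.0]
flags = flag_shifts(series)
```

Comparing against the previous window rather than the current one keeps the new observation from inflating the standard deviation it is tested against.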
Feature normalization and dimensionality reduction for geo composite scores. Consumed by downstream models in urban-space and solo-inteligente.
Personal research project.