bquant - quantitative research on Brazilian data.

A personal toolkit for financial, economic, and urban research. Two modules: time series econometrics on public Brazilian data sources, and a geospatial pipeline covering 14 dimensions of urban form. Built to question market premises, not to confirm them.

Infrastructure, not product. bquant ingests raw data from BCB, IBGE, FGV, FipeZAP, RAIS, GeoSampa, and IPEA, then cleans and normalizes it and exports parquet files consumed downstream by urban-space (real estate pricing) and solo-inteligente (RAG pipeline for land prospecting). The research posture is heterodox: treat consensus as a prior to stress-test, not a conclusion to accept.

Python · pandas · numpy · statsmodels · scipy (cKDTree) · scikit-learn · geopandas · BigQuery · openpyxl · requests

econ/ - Time Series & Econometrics

Scrapes and normalizes seven Brazilian macro series from BCB, FGV, IBGE, and B3. Then runs a full battery of time series diagnostics (stationarity, autocorrelation structure, volatility clustering, structural breaks) before any model sees the data.

economia_pipeline.py DadosEconomicos + DadosFipezap

Orchestrates ingestion, normalization, and parquet export for all economic series. The FipeZAP fallback chain resolves city/type/bedrooms (cidade/tipo/quartos) -> regional -> national average across 300+ cities and 50+ Excel sheets.

BCB

Banco Central

Selic (base rate), IGMI-R (real estate returns by state), IVG-R (collateral value index), IBC-BR (GDP proxy).

FGV + IBGE

Inflation series

IGP-M (rent inflation), INCC (construction costs), IIE-BR (economic uncertainty index). IBGE IPCA as general CPI baseline.

FipeZAP

Real estate prices

Price per m2 by city, type, and bedroom count. 49 normalized columns: sale/rent × residential/commercial × monthly and 12m variation.

timeseries_eda.py

TimeSeriesEDA

ADF + KPSS stationarity. ACF/PACF for ARIMA structure. ARCH-LM for volatility clustering. Rolling 6m mean/std for structural break detection. Transformations: log1p, first difference.
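
The two transformations listed above can be sketched as follows. This is a minimal illustration, assuming a monthly pandas Series; the helper name `log1p_diff` and the sample values are hypothetical and not taken from timeseries_eda.py.

```python
import numpy as np
import pandas as pd


def log1p_diff(series: pd.Series) -> pd.Series:
    """Apply log1p, then a first difference (hypothetical helper)."""
    logged = np.log1p(series)        # log(1 + x), defined at x = 0
    return logged.diff().dropna()    # differencing leaves the first row empty


# Illustrative values only, not real index data.
idx = pd.date_range("2020-01-01", periods=5, freq="MS")
level = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0], index=idx)
transformed = log1p_diff(level)
```

The differenced log series approximates period-over-period growth rates, which is why it is a common first candidate when a level series fails a stationarity test.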

Output
economia.parquet · fipezap_latest.parquet · fipezap_sp_ts.parquet
Stationarity tests
ADF (autolag='AIC') and KPSS on all series before modeling, so non-stationary inputs never reach a model untransformed.
Volatility
ARCH-LM (statsmodels.stats.diagnostic.het_arch): flags heteroskedastic series for separate variance modeling.
FipeZAP fallback
Exact city/type/bedrooms -> same city all bedrooms -> residential type -> state capital -> national average. Prevents silent NaNs from propagating downstream.
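
A fallback chain of this kind can be sketched as an ordered list of progressively looser filters. The column names (`city`, `kind`, `bedrooms`, `price_m2`), the function `resolve_price`, and the reading of each level are assumptions for illustration; the actual logic in economia_pipeline.py may differ.

```python
import pandas as pd


def resolve_price(df, city, kind, bedrooms, capital):
    """Return (price, level) from the first fallback level that has data."""
    same_city = df["city"] == city
    levels = [
        ("exact", same_city & (df["kind"] == kind) & (df["bedrooms"] == bedrooms)),
        ("city_all_bedrooms", same_city & (df["kind"] == kind)),
        ("city_residential", same_city & (df["kind"] == "residential")),
        ("state_capital", df["city"] == capital),
        ("national", pd.Series(True, index=df.index)),
    ]
    for name, mask in levels:
        prices = df.loc[mask, "price_m2"].dropna()
        if not prices.empty:
            return float(prices.mean()), name
    return float("nan"), "missing"  # only reached when the table has no prices


# Illustrative rows only, not real FipeZAP values.
prices = pd.DataFrame([
    {"city": "São Paulo", "kind": "residential", "bedrooms": 2, "price_m2": 10000.0},
    {"city": "São Paulo", "kind": "residential", "bedrooms": 3, "price_m2": 12000.0},
    {"city": "Campinas", "kind": "residential", "bedrooms": 2, "price_m2": float("nan")},
])
exact = resolve_price(prices, "São Paulo", "residential", 2, "São Paulo")
via_capital = resolve_price(prices, "Campinas", "residential", 2, "São Paulo")
```

Returning the level name alongside the price keeps every fallback visible downstream, so a substituted value can be told apart from an exact match.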

geo/ - Geospatial Urban Pipeline

Eight processors running in sequence via pipeline_geo.py. Spatial joins via cKDTree (O(log N) per query on average) link CEP coordinates to the nearest UDH centroid; UDHs are sub-municipal human development units from ONU/IPEA. Output: 14 parquet tables with São Paulo and national coverage.

IPTU São Paulo

GeoSampa IPTU: 10-year time series, loaded from a glob of IPTU_*.csv files. Property-level fiscal value history across the entire city.
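
A glob-based loader can be sketched as below. It assumes the year is embedded in each filename as `IPTU_<year>.csv`; that naming, the column names, and the helper `load_iptu` are assumptions for illustration, and the real GeoSampa files may need extra separator and encoding options.

```python
import glob
import os
import re
import tempfile

import pandas as pd


def load_iptu(folder):
    """Stack every IPTU_*.csv in `folder`, tagging rows with the file's year."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "IPTU_*.csv"))):
        match = re.search(r"IPTU_(\d{4})", os.path.basename(path))
        if match is None:
            continue  # skip files without a four-digit year in the name
        frames.append(pd.read_csv(path).assign(year=int(match.group(1))))
    return pd.concat(frames, ignore_index=True)


# Demo with two tiny synthetic files (hypothetical columns and values).
with tempfile.TemporaryDirectory() as tmp:
    for year, value in [(2020, 100.0), (2021, 110.0)]:
        pd.DataFrame({"lot_id": ["A"], "fiscal_value": [value]}).to_csv(
            os.path.join(tmp, f"IPTU_{year}.csv"), index=False
        )
    history = load_iptu(tmp)
```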

RAIS Labor Market

2005-2021. Three series: general, industry, services. Score_Edu weights education levels from ANALF=0 to ESCOMP=8. Median age is estimated by interpolating within age brackets.
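
Both calculations can be sketched with plain numpy. The sketch assumes Score_Edu is a count-weighted mean of the education codes and that the median uses the standard grouped-data interpolation; the function names, bracket edges, and counts are hypothetical.

```python
import numpy as np


def score_edu(counts_by_level):
    """Weighted mean of education codes, using worker counts as weights."""
    counts = np.asarray(counts_by_level, dtype=float)
    codes = np.arange(len(counts))  # 0 .. 8 when nine levels are given
    return float((codes * counts).sum() / counts.sum())


def grouped_median(edges, counts):
    """Median from bracket counts, interpolating linearly inside the median bracket."""
    counts = np.asarray(counts, dtype=float)
    cum = np.cumsum(counts)
    half = cum[-1] / 2.0
    i = int(np.searchsorted(cum, half))  # first bracket whose cumulative count reaches half
    below = cum[i - 1] if i > 0 else 0.0
    lower, upper = edges[i], edges[i + 1]
    return float(lower + (half - below) / counts[i] * (upper - lower))


# Illustrative inputs only.
edu = score_edu([0, 0, 0, 0, 0, 0, 0, 5, 5])
median_age = grouped_median([18, 25, 30, 40, 50, 65], [10, 20, 40, 20, 10])
```

The interpolation assumes ages are spread uniformly inside the median bracket, which is the usual approximation when only bracket counts are published.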

IDH / UDH

Municipal HDI + sub-municipal UDH (ONU/IPEA). Population by district. The spatial anchor for all neighborhood-level joins.

Establishments

Density and sectoral composition of formal businesses by municipality. Measures economic diversification at neighborhood level.

CEPs

Geocoding of postal codes. KDTree proximity match to UDH centroids: the join key that connects property data to all urban indicators.

Delegacias

Police district perimeters, crime occurrence rates. Spatial feature for safety-adjusted pricing models in urban-space.

Spatial join
scipy.spatial.cKDTree: O(log N) average-time nearest-centroid lookup from CEP coordinates to UDH centroids. No brute-force iteration.
14 output tables
estab_mun · pop_brasil · ipea · onu · iptu_sp (10y) · idh_udh · pop_distrito_sp · rais_geral · rais_ind · rais_serv · delegacias · ceps
Education scoring
RAIS education field encoded from 0 (illiterate) to 8 (complete higher education). Score_Edu is a weighted average per UDH " used as a socioeconomic feature in downstream models.
Coverage
São Paulo (IPTU, CEPs, delegacias) + national (RAIS 2005-2021, IDH municipal, population, establishments).

Data flow - how bquant feeds downstream projects

bquant is infrastructure. It does not run models or generate predictions. It produces clean, versioned parquet files that downstream pipelines consume directly: no shared database, no API, just files on disk passed between research projects.

BCB · FGV · IBGE · FipeZAP · B3 -> economia_pipeline.py -> economia.parquet, fipezap_latest.parquet
GeoSampa · RAIS · IBGE · IPEA · ONU -> pipeline_geo.py -> 14 parquet tables

urban-space

Real estate pricing model

Consumes fipezap + all 14 geo tables. Builds a hedonic pricing model for the São Paulo residential market. Tests whether location, labor market quality, and HDI (IDH) predict price per m2 better than proximity to metro alone.

solo-inteligente

Land prospecting pipeline

Ingests parquets into a RAG pipeline. Identifies undervalued urban land parcels by cross-referencing fiscal IPTU values, zoning potential, and neighborhood quality scores from geo/ outputs.

Algorithms & models

bquant implements diagnostics, not black boxes. Every algorithm has a transparent null hypothesis and a clear rejection criterion.

ADF + KPSS

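Stationarity tests run in tandem with opposite null hypotheses. ADF's null is a unit root, so rejection points to stationarity; KPSS's null is stationarity, so rejection points to a unit root. Series where the two disagree are flagged as structural break candidates.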

ACF / PACF

Autocorrelation structure used to choose ARIMA (p, d, q) orders before fitting, which limits the overfitting risk of a blind order search.

ARCH-LM

Lagrange Multiplier test for ARCH effects. Identifies series with volatility clustering, which signals the need for GARCH modeling in asset return analysis.
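
A minimal sketch of the test on a simulated ARCH(1) series, using `het_arch` from statsmodels. The simulation parameters, the five-lag choice, and the 5% threshold are assumptions for illustration.

```python
import numpy as np
from statsmodels.stats.diagnostic import het_arch

# Simulate returns whose variance depends on the previous squared return.
rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
returns = np.zeros(n)
for t in range(1, n):
    sigma2 = 0.2 + 0.5 * returns[t - 1] ** 2   # ARCH(1) conditional variance
    returns[t] = np.sqrt(sigma2) * z[t]

# Second positional argument is the number of lags in the auxiliary regression.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(returns - returns.mean(), 5)
needs_variance_model = lm_pvalue < 0.05        # reject "no ARCH effects"
```

The null hypothesis is that squared residuals are not autocorrelated; a small p-value is the flag that the series should get its own variance model.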

cKDTree

scipy.spatial: O(log N) average-time nearest-neighbor spatial join. Replaces brute-force polygon intersection for 300k+ CEP records against UDH centroids.
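
The nearest-centroid join can be sketched as below. The coordinates are made up for illustration, and the sketch measures Euclidean distance directly on latitude/longitude degrees, which is an approximation; projecting to a metric CRS first would give distances in meters.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical UDH centroids and CEP points as (lat, lon) pairs.
centroids = np.array([
    [-23.55, -46.63],
    [-23.60, -46.70],
    [-23.50, -46.60],
])
ceps = np.array([
    [-23.551, -46.631],
    [-23.590, -46.690],
])

tree = cKDTree(centroids)                  # built once over the centroids
distances, nearest = tree.query(ceps, k=1)  # one nearest centroid per CEP
```

`nearest` holds, for each CEP row, the row index of its closest centroid, so it can be used directly as a positional join key into the UDH table.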

Rolling stats

6-month rolling mean and standard deviation. Detects regime changes in macro series, such as IGP-M / Selic divergence patterns before and after rate cycles.
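
One way to turn rolling statistics into a break flag is to compare each observation with the mean and standard deviation of the preceding window. The z-score rule, the threshold of 2, and the helper `rolling_flags` are assumptions for illustration; the source only states that 6-month rolling mean and std are computed.

```python
import pandas as pd


def rolling_flags(series, window=6, z=2.0):
    """Rolling mean/std plus a flag for points far from the previous window."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    # Shift by one so each point is judged against the window that precedes it.
    score = (series - mean.shift(1)) / std.shift(1)
    return pd.DataFrame({"mean": mean, "std": std, "flag": score.abs() > z})


# Synthetic monthly series with a level shift at position 12.
values = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9] * 2 + [15.0] * 12
idx = pd.date_range("2020-01-01", periods=len(values), freq="MS")
out = rolling_flags(pd.Series(values, index=idx))
```

Points inside the first window have no preceding statistics, so their score is missing and they are never flagged.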

MinMaxScaler + PCA

Feature normalization and dimensionality reduction for geo composite scores. Consumed by downstream models in urban-space and solo-inteligente.
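
A composite score of this kind can be sketched with scikit-learn. The synthetic features, their scales, and the choice of a single principal component are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Three synthetic features on very different scales; two share a common driver.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
features = np.column_stack([
    base * 1000 + 5000,     # large-scale feature
    base * 0.1 + 0.7,       # small-scale feature driven by the same factor
    rng.normal(size=200),   # unrelated feature
])

scaled = MinMaxScaler().fit_transform(features)  # each column mapped to [0, 1]
pca = PCA(n_components=1)
score = pca.fit_transform(scaled).ravel()        # one composite value per row
explained = pca.explained_variance_ratio_[0]
```

Scaling first keeps the large-scale column from dominating the component. The sign of a principal component is arbitrary, so the score's direction should be checked against a known feature before it is read as "higher is better".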

Personal research project.