conference
src.utils.conference
Shared conference metadata, area classification, and name-normalization helpers.
Every generator / enricher should import from here rather than redefining
its own SYSTEMS_CONFS, SECURITY_CONFS, _conf_area(),
_extract_conf_year(), or name-cleaning functions.
discover_conferences(website_root: str | None = None) -> tuple[frozenset[str], frozenset[str]]
Return (systems, security) conference sets from the website.
If website_root is not given, the website root is auto-detected; if no such directory is found (e.g. in tests), the built-in defaults are used. Discovered conferences are merged with the built-in fallbacks, so known conferences are always classified, even before the pipeline auto-generates their pages.
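The merge-with-fallbacks behaviour can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the directory layout (`<root>/systems/<conf>/` and `<root>/security/<conf>/`), the contents of the default sets, and the function name `discover_conferences_sketch`. Auto-detection of the website root is omitted, and the real logic in src/utils/conference.py may differ.

```python
from pathlib import Path
from typing import Optional

# Hypothetical built-in fallbacks; the real defaults are defined in the module.
DEFAULT_SYSTEMS = frozenset({"OSDI", "SOSP"})
DEFAULT_SECURITY = frozenset({"CCS", "USENIXSEC"})


def discover_conferences_sketch(
    website_root: Optional[str] = None,
) -> tuple[frozenset[str], frozenset[str]]:
    """Return (systems, security) sets, always including the defaults."""
    systems, security = set(DEFAULT_SYSTEMS), set(DEFAULT_SECURITY)
    root = Path(website_root) if website_root else None
    if root is not None and root.is_dir():
        # Assumed layout: one sub-directory per conference under each area.
        for area, bucket in (("systems", systems), ("security", security)):
            area_dir = root / area
            if area_dir.is_dir():
                bucket.update(p.name.upper() for p in area_dir.iterdir() if p.is_dir())
    # A missing root leaves only the defaults in place.
    return frozenset(systems), frozenset(security)
```

Because discovered names are added to the defaults rather than replacing them, a missing or empty website directory still yields a usable classification.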
Source code in src/utils/conference.py
canonicalize_name(name: str) -> str
Map known name aliases to their canonical form.
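As a rough illustration, alias canonicalisation can be modelled as a dictionary lookup that leaves unknown names unchanged. The table name `NAME_ALIASES` and the function name `canonicalize_name_sketch` are hypothetical; the single alias entry is the one mentioned in the normalize_name documentation.

```python
# Hypothetical alias table; the real one lives in src/utils/conference.py.
NAME_ALIASES = {
    'Bogdan "Bo" Stoica': "Bogdan Alexandru Stoica",
}


def canonicalize_name_sketch(name: str) -> str:
    """Return the canonical form of a known alias, else the name unchanged."""
    return NAME_ALIASES.get(name, name)
```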
conf_area(conf_name: str) -> str
Return 'systems', 'security', or 'unknown'.
Accepts a bare conference name ('OSDI') or a conf-year string
('osdi2024'). Any casing is accepted.
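The accepted input forms can be illustrated with a small sketch. The set contents and the name `conf_area_sketch` are assumptions for illustration only; the real function classifies against the sets returned by discover_conferences.

```python
import re

# Hypothetical conference sets used only for this example.
SYSTEMS_CONFS = frozenset({"OSDI", "SOSP"})
SECURITY_CONFS = frozenset({"CCS", "NDSS"})


def conf_area_sketch(conf_name: str) -> str:
    """Classify 'OSDI', 'osdi', or 'osdi2024' as systems/security/unknown."""
    # Drop a trailing four-digit year, then compare case-insensitively.
    bare = re.sub(r"\d{4}$", "", conf_name.strip()).upper()
    if bare in SYSTEMS_CONFS:
        return "systems"
    if bare in SECURITY_CONFS:
        return "security"
    return "unknown"
```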
parse_conf_year(conf_year_str: str) -> tuple[str, int | None]
Parse 'osdi2024' → ('OSDI', 2024).
Returns (name_upper, year_int) on success, or
(conf_year_str.upper(), None) on failure.
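The documented success and failure contracts can be sketched with a single regular expression. This is an illustrative re-implementation under the assumption that a conf-year string is letters followed by a four-digit year; the name `parse_conf_year_sketch` is hypothetical, and the real parser may accept other shapes.

```python
import re
from typing import Optional

# Assumed shape: alphabetic conference name followed by a four-digit year.
_CONF_YEAR_RE = re.compile(r"^([A-Za-z]+)(\d{4})$")


def parse_conf_year_sketch(conf_year_str: str) -> tuple[str, Optional[int]]:
    """Split 'osdi2024' into ('OSDI', 2024); return (upper, None) on failure."""
    match = _CONF_YEAR_RE.match(conf_year_str)
    if match:
        return match.group(1).upper(), int(match.group(2))
    return conf_year_str.upper(), None


print(parse_conf_year_sketch("osdi2024"))  # ('OSDI', 2024)
print(parse_conf_year_sketch("osdi"))      # ('OSDI', None)
```

Returning the upper-cased input on failure means callers can still use the first element as a lookup key without checking the year.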
clean_name(name: str) -> str
Remove DBLP disambiguation suffixes and collapse whitespace.
'Jane Doe 0001' → 'Jane Doe'
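A minimal sketch of this cleaning step, assuming the disambiguation suffix is always a trailing four-digit number as in the example above (the name `clean_name_sketch` is hypothetical):

```python
import re


def clean_name_sketch(name: str) -> str:
    """Drop a trailing DBLP-style ' 0001' suffix and collapse whitespace."""
    # Assumption: the suffix is whitespace followed by exactly four digits.
    without_suffix = re.sub(r"\s+\d{4}$", "", name.strip())
    return re.sub(r"\s+", " ", without_suffix).strip()


print(clean_name_sketch("Jane Doe 0001"))  # Jane Doe
```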
normalize_name(name: str, *, strip_initials: bool = False) -> str
Aggressive normalisation for cross-source matching.
Lower-cases, strips accents, removes dots, collapses whitespace.
Applies name alias canonicalisation first so that known aliases
(e.g. 'Bogdan "Bo" Stoica' → 'Bogdan Alexandru Stoica')
collapse to the same normalised key.
Optionally strips single-letter initials (e.g. "J. Doe" → "Doe")
and leading underscores for ranking deduplication.
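The steps described above can be sketched as follows. This is an illustrative re-implementation, not the module's code: the alias table, the use of NFKD decomposition for accent stripping, and the name `normalize_name_sketch` are all assumptions.

```python
import unicodedata

# Hypothetical alias table, applied before any other normalisation.
NAME_ALIASES = {'Bogdan "Bo" Stoica': "Bogdan Alexandru Stoica"}


def normalize_name_sketch(name: str, *, strip_initials: bool = False) -> str:
    """Build a lower-case, accent-free, dot-free matching key for a name."""
    name = NAME_ALIASES.get(name, name)
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    name = name.lower().replace(".", "")
    if strip_initials:
        # Remove leading underscores and single-letter tokens such as "j".
        name = name.lstrip("_")
        name = " ".join(tok for tok in name.split() if len(tok) > 1)
    # split()/join collapses any run of whitespace to a single space.
    return " ".join(name.split())
```

With this sketch, `'Bogdan "Bo" Stoica'` and `'Bogdan Alexandru Stoica'` both map to the key `bogdan alexandru stoica`, which is the collapsing behaviour the alias step is meant to provide.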
normalize_title(title: str) -> str
Normalize a paper title for fuzzy matching.
Lower-cases, strips punctuation (keeping word chars and spaces), and collapses whitespace. Used for deduplication and cross-source title matching.
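A minimal sketch of this normalisation, assuming punctuation is deleted outright rather than replaced by a space (the description does not say which; the name `normalize_title_sketch` is hypothetical):

```python
import re


def normalize_title_sketch(title: str) -> str:
    """Lower-case a title, drop punctuation, and collapse whitespace."""
    # Keep only word characters and whitespace; everything else is deleted.
    stripped = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", stripped).strip()


print(normalize_title_sketch("Attention Is All You Need!"))  # attention is all you need
```

Under this assumption a hyphenated word such as "Foo-Bar" becomes "foobar"; replacing punctuation with a space instead would yield "foo bar", so the two choices produce different matching keys.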
venue_to_conference(booktitle: str) -> str | None
Map a DBLP booktitle to our conference identifier, or None.
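One plausible way to implement such a mapping is an ordered list of substring patterns. The pattern table, the conference identifiers, and the name `venue_to_conference_sketch` below are hypothetical and chosen only to show the shape of the lookup, including the None result for unrecognised venues.

```python
from typing import Optional

# Hypothetical (substring, identifier) pairs; first match wins.
VENUE_PATTERNS = (
    ("usenix security", "USENIXSEC"),
    ("osdi", "OSDI"),
    ("sosp", "SOSP"),
)


def venue_to_conference_sketch(booktitle: str) -> Optional[str]:
    """Return a conference identifier for a booktitle, or None if unknown."""
    lowered = booktitle.lower()
    for needle, conf_id in VENUE_PATTERNS:
        if needle in lowered:
            return conf_id
    return None
```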
clean_member_name(raw_name: str) -> str | None
Clean a committee member name.
Strips markdown links, trailing <br> tags, and skips placeholder names.
Returns the cleaned name, or None if the entry should be dropped.
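The three cleaning steps can be sketched with two regular expressions and a placeholder check. The placeholder set and the name `clean_member_name_sketch` are assumptions; the actual list of skipped names is defined in the module.

```python
import re
from typing import Optional

# Hypothetical placeholder names; compared case-insensitively.
PLACEHOLDER_NAMES = frozenset({"", "tba", "tbd", "n/a"})


def clean_member_name_sketch(raw_name: str) -> Optional[str]:
    """Return a cleaned committee member name, or None to drop the entry."""
    # Replace markdown links "[text](url)" with just "text".
    name = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", raw_name)
    # Remove one or more trailing <br>, <br/>, or <br /> tags.
    name = re.sub(r"(?:<br\s*/?>\s*)+$", "", name.strip(), flags=re.IGNORECASE)
    name = name.strip()
    if name.lower() in PLACEHOLDER_NAMES:
        return None
    return name


print(clean_member_name_sketch("[Jane Doe](https://example.org)<br>"))  # Jane Doe
print(clean_member_name_sketch("TBA"))                                  # None
```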