Mike Williamson<p>The Treasury Board's policy suite is conceptually a giant graph structure, but it is frustratingly resistant to automated analysis.</p><p>Some annoyances:</p><p>The policy suite straddles tbs-sct.gc.ca and canada.ca, and policies often draw their authorities from material on laws-lois.justice.gc.ca.</p><p>There is very little common structure you can rely on. If you think you have found one, you only need to look at a few more policies to see it break.</p><p>Links between policies, or from policies to laws, rarely point to the relevant section.</p><p>Only a few policies have an XML representation; most are available only as HTML, which makes web scraping the most reliable approach.</p><p>Markers for sections, clauses, etc. are not consistent across HTML documents, which makes scraping extremely annoying.</p><p>Multiple requirements often appear in a single sentence, joined by "and".</p><p>Enabling programmatic analysis of policy would be broadly valuable, both inside and outside government.</p><p>This should be an <a href="https://infosec.exchange/tags/opendata" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>opendata</span></a> <a href="https://infosec.exchange/tags/dataproduct" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>dataproduct</span></a>, but these documents seem to be treated largely like marketing material: if it looks OK in the browser, it's done.</p><p><a href="https://infosec.exchange/tags/gcdigital" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gcdigital</span></a></p>