saos-tm.extractor.law-links
This module contains an algorithm for extracting references to Polish legislation.
conv-act-to-str
(conv-act-to-str act)
Converts the act part (act) of a law link to a string.
The output string format is Dz. U. z XXXX r. Nr X poz. X
When no :journalNo is available, only Dz. U. z XXXX r. poz. X is returned.
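The two output formats can be illustrated with a minimal sketch. The function name act->str-sketch is hypothetical and this is not the library's implementation; it only assumes that the act map uses the keys :journalYear, :journalNo and :journalEntry shown in the extract-law-links example.

```clojure
;; Hypothetical stand-in for conv-act-to-str, written only to show the
;; two output formats described above.
(defn act->str-sketch
  [{:keys [journalYear journalNo journalEntry]}]
  (if journalNo
    ;; Journal number present: include the "Nr X" segment.
    (str "Dz. U. z " journalYear " r. Nr " journalNo " poz. " journalEntry)
    ;; No journal number: leave the "Nr X" segment out.
    (str "Dz. U. z " journalYear " r. poz. " journalEntry)))

(act->str-sketch {:journalEntry "643", :journalNo "102", :journalYear "1997"})
;; => "Dz. U. z 1997 r. Nr 102 poz. 643"

(act->str-sketch {:journalEntry "643", :journalYear "1997"})
;; => "Dz. U. z 1997 r. poz. 643"
```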
convert-art-to-str
(convert-art-to-str art)
Converts the article part (art) of a law link to a string.
The output string format is art. X § X ust. X pkt X zd. X lit. X.
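As a rough illustration of this format, the sketch below joins the parts of an article map in the documented order. The name art->str-sketch is hypothetical, and skipping parts whose value is "0" is an assumption made for this example; the documentation above does not say how the library treats empty parts.

```clojure
(require '[clojure.string :as str])

;; Keys and their printed labels, in the order of the documented format.
(def part-labels
  [[:art "art."] [:par "§"] [:ust "ust."] [:pkt "pkt"] [:zd "zd."] [:lit "lit."]])

;; Hypothetical stand-in for convert-art-to-str.
;; Assumption: a part equal to "0" (or missing) is treated as absent.
(defn art->str-sketch
  [art]
  (str/join " "
            (for [[k label] part-labels
                  :let [v (get art k)]
                  :when (and v (not= v "0"))]
              (str label " " v))))

(art->str-sketch {:lit "0", :zd "0", :pkt "0", :ust "3", :par "0", :art "36"})
;; => "art. 36 ust. 3"
```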
extract-law-links
(extract-law-links s use-local-explicit-dictionary use-local-implicit-dictionary use-global-dictionary)
Extracts references to Polish legislation from a given string s. The flags use-local-explicit-dictionary, use-local-implicit-dictionary and use-global-dictionary (true or false) switch the dictionary methods described below on or off.
The result is a map containing two keys:
- :orphaned-links - the algorithm did not manage to map a legislative act to them; contains a list of maps with fields:
  - :art - article part of the link
  - :txt - text which the algorithm did not manage to resolve to a legislative act
- :extracted-links - list containing successfully extracted links in the form of maps with fields:
  - :art - coordinates of the article parts of the legislative act (described below)
  - :act - coordinates of the act (described below)
Example:
(extract-law-links "art. 36 ustawy z dnia 1 sierpnia 1997 r.
o Trybunale Konstytucyjnym (Dz. U. Nr 102, poz. 643 ze zm.,
dalej: ustawa o TK). Zgodnie z § 116 pkt 3 wskazanego rozporządzenia,
w dziale" true true true)
{:orphaned-links
 [{:art ("0" "116" "0" "3" "0" "0"), :txt "wskazanego"}],
 :extracted-links
 [{:art
   {:lit "0", :zd "0", :pkt "0", :ust "0", :par "0", :art "36"},
   :act {:journalEntry "643", :journalNo "102", :journalYear "1997"}}]}
It extracts two types of coordinates:
- article coordinates (:art) - e.g. from the text “art. 36 ust. 3” it will extract the list ("36" "0" "3" "0" "0" "0"); the successive elements of this list stand for the Polish artykuł, paragraf, ustęp, punkt, zdanie, litera
- act coordinates (:act) - act coordinates can be given in the text of Polish case law in different forms:
  - full description - e.g. from the text “ustawy z dnia 1 sierpnia 1997 r. o Trybunale Konstytucyjnym (Dz. U. Nr 102, poz. 643)” it will extract the map {:journalEntry "643", :journalNo "102", :journalYear "1997"}; in this case it is extracted with the extract-act-coords-journal-with-dot or extract-year-journal-nmb-and-entry function by means of rules and regexes
  - non-full description - e.g. “ustawy o TK” or “u.w.l.”; in this case dictionary methods (described below) are used
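To make the two kinds of coordinates concrete, here is a small regex-based sketch. The names part-patterns, art-coords-sketch and full-act-coords-sketch and all the regexes are hypothetical and written for this illustration; they are not the rules used by extract-act-coords-journal-with-dot or extract-year-journal-nmb-and-entry, and they cover only the simple phrasings quoted above.

```clojure
;; One illustrative pattern per article part, in the order
;; artykuł, paragraf, ustęp, punkt, zdanie, litera.
(def part-patterns
  [#"art\.\s*(\d+\w*)"
   #"§\s*(\d+\w*)"
   #"ust\.\s*(\d+\w*)"
   #"pkt\s*(\d+\w*)"
   #"zd\.\s*(\d+\w*)"
   #"lit\.\s*(\w+)"])

;; Returns a seq of six strings; a part that is not found becomes "0".
(defn art-coords-sketch
  [s]
  (map #(or (second (re-find % s)) "0") part-patterns))

;; Reads the year from the act's date and the journal number and entry
;; from a "Dz. U. Nr X, poz. X" fragment; returns nil when the text is
;; not a full description of this exact shape.
(defn full-act-coords-sketch
  [s]
  (when-let [[_ year no entry]
             (re-find #"(?s)z dnia \d{1,2} \p{L}+ (\d{4}) r\..*?Dz\.\s*U\.\s*Nr\s*(\d+),\s*poz\.\s*(\d+)" s)]
    {:journalEntry entry, :journalNo no, :journalYear year}))

(art-coords-sketch "art. 36 ust. 3")
;; => ("36" "0" "3" "0" "0" "0")

(full-act-coords-sketch
 "ustawy z dnia 1 sierpnia 1997 r. o Trybunale Konstytucyjnym (Dz. U. Nr 102, poz. 643)")
;; => {:journalEntry "643", :journalNo "102", :journalYear "1997"}
```

A non-full description such as "ustawy o TK" makes full-act-coords-sketch return nil, which is the case the dictionary methods are meant to handle.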
Dictionary methods:
- global dictionary - a list of regexes called dictionary-for-acts is used to match texts that describe law acts
- local implicit dictionary - input text is split by acts-txts-split-regex to extract act descriptions; from full descriptions (see above) the act coords and the full name of the act are extracted (an example of a full name is “ustawy z dnia 1 sierpnia 1997 r. o Trybunale Konstytucyjnym”) - the name is used to match texts that describe law acts
- local explicit dictionary - input text is split by acts-txts-split-regex to extract act descriptions; from full descriptions (see above) that contain an explicit definition of a law act abbreviation (e.g. “dalej: ustawa o TK”) the abbreviation is extracted to match texts that describe law acts
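The local explicit dictionary idea can be sketched as follows. Everything here is an assumption made for illustration: the function name explicit-abbreviation-sketch, the regex for the “dalej: …” clause, and the plain map used as a dictionary. The sketch does not use acts-txts-split-regex, and it ignores Polish inflection (e.g. “ustawa” vs. “ustawy”), which a real matcher has to deal with.

```clojure
(require '[clojure.string :as str])

;; Pulls the abbreviation out of a "dalej: ..." clause that ends at the
;; closing parenthesis of the act description; returns nil if none exists.
(defn explicit-abbreviation-sketch
  [description]
  (some-> (re-find #"dalej:\s*([^)]+)\)" description)
          second
          str/trim))

(def description
  (str "ustawy z dnia 1 sierpnia 1997 r. o Trybunale Konstytucyjnym "
       "(Dz. U. Nr 102, poz. 643 ze zm., dalej: ustawa o TK)"))

;; Act coordinates as they would come from the full description.
(def coords {:journalEntry "643", :journalNo "102", :journalYear "1997"})

;; A one-entry local dictionary: abbreviation -> act coordinates.
(def local-explicit-dictionary
  {(explicit-abbreviation-sketch description) coords})

(get local-explicit-dictionary "ustawa o TK")
;; => {:journalEntry "643", :journalNo "102", :journalYear "1997"}
```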
sort-arts
(sort-arts arts)
Sorts the seq of arts according to successive comparisons of the :art, :par, :ust, :pkt, :lit, :zd fields.
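A field-by-field ordering of this kind can be sketched with sort-by and a vector key. The names part-key and sort-arts-sketch are hypothetical, and comparing the leading digits numerically (so that "4" sorts before "36") with any remaining suffix compared as a string is an assumption; the documentation does not state how the library compares individual fields.

```clojure
;; Splits a value such as "36a" into [36 "a"]; values without leading
;; digits, such as "a", become [0 "a"]. A missing value is treated as "0".
(defn- part-key
  [v]
  (let [[_ digits suffix] (re-matches #"(\d*)(.*)" (or v "0"))]
    [(if (seq digits) (Long/parseLong digits) 0) suffix]))

;; Hypothetical stand-in for sort-arts: builds one comparison key per
;; article map, with the fields in the documented order. Vectors of equal
;; length compare element by element, giving successive comparisons.
(defn sort-arts-sketch
  [arts]
  (sort-by (fn [art]
             (mapv #(part-key (get art %)) [:art :par :ust :pkt :lit :zd]))
           arts))

(map (juxt :art :ust)
     (sort-arts-sketch
      [{:art "36", :par "0", :ust "3", :pkt "0", :lit "0", :zd "0"}
       {:art "4",  :par "0", :ust "0", :pkt "0", :lit "0", :zd "0"}
       {:art "36", :par "0", :ust "1", :pkt "0", :lit "0", :zd "0"}]))
;; => (["4" "0"] ["36" "1"] ["36" "3"])
```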