saos-tm.extractor.law-links
This module contains an algorithm for extracting references to Polish legislation.
conv-act-to-str
(conv-act-to-str act)
Converts the act part of a law link (act) to a string.
The output string format is Dz. U. z XXXX r. Nr X poz. X
When no :journalNo is available, only Dz. U. z XXXX r. poz. X is returned.
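The formatting rule above can be illustrated with a minimal sketch. The function act->str below is a hypothetical stand-in written for this example, not the library's conv-act-to-str; it assumes the act is a map with the :journalYear, :journalNo and :journalEntry keys shown elsewhere in this documentation, and the values in the second call are made up.

```clojure
;; Hypothetical helper, NOT the library's conv-act-to-str.
;; Assumes act is a map with :journalYear, :journalNo, :journalEntry.
(defn act->str
  [{:keys [journalYear journalNo journalEntry]}]
  (str "Dz. U. z " journalYear " r. "
       ;; The "Nr X" part is emitted only when :journalNo is present.
       (when journalNo (str "Nr " journalNo " "))
       "poz. " journalEntry))

(act->str {:journalYear "1997" :journalNo "102" :journalEntry "643"})
;; => "Dz. U. z 1997 r. Nr 102 poz. 643"

;; Made-up values, used only to show the form without :journalNo.
(act->str {:journalYear "2015" :journalEntry "12"})
;; => "Dz. U. z 2015 r. poz. 12"
```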
convert-art-to-str
(convert-art-to-str art)
Converts the article part of a law link (art) to a string.
The output string format is art. X § X ust. X pkt X zd. X lit. X.
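A rough sketch of this kind of formatting follows. The function art->str is hypothetical and is not the library's convert-art-to-str. It assumes the article part is a map with the keys :art, :par, :ust, :pkt, :zd and :lit, and it additionally assumes that parts equal to "0" are left out of the string; the documentation above does not say how such parts are rendered.

```clojure
(require '[clojure.string :as string])

;; Labels in the order of the documented output format.
(def part-labels
  [[:art "art."] [:par "§"] [:ust "ust."]
   [:pkt "pkt"] [:zd "zd."] [:lit "lit."]])

;; Hypothetical helper, NOT the library's convert-art-to-str.
;; Assumption: parts that are missing or equal to "0" are skipped.
(defn art->str
  [art]
  (->> part-labels
       (keep (fn [[k label]]
               (let [v (get art k)]
                 (when (and v (not= v "0"))
                   (str label " " v)))))
       (string/join " ")))

(art->str {:art "36" :par "0" :ust "3" :pkt "0" :zd "0" :lit "0"})
;; => "art. 36 ust. 3"
```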
extract-law-links
(extract-law-links s use-local-explicit-dictionary use-local-implicit-dictionary use-global-dictionary)
Extracts references to Polish legislation from a given string s. The flags use-local-explicit-dictionary, use-local-implicit-dictionary and use-global-dictionary (true or false) switch the dictionary methods described below on or off.
The result is a map containing two keys:
- :orphaned-links - links for which the algorithm did not manage to identify the legislative act; contains a list of maps with fields:
  - :art - article part of the link
  - :txt - text which the algorithm did not manage to resolve to a legislative act
- :extracted-links - list containing successfully extracted links in the form of maps with fields:
  - :art - coordinates of the article parts of the legislative act (described below)
  - :act - coordinates of the act (described below)
Example:
(extract-law-links "art. 36 ustawy z dnia 1 sierpnia 1997 r.
o Trybunale Konstytucyjnym (Dz. U. Nr 102, poz. 643 ze zm.,
dalej: ustawa o TK). Zgodnie z § 116 pkt 3 wskazanego rozporządzenia,
w dziale" true true true)
{:orphaned-links
[{:art ("0" "116" "0" "3" "0" "0"), :txt "wskazanego"}],
:extracted-links
[{:art
{:lit "0", :zd "0", :pkt "0", :ust "0", :par "0", :art "36"},
:act {:journalEntry "643", :journalNo "102", :journalYear "1997"}}]}
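As a small illustration of working with this result shape, the sketch below copies the map from the example into a literal (named result here only for this sketch) and reads fields from it with ordinary map and sequence functions. It does not call the library.

```clojure
;; The result map from the example above, written as a literal.
(def result
  {:orphaned-links
   [{:art '("0" "116" "0" "3" "0" "0") :txt "wskazanego"}]
   :extracted-links
   [{:art {:lit "0" :zd "0" :pkt "0" :ust "0" :par "0" :art "36"}
     :act {:journalEntry "643" :journalNo "102" :journalYear "1997"}}]})

;; Journal entries of all acts that were resolved.
(map (comp :journalEntry :act) (:extracted-links result))
;; => ("643")

;; Texts that could not be resolved to a legislative act.
(map :txt (:orphaned-links result))
;; => ("wskazanego")
```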
It extracts two types of coordinates:
- article coordinates (:art) - e.g. from the text “art. 36 ust. 3” it will extract a list ("36" "0" "3" "0" "0" "0"); subsequent elements of this list stand for the Polish artykuł, paragraf, ustęp, punkt, zdanie, litera
- act coordinates (:act) - act coordinates can be given in the text of Polish case law in different forms:
  - full description - e.g. from the text “ustawy z dnia 1 sierpnia 1997 r. o Trybunale Konstytucyjnym (Dz. U. Nr 102, poz. 643)” it will extract a map {:journalEntry "643", :journalNo "102", :journalYear "1997"}; in this case it is extracted by the extract-act-coords-journal-with-dot or extract-year-journal-nmb-and-entry function by means of rules and regexes
  - non-full description - e.g. “ustawy o TK” or “u.w.l.”; in this case dictionary methods (described below) are used
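The article coordinate list can be illustrated with a simplified sketch. The names coord-patterns and art-coords are hypothetical, and the regexes are a deliberate simplification rather than the rules the library uses; the sketch only shows how a six-element list in the order artykuł, paragraf, ustęp, punkt, zdanie, litera could be built, with "0" for parts that are absent.

```clojure
;; One simplified pattern per coordinate, in the documented order:
;; artykuł, paragraf, ustęp, punkt, zdanie, litera.
(def coord-patterns
  [#"art\.\s*(\d+)" #"§\s*(\d+)" #"ust\.\s*(\d+)"
   #"pkt\s*(\d+)" #"zd\.\s*(\d+)" #"lit\.\s*(\w+)"])

;; Hypothetical helper: returns the captured value for each pattern,
;; or "0" when the pattern does not occur in s.
(defn art-coords
  [s]
  (map (fn [re] (or (second (re-find re s)) "0")) coord-patterns))

(art-coords "art. 36 ust. 3")
;; => ("36" "0" "3" "0" "0" "0")

(art-coords "§ 116 pkt 3")
;; => ("0" "116" "0" "3" "0" "0")
```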
Dictionary methods:
- global dictionary - a list of regexes called dictionary-for-acts is used to match texts that describe law acts
- local implicit dictionary - input text is split by acts-txts-split-regex to extract act descriptions; from full descriptions (see above) the act coords and the full name of the act are extracted (an example of a full name is “ustawy z dnia 1 sierpnia 1997 r. o Trybunale Konstytucyjnym”); the name is used to match texts that describe law acts
- local explicit dictionary - input text is split by acts-txts-split-regex to extract act descriptions; from full descriptions (see above) that contain an explicit definition of a law act abbreviation (e.g. “dalej: ustawa o TK”), the abbreviation is extracted to match texts that describe law acts
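The idea behind the local explicit dictionary can be sketched as follows. The function explicit-abbreviation and its regex are hypothetical and much simpler than whatever the library does; the sketch assumes the abbreviation follows “dalej:” and runs up to the closing parenthesis of the act description.

```clojure
;; Hypothetical helper: pulls the abbreviation that follows "dalej:"
;; inside a parenthesised act description; nil when none is present.
(defn explicit-abbreviation
  [s]
  (some-> (re-find #"dalej:\s*([^)]+)\)" s) second))

(explicit-abbreviation
  "o Trybunale Konstytucyjnym (Dz. U. Nr 102, poz. 643 ze zm., dalej: ustawa o TK).")
;; => "ustawa o TK"

(explicit-abbreviation "o Trybunale Konstytucyjnym (Dz. U. Nr 102, poz. 643)")
;; => nil
```

An abbreviation extracted this way could then be paired with the act coordinates found in the same description, so that later mentions of the abbreviation resolve to the same act.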
sort-arts
(sort-arts arts)
Sorts the seq of arts according to successive comparisons of the :art, :par, :ust, :pkt, :lit, :zd fields.
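Successive comparison over several fields can be sketched with sort-by and juxt. The function sort-arts-sketch is hypothetical and is not the library's sort-arts; it assumes each element is a map with string values and compares those strings lexicographically, which may differ from the original if that compares numerically (for example "10" sorts before "9" here).

```clojure
;; Hypothetical sketch, NOT the library's sort-arts.
;; juxt builds a vector of field values per element; equal-length
;; vectors are compared element by element, left to right.
(defn sort-arts-sketch
  [arts]
  (sort-by (juxt :art :par :ust :pkt :lit :zd) arts))

(def sample-arts
  [{:art "36" :par "0" :ust "3" :pkt "0" :lit "0" :zd "0"}
   {:art "36" :par "0" :ust "1" :pkt "0" :lit "0" :zd "0"}
   {:art "12" :par "0" :ust "0" :pkt "0" :lit "0" :zd "0"}])

(map (juxt :art :ust) (sort-arts-sketch sample-arts))
;; => (["12" "0"] ["36" "1"] ["36" "3"])
```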