Commit 5e48a01

Update lessons
1 parent 8fb8b53 commit 5e48a01

10 files changed, +76 -33 lines changed

01_Getting_Metafacture.md

Lines changed: 5 additions & 3 deletions
@@ -15,8 +15,10 @@ installation is needed. The Playground is a web interface that helps you getting
 It is useful to test, share and export metafacture workflows.

 Starting with [Chapter 6](https://github.com/metafacture/metafacture-tutorial/blob/main/06_MetafactureCLI.md)
-we switch from using Playground to running Metafacture on our own Hardware.
-At this point, to be able to follow the examples, you need a Linux/Unix Bash Shell (part of every Linux, MacOS and Windows >=10)
-with Metafacture Core and Metafacture Fix installed.
+we can switch from using the Playground to running Metafacture on our own hardware.
+However, the examples are still provided in the Playground.
+
+To run Metafacture on your local machine, you need a Linux/Unix Bash shell (part of every Linux, macOS and Windows >=10) with Metafacture Core installed. This course does not teach you how to use the command line. For that, see:
+

 **Next lesson**: [02 Introduction into Metafacture Flux](./02_Introduction_into_Metafacture-Flux.md)

04_Fix-Path.md

Lines changed: 10 additions & 3 deletions
@@ -118,14 +118,20 @@ So lets do some simple excercises:
[See here](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-records%0A%7Cdecode-yaml%0A%7Cfix%28transformationFile%29%0A%7Cencode-json%28prettyPrinting%3D%22true%22%29%0A%7Cprint%0A%3B&transformation=move_field%28%22a%22%22%2C+%22title%22%29%0Apaste%28%22author%22%2C+%22...%22%2C+...%2C+%22~from%22%2C+...%29%0Aretain%28%22title%22%2C+%22author%22%29&data=---%0Aa%3A+Faust%0Ab+%3A%0A++ln%3A+Goethe%0A++fn%3A+JW%0Ac%3A+Weimar%0A%0A---%0Aa%3A+R%C3%A4uber%0Ab+%3A%0A++++ln%3A+Schiller%0A++++fn%3A+F%0Ac%3A+Weimar)
 </details>

+## Repeated fields and arrays

 There are two extra path structures that need to be explained:

-* repeatable fields
+* repeated fields
 * arrays

+In general, repeated fields as well as arrays are both handled as arrays. Internally, these arrays are also called lists.
+Both names (list and array) are reflected in some Fix functions (e.g. `add_array` or the `list` bind).
+
 In a data set, an element can sometimes have multiple instances. Different data models solve this differently. In XML records, all elements can occur multiple times: element repetition is possible and in many schemas it is (partly) allowed. E.g. the subject element exists three times:

+### Working with repeated fields
+
 ```XML
 <subject>Metadata</subject>
 <subject>Datatransformation</subject>
@@ -152,6 +158,8 @@ If you want to refer to all creators then you can use the array wildcard `*` whi
[See here](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-records%0A%7Cdecode-yaml%0A%7Cfix%28transformationFile%29%0A%7Cencode-json%28prettyPrinting%3D%22true%22%29%0A%7Cprint%0A%3B&transformation=append%28%22creator.1%22%2C%22+Jonas%22%29%0Aappend%28%22creator.2%22%2C%22+Shaw%22%29%0Aappend%28%22creator.3%22%2C%22+Andrews%22%29%0Aprepend%28%22creator.%2A%22%2C%22Investigator+%22%29&data=---%0Acreator%3A+Justus%0Acreator%3A+Peter%0Acreator%3A+Bob%0A)
 </details>

+### Working with JSON and YAML arrays
+
 In JSON or YAML, element repetition is possible but unusual. Instead of repeating elements, repetition is expressed as a list so that an element can have more than one value. This is called an array and looks like this in YAML:

 In our book example, e.g., we have the following array:
@@ -236,7 +244,7 @@ e.g.:

[Here is a way to collect and count all paths in all records by using the `list-fix-paths`-command.](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-lines%0A%7C+decode-pica%0A%7C+list-fix-paths%0A%7C+print%0A%3B&data=001@+%1Fa5%1F01-2%1E001A+%1F01100%3A15-10-94%1E001B+%1F09999%3A12-06-06%1Ft16%3A10%3A17.000%1E001D+%1F09999%3A99-99-99%1E001U+%1F0utf8%1E001X+%1F00%1E002@+%1F0Aag%1E003@+%1F0482147350%1E006U+%1F094%2CP05%1E007E+%1F0U+70.16407%1E007I+%1FSo%1F074057548%1E011@+%1Fa1970%1E017A+%1Farh%1E021A+%1FaDie+@Berufsfreiheit+der+Arbeitnehmer+und+ihre+Ausgestaltung+in+vo%CC%88lkerrechtlichen+Vertra%CC%88gen%1FdEine+Grundrechtsbetrachtg%1E028A+%1F9106884905%1F7Tn3%1FAgnd%1F0106884905%1FaProjahn%1FdHorst+D.%1E033A+%1FpWu%CC%88rzburg%1E034D+%1FaXXXVIII%2C+165+S.%1E034I+%1Fa8%1E037C+%1FaWu%CC%88rzburg%2C+Jur.+F.%2C+Diss.+v.+7.+Aug.+1970%1E%0A001@+%1F01%1Fa5%1E001A+%1F01140%3A08-12-99%1E001B+%1F09999%3A05-01-08%1Ft22%3A57%3A29.000%1E001D+%1F09999%3A99-99-99%1E001U+%1F0utf8%1E001X+%1F00%1E002@+%1F0Aa%1E003@+%1F0958090564%1E004A+%1Ffkart.+%3A+DM+9.70%2C+EUR+4.94%2C+sfr+8.00%2C+S+68.00%1E006U+%1F000%2CB05%2C0285%1E007I+%1FSo%1F076088278%1E011@+%1Fa1999%1E017A+%1Farb%1Fasi%1E019@+%1FaXA-AT%1E021A+%1FaZukunft+Bildung%1FhPolitische+Akademie.+%5BHrsg.+von+Gu%CC%88nther+R.+Burkert-Dottolo+und+Bernhard+Moser%5D%1E028C+%1F9130681849%1F7Tp1%1FVpiz%1FAgnd%1F0130681849%1FE1952%1FaBurkert%1FdGu%CC%88nther+R.%1FBHrsg.%1E033A+%1FpWien%1FnPolit.+Akad.%1E034D+%1Fa79+S.%1E034I+%1Fa24+cm%1E036F+%1Fx299+12%1F9551720077%1FgAdn%1F7Tb1%1FAgnd%1F01040469-7%1FaPolitische+Akademie%1FgWien%1FYPA-Information%1FhPolitische+Akademie%2C+WB%1FpWien%1FJPolitische+Akad.%2C+WB%1Fl99%2C2%1E036F/01+%1Fx12%1F9025841467%1FgAdvz%1Fi2142105-5%1FYAktuelle+Fragen+der+Politik%1FhPolitische+Akademie%1FpWien%1FJPolitische+Akad.+der+O%CC%88VP%1FlBd.+2%1E045E+%1Fa22%1Fd18%1Fm370%1E047A+%1FSFE%1Fata%1E%0A001@+%1Fa5%1F01%1E001A+%1F01140%3A19-02-03%1E001B+%1F09999%3A19-06-11%1Ft01%3A20%3A13.000%1E
001D+%1F09999%3A26-04-03%1E001U+%1F0utf8%1E001X+%1F00%1E002@+%1F0Aal%1E003@+%1F0361809549%1E004A+%1FfHlw.%1E006U+%1F000%2CL01%1E006U+%1F004%2CP01-s-41%1E006U+%1F004%2CP01-f-21%1E007G+%1FaDNB%1F0361809549%1E007I+%1FSo%1F072658383%1E007M+%1F04413/0275%1E011@+%1Fa1925%1E019@+%1FaXA-DXDE%1FaXA-DE%1E021A+%1FaHundert+Jahre+Buchdrucker-Innung+Hamburg%1FdWesen+u.+Werden+d.+Vereinigungen+Hamburger+Buchdruckereibesitzer+1825-1925+%3B+Gedenkschrift+zur+100.+Wiederkehr+d.+Gru%CC%88ndungstages%2C+verf.+im+Auftr.+d.+Vorstandes+d.+Buchdrucker-Innung+%28Freie+Innung%29+zu+Hamburg%1FhFriedrich+Voeltzer%1E028A+%1F9101386281%1F7Tp1%1FVpiz%1FAgnd%1F0101386281%1FE1895%1FaVo%CC%88ltzer%1FdFriedrich%1E033A+%1FpHamburg%1FnBuchdrucker-Innung+%28Freie+Innung%29%1E033A+%1FpHamburg%1Fn%5BVerlagsbuchh.+Broschek+%26+Co.%5D%1E034D+%1Fa44+S.%1E034I+%1Fa4%1E%0A001@+%1Fa5%1F01-3%1E001A+%1F01240%3A01-08-95%1E001B+%1F09999%3A24-09-10%1Ft17%3A42%3A20.000%1E001D+%1F09999%3A99-99-99%1E001U+%1F0utf8%1E001X+%1F00%1E002@+%1F0Af%1E003@+%1F0945184085%1E004A+%1F03-89007-044-2%1FfGewebe+%3A+DM+198.00%2C+sfr+198.00%2C+S+1386.00%1E006T+%1F095%2CN35%2C0856%1E006U+%1F095%2CA48%2C1186%1E006U+%1F010%2CP01%1E007I+%1FSo%1F061975997%1E011@+%1Fa1995%1E017A+%1Fara%1E021A+%1Fx213%1F9550711899%1FYNeues+Handbuch+der+Musikwissenschaft%1Fhhrsg.+von+Carl+Dahlhaus.+Fortgef.+von+Hermann+Danuser%1FpLaaber%1FJLaaber-Verl.%1FS48%1F03-89007-030-2%1FgAc%1E021B+%1FlBd.+13.%1FaRegister%1Fhzsgest.+von+Hans-Joachim+Hinrichsen%1E028C+%1F9121445453%1F7Tp3%1FVpiz%1FAgnd%1F0121445453%1FE1952%1FaHinrichsen%1FdHans-Joachim%1E034D+%1FaVIII%2C+408+S.%1E045V+%1F9090001001%1E047A+%1FSFE%1Fagb/fm%1E%0A001@+%1F01-2%1Fa5%1E001A+%1F01239%3A18-08-11%1E001B+%1F09999%3A05-09-11%1Ft23%3A31%3A44.000%1E001D+%1F01240%3A30-08-11%1E001U+%1F0utf8%1E001X+%1F00%1E002@+%1F0Af%1E003@+%1F01014417392%1E004A+%1Ffkart.%1E006U+%1F011%2CA37%1E007G+%1FaDNB%1F01014417392%1E007I+%1FSo%1F0752937239%1E010@+%1Fager%1E011@+%1Fa2011%1E017A+%1Fara%1Fasf%1E021A+%1Fxtr%1F91014809657
%1F7Tp3%1FVpiz%1FAgnd%1F01034622773%1FE1958%1FaLu%CC%88beck%1FdMonika%1FYPersonalwirtschaft+mit+DATEV%1FhMonika+Lu%CC%88beck+%3B+Helmut+Lu%CC%88beck%1FpBodenheim%1FpWien%1FJHerdt%1FRXA-DE%1FS650%1FgAc%1E021B+%1FlTrainerbd.%1E032@+%1Fg11%1Fa1.+Ausg.%1E034D+%1Fa129+S.%1E034M+%1FaIll.%1E047A+%1FSFE%1Famar%1E047A+%1FSERW%1Fasal%1E047I+%1Fu%24%1Fc04%1FdDNB%1Fe1%1E)

-Other ways are also possible too.
+Other ways are also possible, too.

 ## Bonus: XML in MF and their paths

@@ -258,5 +266,4 @@ title.lang

 If you want to create XML with attributes, you need to map to this structure, too. We will come back to working with XML in lesson 10.

-
 Next lesson: [05 More Fix Concepts](./05-More-Fix-Concepts.md)
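
The statement that repeated fields and arrays both end up as lists, addressed with paths like `creator.1` or the wildcard `creator.*`, can be modelled in a few lines of Python. This is a minimal sketch, not Metafacture's implementation. It assumes that the three repeated `creator` keys from the Playground example arrive as one list in input order. It also assumes that path indexes are 1-based, as that example suggests: appending " Jonas" to `creator.1` targets Justus. The helpers `append` and `prepend` are hypothetical and only mirror the Fix functions used there.

```python
def resolve(record, path):
    """Return the field name and the list positions selected by 'field.N' or 'field.*'."""
    field, _, selector = path.partition(".")
    values = record[field]
    if selector == "*":
        return field, list(range(len(values)))  # wildcard: every element
    return field, [int(selector) - 1]  # assumed 1-based index -> Python 0-based


def append(record, path, suffix):
    field, positions = resolve(record, path)
    for i in positions:
        record[field][i] = record[field][i] + suffix


def prepend(record, path, prefix):
    field, positions = resolve(record, path)
    for i in positions:
        record[field][i] = prefix + record[field][i]


# Assumption: the repeated `creator` keys of the YAML input are collected into one list.
record = {"creator": ["Justus", "Peter", "Bob"]}

append(record, "creator.1", " Jonas")
append(record, "creator.2", " Shaw")
append(record, "creator.3", " Andrews")
prepend(record, "creator.*", "Investigator ")

print(record["creator"])
# ['Investigator Justus Jonas', 'Investigator Peter Shaw', 'Investigator Bob Andrews']
```

The numbered paths touch exactly one element each, while the wildcard applies the same change to every element of the list.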

05-More-Fix-Concepts.md

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@ end

 Metafacture supports lots of conditionals; find a list of all of them [here](https://github.com/metafacture/metafacture-documentation/blob/master/Fix-function-and-Cookbook.md#conditionals).

-Hint: Some conditionals have variations with `all_` or `any_` while they behave in the same way if you process them on simple string-elements. They also can be used with arrays/lists then the conditional has different out-come depending on the fact that all (`all_`) or at least one (`any_`) value of an array matches the requierement.
+Hint: Some conditionals have variants prefixed with `all_`, `any_` or `none_`. `all_` and `any_` behave the same way when applied to simple string elements. They can also be used with arrays/lists; then the outcome depends on whether all (`all_`) or at least one (`any_`) value of the array matches the requirement. `none_` checks that no value matches.

 ## Selectors
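
The difference between the `all_`, `any_` and `none_` variants only shows up once a field holds several values. The Python sketch below models that behaviour for a simple equality check. The function names only echo the Fix prefixes and are not Metafacture code. It assumes that a plain string is treated like a one-element list, which is why `all_` and `any_` agree on simple string elements. How Metafacture handles missing or empty fields is not modelled.

```python
def as_values(value):
    # Assumption: a simple string element behaves like a list with one value.
    return value if isinstance(value, list) else [value]


def all_equal(value, expected):
    return all(v == expected for v in as_values(value))


def any_equal(value, expected):
    return any(v == expected for v in as_values(value))


def none_equal(value, expected):
    # True when no value matches.
    return not any_equal(value, expected)


# Simple string: all_ and any_ agree, none_ gives the opposite answer.
print(all_equal("ger", "ger"), any_equal("ger", "ger"), none_equal("ger", "ger"))
# True True False

# Array: the outcome depends on how many values match.
languages = ["ger", "eng"]
print(all_equal(languages, "ger"), any_equal(languages, "ger"), none_equal(languages, "ger"))
# False True False
```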

06_MetafactureCLI.md

Lines changed: 14 additions & 13 deletions
@@ -11,11 +11,11 @@ For this lesson basic knowledge of the commandline is recommended.
 Check if Java 11 or higher is installed with `java -version` in your terminal.
 If not, install Java 11 or higher.

-To use Metafacture on the commandline we can download the latest runner of Metafacture Fix:
+To use Metafacture on the command line, download the latest distribution, e.g. `metafacture-core-7.0.0-dist.zip`, from:

-[https://github.com/metafacture/metafacture-fix/releases](https://github.com/metafacture/metafacture-fix/releases)
+[https://github.com/metafacture/metafacture-core/releases](https://github.com/metafacture/metafacture-core/releases)

-Unzip the downloaded metafix-runner distribution to your choosen folder
+Unzip the downloaded Metafacture distribution to a folder of your choice.

 ## How to run Metafacture via CLI

@@ -24,13 +24,13 @@ You can run your workflows:
 Unix:

 ```bash
-./bin/metafix-runner path/to/your.flux
+./metafacture-core-.../flux.sh path/to/your.flux
 ```

 or Windows:

 ```bash
-./bin/metafix-runner.bat path/to/your.flux
+./metafacture-core-.../flux.bat path/to/your.flux
 ```

 (Hint: You need to know the path to your file to run the function.)
@@ -47,13 +47,13 @@ Export the workflow with the Export Button and lets run the flux.
 Linux:

 ```bash
-./bin/metafix-runner downloads/playground.flux
+./metafacture-core-.../flux.sh downloads/playground.flux
 ```

 or Windows:

 ```bash
-./bin/metafix-runner.bat downloads/playground.flux
+./metafacture-core-.../flux.bat downloads/playground.flux
 ```

 The result of running the Flux-Script via CLI should be the same as with the Playground.
@@ -148,15 +148,16 @@ FILE
 You could use:

 ```bash
-./bin/metafix-runner path/to/your.flux FILE="path/to/your/file.json"
+./metafacture-core-.../flux.sh path/to/your.flux FILE="path/to/your/file.json"
 ```


+Exercise: Download the following folder with three test examples and run them. Adjust them if needed:

-TODO: Give homework:
-- Provide a file or a file-folder.
-- Give a homework.
-- Give the solution.
-
+- Run the example script locally.
+- Adjust the example script so that all JSON files in the folder, and no other files, are read. Get inspired by https://github.com/metafacture/metafacture-core/blob/master/metafacture-runner/src/main/dist/examples/misc/reading-dirs/read-dirs.flux.
+- Change the Flux script so that the output is written to a local file instead of stdout.
+- Add a Fix file with `nothing()` as its only content and add the `fix` module to the Flux.
+- Add some transformations to the Fix, e.g. add fields.

 Next lesson: [07 Processing MARC](./07_Processing_MARC.md)

07_Processing_MARC.md

Lines changed: 6 additions & 5 deletions
@@ -304,6 +304,8 @@ join_field(isbn,",")
 retain("id","title","isbn")
 ```

+HINT: Sometimes it makes sense to create an empty array with `add_array` or an empty hash/object with `add_hash` before adding content to it. Whether this is needed depends on the use case. In our case we need empty values if no field is mapped for the CSV.
+
 Step 2: create the Flux workflow and execute it either with the CLI or in the Playground:

 ```default
@@ -337,15 +339,14 @@ You will see this as output:
 "1080278184","Renfro Valley Kentucky Rainer H. Schmeissner",""
 ```

-In the fix above we mapped the 245-field to the title, and iterated over every subfield with the help of the list-bind and the `?`- wildcard.
-. The ISBN is in the 020-field. Because MARC records can contain one or more 020 fields we created an isbn array with add_arrayy and added the values using the isbn.$append syntax. Next we turned the isbn array back into a comma separated string using the join_field fix. As last step we deleted all the fields we didn’t need in the output with the `retain` syntax.
+In the Fix above, we mapped the 245 field to the title and iterated over every subfield with the help of the list bind and the `?` wildcard. The ISBN is in the 020 field. Because MARC records can contain one or more 020 fields, we created an isbn array with `add_array` and added the values using the `isbn.$append` syntax. Next, we turned the isbn array back into a comma-separated string using the `join_field` fix. As a last step, we deleted all the fields we didn’t need in the output with `retain`.
+
+Different MARC serializations need different workflows, e.g. [here is an example that transforms Aleph sequential (Aseq) MARC files to MARCXML](https://test.metafacture.org/playground/?flux=%22https%3A//raw.githubusercontent.com/LibreCat/Catmandu-MARC/dev/t/rug01.aleph%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-aseq%0A%7C+merge-same-ids%0A%7C+encode-marcxml%0A%7C+print%0A%3B).

 In this post we demonstrated how to process MARC data. In the next post we will show some examples of how Catmandu can typically be used to process library data.

 ## Exercise

-
-
-# TODO_ Add example that transforms aleph sequential. Also open ticket, that enables the transformation.
+TODO: Add some MARC examples, e.g. the examples from our last workshop.

 Next lesson: [08 Harvest data with OAI-PMH](./08_Harvest_data_with_OAI-PMH.md)
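
The ISBN steps described in this lesson (start with an empty array, append every 020 `$a`, join with commas) can be mirrored in Python. The sketch shows why the empty array matters for CSV output: a record without any 020 field still gets an `isbn` column, just an empty one. It rests on some made-up details. The record layout (a list of `(tag, subfields)` pairs), the helper `to_csv_row` and the first record's values are hypothetical. Only the second record reproduces the sample output row shown in the lesson.

```python
import csv
import io


def to_csv_row(record):
    isbn = []  # like add_array: the field exists even if nothing is appended
    for tag, subfields in record["fields"]:
        # like appending 020 $a to isbn.$append: one entry per repeated 020 field
        if tag == "020" and "a" in subfields:
            isbn.append(subfields["a"])
    # like join_field(isbn, ","): turn the list back into one string
    return [record["id"], record["title"], ",".join(isbn)]


records = [
    # Hypothetical record with two 020 fields.
    {"id": "1", "title": "Example title",
     "fields": [("020", {"a": "isbn-one"}), ("020", {"a": "isbn-two"})]},
    # The record from the sample output, which has no 020 field.
    {"id": "1080278184", "title": "Renfro Valley Kentucky Rainer H. Schmeissner",
     "fields": []},
]

out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
for record in records:
    writer.writerow(to_csv_row(record))
print(out.getvalue(), end="")
```

Without the empty list, the second record would have no `isbn` value at all. With it, the row ends in an empty quoted field, matching the lesson's output.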

08_Harvest_data_with_OAI-PMH.md

Lines changed: 5 additions & 7 deletions
@@ -31,28 +31,25 @@ To get some Dublin Core records from the collection of Ghent University Library

 But if you just want to use the specific metadata records and not the OAI-PMH specific metadata wrappers, then specify the XML handler like this: `| handle-generic-xml(recordtagname="dc")`

-You can also harvest MARC data and store it in a file:
+You can also harvest MARC data, serialize it to binary MARC and store it in a file:

 ```default
 "https://lib.ugent.be/oai"
 | open-oaipmh(metadataPrefix="marcxml", setSpec="flandrica")
 | decode-xml
 | handle-marcxml
-| encode-json(prettyPrinting="true")
-| print
+| encode-marc21
+| write("ugent.mrc")
 ;
 ```

-> TODO: Revisit this example when https://github.com/metafacture/metafacture-core/issues/454 is fixed.
-
 You can also transform incoming data and immediately store/index it with MongoDB or Elasticsearch. For the transformation, you need to create a Fix (see Lesson 3) in the Playground or in a text editor:

 Add the following fixes to the file:

 ```PEARL
 copy_field("001","_id")
 copy_field("245??.a","title")
-add_arrayy("creator[]")
 copy_field("100??.a","creator[].$append")
 copy_field("260??.c","date")
 retain("_id","title","creator[]","date")
@@ -66,12 +63,13 @@ Now you can run an ETL process (extract, transform, load) with this worklflow:
 | decode-xml
 | handle-marcxml
 | fix(transformationFile)
-| encode-json(prettyPrinting="true")
+| encode-json
 | json-to-elasticsearch-bulk(idkey="_id", type="resource", index="resources-alma-fix-staging")
 | print
 ;
 ```

+Exercise: Try to fetch data from an OAI-PMH endpoint you know. (Add an example for people who do not know one.)

 Next lesson:
 [09 Working with CSV and TSV](./09_Working_with_CSV.md)
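
For the exercise, it can help to check an endpoint in the browser before harvesting it. OAI-PMH is plain HTTP: a harvest starts with a `ListRecords` request that carries `metadataPrefix` and, optionally, `set` as query parameters. The Python sketch below builds that first request URL for the Ghent example. It assumes that `open-oaipmh` passes `setSpec` on as the protocol's `set` parameter. It does not handle the `resumptionToken` paging that a real harvester follows. `list_records_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    # First request of an OAI-PMH harvest; later pages are fetched with a resumptionToken.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


print(list_records_url("https://lib.ugent.be/oai", "marcxml", "flandrica"))
# https://lib.ugent.be/oai?verb=ListRecords&metadataPrefix=marcxml&set=flandrica
```

To check an unknown endpoint, replace the query with `?verb=Identify` to see whether it responds. Use `?verb=ListMetadataFormats` to see which `metadataPrefix` values it offers.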

10_Working_with_XML.md

Lines changed: 27 additions & 1 deletion
@@ -95,7 +95,33 @@ Another important thing, when working with xml data sets is to specify the recor

https://metafacture.org/playground/?flux=%22http%3A//www.lido-schema.org/documents/examples/LIDO-v1.1-Example_FMobj00154983-LaPrimavera.xml%22%0A%7C+open-http%0A%7C+decode-xml%0A%7C+handle-generic-xml%28recordtagname%3D%22lido%22%29%0A%7C+encode-yaml%0A%7C+print%0A%3B

-> TODO: Add namespace handling.
+
+## Bonus: Working with namespaces
+
+XML elements often come with namespaces. By default, namespaces are not emitted; only the element names are provided.
+When elements have the same name but belong to different namespaces, or when you want to emit the incoming namespaces, you can use
+the option `emitnamespace="true"` of the `handle-generic-xml` command.
+
+Add this option to the previous example and you will see that there are elements belonging to lido as well as to skos.
+
+See this in the Playground [here](https://metafacture.org/playground/?flux=%22http%3A//www.lido-schema.org/documents/examples/LIDO-v1.1-Example_FMobj00154983-LaPrimavera.xml%22%0A%7C+open-http%0A%7C+decode-xml%0A%7C+handle-generic-xml%28recordtagname%3D%22lido%22%2C+emitnamespace%3D%22true%22%29%0A%7C+encode-yaml%0A%7C+print%0A%3B).
+
+If you want to add namespace definitions to the output, Metafacture does not know them by itself. You have to declare
+the new namespaces for `encode-xml`, either in a file with the option `namespacefile` or directly in the Flux with the option `namespaces`.
+
+Here is an example of adding namespaces in the Flux:
+
+```default
+inputFile
+| open-file
+| as-lines
+| decode-formeta
+| fix(transformationFile)
+| encode-xml(rootTag="collection",namespaces="__default=http://www.w3.org/TR/html4/\ndcterms=http://purl.org/dc/terms/\nschema=http://schema.org/")
+| print
+;
+```
+
 > Add exercises.

 Next lesson: [11 Mapping Marc to Dublin Core](./11_MARC_to_Dublin_Core.md)
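
The `namespaces` value in the Flux example packs several `prefix=URI` pairs into a single string. The Python sketch below shows how such a string could map onto the `xmlns` declarations of the root element. It is an illustration under assumptions, not Metafacture's encoder. It assumes that each `\n` separates one entry and that the key `__default` stands for the default (unprefixed) namespace. `parse_namespaces` and `root_start_tag` are hypothetical helpers.

```python
def parse_namespaces(spec):
    # One "prefix=URI" pair per line; split only on the first "=".
    pairs = (line.split("=", 1) for line in spec.splitlines() if line)
    return {prefix: uri for prefix, uri in pairs}


def root_start_tag(root_tag, namespaces):
    attrs = []
    for prefix, uri in namespaces.items():
        # Assumption: "__default" declares the default namespace (plain xmlns).
        name = "xmlns" if prefix == "__default" else "xmlns:" + prefix
        attrs.append(f'{name}="{uri}"')
    return "<" + " ".join([root_tag] + attrs) + ">"


spec = "__default=http://www.w3.org/TR/html4/\ndcterms=http://purl.org/dc/terms/\nschema=http://schema.org/"
print(root_start_tag("collection", parse_namespaces(spec)))
```

Each prefix declared this way can then be used by elements in the output, e.g. a field mapped to `dcterms:title`.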

data/example1/example1.flux

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+FLUX_DIR + "data/example1/input/input.json"
+| open-file
+| decode-json
+| encode-yaml
+| print
+;

data/example1/input/input.json

Whitespace-only changes.

data/example1/input/placeholder

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Example input data goes here.
+Put XML and JSON files here.

0 commit comments
