Getting started¶
Installation¶
Installing Python and pip¶
Before installing Kinoko, you need to make sure you have Python and pip
– the Python package manager – up and running. You can verify if you're already good to go with the following commands:
python --version
# Python 2.7.13 or above
pip --version
# pip 9.0.1 or above
Or else, you can install them by either of following
- pyenv
curl https://pyenv.run | bash
- Anaconda
curl https://repo.anaconda.com/archive/Anaconda3-2019.03-MacOSX-x86_64.sh | bash
Installing Kinoko¶
using either the PyPI repo or directly from GitHub
- official PyPI repo
pip install kinoko
- directly from GitHub code
pip install git+https://github.com/koyo922/kinoko@master # or other branch
Speedup in mainland China
consider using Aliyun mirror of PyPI for speed up
pip install -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com kinoko
Usage¶
A few demos below:
chasing HTTP-redirection(3xx)¶
$ chaseurl --help
Usage:
chase_url [options]
Options:
-i INPUT --input=INPUT input file [default: /dev/stdin]
BETTER TO USE FILE THAN PIPE, for a meaningful progressbar
-o OUTPUT --output=OUTPUT output file [default: /dev/stdout]
-m MAX_DEPTH --max_depth=MAX_DEPTH max depth of redirection [default: 5]
-t TEMPLATE --template=TEMPLATE output template [default: {n_jumps} {url} {tgt_url}]
supported elements: (n_jumps, url, tgt_url, all_jumps, exception)
NOTE: curly braces are needed, <tab> need to be bash-escaped via $'\t'
$ echo 'http://www.jingdong.com' | chaseurl 2>/dev/null
2 http://www.jingdong.com https://www.jd.com/
$ echo -e 'http://www.jingdong.com\nhttp://www.baidu.com' > url.txt
$ chaseurl -i url.txt -o 'redir.txt' --template '{tgt_url}'
$ cat redir.txt
https://www.jd.com/
http://www.baidu.com
Note
2>/dev/null
suppress the progress bar output- default output template is
'{n_jumps}\t{url}\t{tgt_url}'
get logger
object¶
from kinoko.misc.log_writer import init_log
logger = init_log(__name__)
...
logger.info('msg: %s', msg)
LEVEL=DEBUG python script.py
# default LEVEL=INFO
Caution
The default logger in python logging
module is not multiprocess-rotation-safe; we are planing to fix it in version 1.1.0
bash utils¶
$ colormsg "some message" WARNING # default LEVEL=INFO
==> some message # in yellow color
Warning
colormsg
does not work on Mac Bash
# turn on 64GB of virtual memory at /home/work/swap
vmem.sh -a on -s 64 -f /home/work/swap
csv utils¶
aggregate a tsv/csv file
cat <<'EOF' > infile.tsv
21 male 永强 1
22 male 永强 2
20 female 刘英 3
20 female 刘英 4
EOF
# aggregation by first 3 columns, summing the last column
aggtsv --infile infile.tsv --sep $'\t' \
-k 0 1 2 -r 3 -a sum
21 male 永强 1
22 male 永强 2
20 female 刘英 7
patch a tsv file via one or more reference files
$ cat <<'EOF' > ref.tsv
jiaose 角色 juese
xxx 色情词 <DEL>
EOF
$ cat <<'EOF' > in.tsv
field1 field2 角色 jiaose field4
field1 field2 角色 jiaose field4
field1 field2 色情词 xxx field4
EOF
$ patchtsv -r ref.tsv -d $'\t' \
-i in.tsv -o out.tsv \
-k 3 2 -v 3
$ cat out.tsv
field1 field2 角色 juese field4
field1 field2 角色 juese field4
functional utils¶
sliding window of any sequence
>>> from kinoko.func import sliding
>>> for grp in sliding(range(10), size=5 , step=3):
... print(grp)
...
[0, 1, 2, 3, 4]
[3, 4, 5, 6, 7]
>>> for grp in sliding(range(10), size=5 , step=3, skip_non_full=False):
... print(grp)
...
[0, 1, 2, 3, 4]
[3, 4, 5, 6, 7]
[6, 7, 8, 9]
[9]
C-equivalent static vars of function
>>> from kinoko.func import static_vars
>>> @static_vars(counter=0)
... def foo():
... foo.counter += 1
... print(foo.counter * 10)
...
... foo()
10
... foo()
20
... foo()
30