Archive for the ‘Software Center’ Category

Linux Commands for Installing Software


Wireshark on Fedora:

su -
yum install wireshark
yum install wireshark-gnome


Categories: Software Center

Clone VM between Different ESXi Servers

Cloning a fully configured VM is often convenient. There are two cases:

  1. Clone a VM inside one ESXi server;
  2. Clone a VM between different ESXi servers.

For the first case, the ESXi GUI provides a guided procedure. However, the same job is also simple to do by hand, for both case 1 and case 2.

Instructions:

  1. From the original folder, copy the .vmx and .vmdk files to your destination folder.
  2. Right-click on the .vmx configuration file. Select “Add to Inventory”.

More details:

http://www.dedoimedo.com/computers/vmware-esxi-clone-machine.html

Install A New Linux OS in VMware ESXi Server

  1. Download the Linux ISO image to your local computer.
  2. Create a new VM in vSphere Client. The default configuration has only one hard disk. To give /home a disk of its own, add a second hard disk in the “Resource Allocation” tab of the VM you just created.
  3. In “Summary” -> “Resources” -> “datastore”, right-click and browse to the directory named after the VM you created. Upload the ISO file from your local computer to that directory.
  4. Go to “Resource Allocation” -> “Edit … VM Properties” -> CD/DVD Drive 1, check “connect at power on”, select the ISO image in the datastore, and click OK. Power on the VM.
  5. Install the new OS as usual.

VMware console shortcuts:

Ctrl + Alt: release the mouse from the VM console

Ctrl + Alt + Insert: send Ctrl + Alt + Delete to the guest (restart)

Get Started with DEX Graph Database

DEX describes itself as a high-performance, scalable graph database, which makes it attractive for NoSQL applications. View here for the comparison between DEX and its peer products. I also wrote a post about graph databases that compares DEX with others, and DEX comes out among the best.

However, the examples in the Java API documentation for DEX 4.3 have not been updated: they still target an older version and are no longer compatible. A migration manual helps when you port old code to the new version, but it is not always enough. This post shows how to write a DEX application against version 4.3.

  1. Download DEX here. The free version supports up to 1 million nodes, which is a constraint compared with Neo4J.
  2. A user manual can be found here, but the example in its Appendix A targets the old version. To deploy, unpack the download and add /lib/dexjava.jar to your Java project. Really neat.
  3. The following Java code creates new node types and edge types, selects the nodes of a specific node type, selects nodes by a property value, and gets the neighbors of a node. It is straightforward and should not be hard to read. It works with versions 4.2 and 4.3!

(To understand the Java code, you need to know that DEX is based on the property graph model.)

import java.io.FileNotFoundException;
import java.util.Date;
import com.sparsity.dex.gdb.AttributeKind;
import com.sparsity.dex.gdb.Condition;
import com.sparsity.dex.gdb.DataType;
import com.sparsity.dex.gdb.Database;
import com.sparsity.dex.gdb.Dex;
import com.sparsity.dex.gdb.DexConfig;
import com.sparsity.dex.gdb.EdgesDirection;
import com.sparsity.dex.gdb.Graph;
import com.sparsity.dex.gdb.Objects;
import com.sparsity.dex.gdb.ObjectsIterator;
import com.sparsity.dex.gdb.Session;
import com.sparsity.dex.gdb.Value;

public class example {

	public static void main(String[] args)
			throws FileNotFoundException {
		Dex dex = new Dex(new DexConfig());
		Database gpool = dex.create("example.dex",
				"DEXEXAMPLE");
		Session sess = gpool.newSession();

		// node types
		sess.begin();
		Graph dbg = sess.getGraph();
		int person = dbg.newNodeType("PERSON");
		int name = dbg.newAttribute(person, "NAME",
				DataType.String, AttributeKind.Indexed);
		int age = dbg.newAttribute(person, "AGE",
				DataType.Integer, AttributeKind.Basic);
		long p1 = dbg.newNode(person);
		dbg.setAttribute(p1, name,
				new Value().setString("JOHN"));
		dbg.setAttribute(p1, age,
				new Value().setInteger(18));
		long p2 = dbg.newNode(person);
		dbg.setAttribute(p2, name,
				new Value().setString("KELLY"));
		long p3 = dbg.newNode(person);
		dbg.setAttribute(p3, name,
				new Value().setString("MARY"));
		sess.commit();

		// edge types
		sess.begin();
		int phones = dbg.newEdgeType("PHONES", true, true);
		int when = dbg.newAttribute(phones, "WHEN",
				DataType.Timestamp, AttributeKind.Basic);
		long e4 = dbg.newEdge(phones, p1, p3);
		dbg.setAttribute(e4, when,
				new Value().setTimestamp(new Date()));
		long e5 = dbg.newEdge(phones, p1, p3);
		dbg.setAttribute(e5, when,
				new Value().setTimestamp(new Date()));
		long e6 = dbg.newEdge(phones, p3, p2);
		dbg.setAttribute(e6, when,
				new Value().setTimestamp(new Date()));
		sess.commit();

		// Select all objects from a specific node type
		sess.begin();
		Objects persons = dbg.select(person);
		ObjectsIterator it = persons.iterator();
		while (it.hasNext()) {
			long p = it.next();
			Value v = new Value();
			dbg.getAttribute(p, name, v);
			System.out.println(v.getString());
		}
		it.close();
		persons.close();
		sess.commit();

		sess.begin();
		// get nodes from a specific property
		persons = dbg.select(name, Condition.Equal,
				new Value().setString("JOHN"));
		it = persons.iterator();
		while (it.hasNext()) {
			long p = it.next();
			Value v = new Value();
			dbg.getAttribute(p, name, v);
			System.out.println(v.getString());
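		}
		it.close();
		persons.close();

		// get the outgoing PHONES edges of p1; explode() returns edges,
		// while dbg.neighbors(...) with the same arguments returns nodes
		persons = dbg.explode(p1, phones,
				EdgesDirection.Outgoing);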
		it = persons.iterator();
		it.close();
		persons.close();
		sess.commit();

		sess.close();
		gpool.close();
		dex.close();
	}
}

Setup Hadoop on Ubuntu 11.04 64-bit

The Hadoop documentation already gives clear setup instructions for Linux. In this entry I want to make the same process simpler and shorter, tailored to 64-bit Ubuntu 11.04.

1. Install Sun JDK

Sun JDK is not available in the official repositories behind Ubuntu Software Center. What a shame! Let’s resort to an external PPA (Personal Package Archive). Launch the Terminal and run the following commands:

sudo add-apt-repository ppa:ferramroberto/java
sudo apt-get update
sudo apt-get install sun-java6-bin
sudo apt-get install sun-java6-jdk

Add the JAVA_HOME variable:

sudo gedit /etc/environment

Append a new line to the file. /etc/environment is not a shell script, so the plain NAME="value" form is enough and no "export" is needed:

JAVA_HOME="/usr/lib/jvm/java-6-sun-1.6.0.26"

Verify the installation in Terminal:

java -version

2. Check SSH Setting

ssh localhost

If it says “connection refused”, install the SSH server and client:

sudo apt-get install openssh-server openssh-client

If you cannot ssh to localhost without a passphrase, execute the following commands:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

3. Setup Hadoop

Download a recent stable release and unpack it. Edit conf/hadoop-env.sh to define JAVA_HOME as "/usr/lib/jvm/java-6-sun-1.6.0.26":

# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26

Pseudo-Distributed Operation:

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

Switch to the Hadoop root directory and format a new distributed file system:

bin/hadoop namenode -format

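You’ll get a message like “Storage directory /tmp/hadoop-jasper/dfs/name has been successfully formatted.” This path is the local directory where the NameNode stores its metadata. The commands in step 4 reuse the same string as a path inside HDFS; that works because HDFS has its own namespace, but any other HDFS path would do as well.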

Start and stop hadoop daemons:

bin/start-all.sh 
bin/stop-all.sh

Web interfaces for the NameNode and the JobTracker (default addresses):

NameNode: http://localhost:50070/
JobTracker: http://localhost:50030/

4. Deploy An Example Map-Reduce Job

Let’s run the WordCount example job, which ships with the Hadoop release. Put some text files in a local directory, e.g., “/home/jasper/mapreduce/wordcount/”. Then copy these files from the local directory to an HDFS directory and list them:

bin/hadoop dfs -copyFromLocal /home/jasper/mapreduce/wordcount /tmp/hadoop-jasper/dfs/name/wordcount

bin/hadoop dfs -ls /tmp/hadoop-jasper/dfs/name/wordcount

Run the job:

bin/hadoop jar hadoop*examples*.jar wordcount /tmp/hadoop-jasper/dfs/name/wordcount /tmp/hadoop-jasper/dfs/name/wordcount-output

If the job finishes without errors, merge the output files from HDFS into the local directory:

bin/hadoop dfs -getmerge /tmp/hadoop-jasper/dfs/name/wordcount-output /home/jasper/mapreduce/wordcount/

Now you can open the output file in your local directory to view the results.
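
To make clear what the WordCount job computes, here is a minimal plain-Java sketch of the same map-then-reduce logic, run in memory on a few lines of text. It deliberately does not use the Hadoop API; the class WordCountSketch and its method names are made up for illustration, and splitting on whitespace is an assumption about how words are delimited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Hadoop.
class WordCountSketch {

    // "Map" phase: split a line into words, one entry per occurrence.
    static List<String> map(String line) {
        List<String> words = new ArrayList<String>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // "Reduce" phase: sum the occurrences of every word.
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String w : words) {
            Integer old = counts.get(w);
            counts.put(w, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Run both phases over all input lines.
    static Map<String, Integer> count(List<String> lines) {
        List<String> all = new ArrayList<String>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello hdfs");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The real job does the same counting, except that the map and reduce phases run as distributed tasks and read from and write to HDFS.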

A Survey on Graph Databases

Graph databases were also discussed in my previous entry about NoSQL databases. Two other valuable surveys of graph databases are a post on ReadWriteWeb and a page on DBPedias. While they take a top-down view, from the conceptual side and the framework side respectively, I start here from the bottom, looking at how the products are operated and what functions they offer. In addition, this entry covers more products than they do.

In graph theory, a simple graph is a set of nodes and a set of edges. While this definition is fundamental, graph databases usually add types and attributes to both nodes and edges to make them more descriptive and practical. At a minimum, graph databases are expected to support fast traversal. This is why we do not simply store all the edges in tabular databases such as HBase or Cassandra: following edges there requires join operations, which are expensive.
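
As a rough illustration of this model, the following plain-Java sketch keeps typed nodes and edges with attributes in memory and follows edges through adjacency lists instead of joins. All class and method names (PropertyGraphSketch, Element, neighbors) are hypothetical and do not belong to any of the products discussed here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph; names are for illustration only.
class PropertyGraphSketch {

    // Both nodes and edges carry a type and a set of attributes.
    static class Element {
        final String type;   // e.g. "PERSON" or "KNOWS"
        final Map<String, Object> attributes = new HashMap<String, Object>();

        Element(String type) {
            this.type = type;
        }
    }

    static class Node extends Element {
        final List<Edge> out = new ArrayList<Edge>();   // adjacency list

        Node(String type) {
            super(type);
        }
    }

    static class Edge extends Element {
        final Node from;
        final Node to;

        Edge(String type, Node from, Node to) {
            super(type);
            this.from = from;
            this.to = to;
            from.out.add(this);   // register the edge at its source node
        }
    }

    // Traversal step: follow the outgoing edges of one type, no join needed.
    static List<Node> neighbors(Node n, String edgeType) {
        List<Node> result = new ArrayList<Node>();
        for (Edge e : n.out) {
            if (e.type.equals(edgeType)) {
                result.add(e.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node john = new Node("PERSON");
        john.attributes.put("NAME", "JOHN");
        Node mary = new Node("PERSON");
        mary.attributes.put("NAME", "MARY");
        Edge knows = new Edge("KNOWS", john, mary);
        knows.attributes.put("SINCE", 2011);
        for (Node n : neighbors(john, "KNOWS")) {
            System.out.println(n.attributes.get("NAME"));
        }
    }
}
```

Because every node holds direct references to its edges, one traversal step costs time proportional to the degree of the node, independent of the total number of edges in the graph.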

In the previous entry I said that graph databases are one of the four major categories of NoSQL databases, and listed seven products in the graph store category: Neo4J, Infinite Graph, DEX, InfoGrid, HyperGraphDB, Trinity and AllegroGraph. This entry discusses each of them in detail, mainly from the perspective of a Java programmer who wants to use them.

1. Neo4J (Neo Technology)

Neo4J may be the most popular graph database. As the name suggests, Neo4J is developed particularly for Java applications, but it also supports Python. Neo4J is an open source project available in a GPLv3 Community edition, with Advanced and Enterprise editions available under both the AGPLv3 and a commercial license.

The graph model in Neo4J is shown in Figure 1. In simple words,

  • Properties (key-value pairs) can be added to both nodes and edges;
  • Only edges can be associated with a type, e.g., “KNOWS”;
  • Edges can be specified as directed or undirected.

image
Figure 1

Given the name of a node, locating that node in the graph requires an index. Neo4J uses the following mechanism: a super referenceNode is connected to all the nodes by a special edge type “REFERENCE”. This also lets you create multiple indexes, as long as you distinguish them by different edge types. The index structure is illustrated in Figure 2.

image

Figure 2
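
The reference-node idea can be sketched in plain Java as follows. This is only an illustration of the pattern described above, not the Neo4J API; the class ReferenceNodeIndexSketch, the connect and find methods, and the linear scan are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reference-node pattern; not the Neo4J API.
class ReferenceNodeIndexSketch {

    static class Node {
        final String name;
        // Outgoing edges grouped by edge type.
        final Map<String, List<Node>> out = new HashMap<String, List<Node>>();

        Node(String name) {
            this.name = name;
        }

        void connect(String edgeType, Node target) {
            List<Node> targets = out.get(edgeType);
            if (targets == null) {
                targets = new ArrayList<Node>();
                out.put(edgeType, targets);
            }
            targets.add(target);
        }
    }

    // Look a node up by name by scanning the reference node's edges of
    // one type; each edge type therefore acts as a separate index.
    static Node find(Node referenceNode, String edgeType, String name) {
        List<Node> indexed = referenceNode.out.get(edgeType);
        if (indexed == null) {
            return null;
        }
        for (Node n : indexed) {
            if (n.name.equals(name)) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node reference = new Node("referenceNode");
        Node john = new Node("JOHN");
        reference.connect("REFERENCE", john);
        System.out.println(find(reference, "REFERENCE", "JOHN") == john);
    }
}
```

In this sketch the lookup cost grows with the number of nodes attached to the reference node, which is one way to see why the mechanism feels less convenient than a declarative index.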

Neo4J also provides functions such as getting the neighbors of a specific node or finding all the shortest paths between two nodes. Note that all of these “traverse” functions require you to specify the edge types along the paths, which is handy.

Neo4J does not need to be installed as separate software. We can simply import the JAR file to build an embedded graph database, which is persisted on disk as a directory. The documentation of Neo4J looks complete. The free version places no limit on the number of nodes.

Weakness:

  • Although we can manually add a property with the key “type” to annotate the type of a node, native support for node types in the API would make the graph model more general. The workaround also becomes a problem when a node has multiple types.
  • Building an index by manually adding edges seems strange and inconvenient. It would be better to follow what relational databases do: the user says “create index on a group of nodes”, and it is done.

Here is another entry about how to get started with Neo4J in Java: https://jasperpeilee.wordpress.com/2011/11/22/neo4j-the-first-cup-of-tea/

2. Infinite Graph (Objectivity Inc.)

InfiniteGraph is a graph database from Objectivity, the company behind the object database of the same name. The free license supports only up to 1 million nodes and edges. InfiniteGraph needs to be installed as a service, much like a traditional database such as MySQL. InfiniteGraph borrows object-oriented concepts from Objectivity/DB, so every node and edge in InfiniteGraph is an object. Specifically,

  • All node classes will extend the base class BaseVertex;
  • All edge classes will extend the base class BaseEdge.

In the example at http://wiki.infinitegraph.com/w/index.php?title=Tutorial:_Hello_Graph!, Person is a node class and Meeting is an edge class. The following code creates two nodes and the edge object that will connect them:

Person john = new Person("John", "Hello ");
helloGraphDB.addVertex(john);
Person dana = new Person("Dana", "Database!");
helloGraphDB.addVertex(dana);
Meeting meeting1 = new Meeting("NY", "Graph");

image

Figure 3

InfiniteGraph also provides a visualization tool to view the data. The edge generated by the above code is visualized in Figure 3. Compared with the graph model of Neo4J in Figure 1, InfiniteGraph supports nodes of different types/classes. Note that the property key-value pairs in Neo4J correspond to member variables of the classes in InfiniteGraph.

Weakness:

  • Installing as a service is fine, but the configuration should be simpler.
  • Since nodes and edges can be user-defined objects, I suspect that this flexibility costs performance on huge graphs. Remember that NoSQL databases must keep their performance high to stay compelling.

Note: My experience of getting started with InfiniteGraph on 64-bit Windows 7 was not smooth. The configuration described at http://wiki.infinitegraph.com/w/index.php?title=InfiniteGraph_Installation seems incomplete, and the Java programs kept throwing the “….dll: Can’t find dependent libraries” error. When I checked the dependencies of that DLL file with Dependency Walker, the error “Modules with different CPU types were found” suggested that InfiniteGraph probably does not support 64-bit operating systems. Finally, I switched to 64-bit Ubuntu, only to find that InfiniteGraph provides Linux versions for Red Hat and SUSE only.

3. DEX (Sparsity Technologies)

DEX is said to be a high-performance and scalable graph database, which is attractive for NoSQL applications. The personal evaluation version supports up to 1 million nodes. The current version is 4.2, and it supports both Java and .NET programming. Note that the old version 4.1 supports only Java and is not compatible with the new version. As of Nov. 24, 2011, the documentation for version 4.2 is not yet complete, and it is very hard to find a starter example for the new version on the web. The migration file here is very helpful for writing programs based on the old-version examples.

image

Figure 4

Figure 4 shows the architecture of DEX, which explains why DEX can achieve high performance. The native C++ DEX Core is the key. On the event page, the team shows some exciting applications based on DEX.

DEX is also portable: you only need a JAR file to run it. Unlike Neo4J, DEX persists the database in a single file. The DEX Java API is easy to use, and the class Graph provides nearly all the operations you need. To make DEX stronger, the following weak points should be eliminated:

  • Raise the limit of the personal version to 1 billion nodes;
  • Provide more complete documentation with good examples;
  • Port the graph algorithms of the old version to the new version in the near future.

Here is a new entry about how to deploy your graph with DEX.

4. InfoGrid (Netmesh Inc.)

InfoGrid calls itself a “web graph database”, so some of its functions are oriented toward web applications. Figure 5 shows the whole framework of InfoGrid, in which the graph database does not seem to be a dominant component. InfoGrid has some applications in the OpenID project, which is supported by the same company. I suspect InfoGrid is used only internally at Netmesh, because of the following weaknesses:

  • The newest Java API here is incomplete and sometimes confusing;
  • The tutorial here is not written in a clear and formal way.

image

Figure 5

The first-step example at http://infogrid.org/wiki/Examples/FirstStep is not hard to read overall, but enums such as TAGLIBRARY, TAG, TAG_LABEL and TAGLIBRARY_COLLECTS_TAG really confuse me. These enums seem to be embedded in the model, but why? It looks as if this example comes from an internal Netmesh project serving some specific application, but who knows.

5. HyperGraphDB (Kobrix Inc.)

HyperGraphDB is an open source data storage mechanism whose implementation is based on BerkeleyDB. Its graph model is known as directed hypergraphs. In mathematics, a hypergraph allows an edge to point to more than two nodes. HyperGraphDB extends this further by allowing edges to point to other edges, so it offers more generality than other graph databases. Figure 6 shows a hypergraph example with four edges, distinguished by different colors.

image

Figure 6
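
The hypergraph idea can be sketched in a few lines of plain Java. This is not the HyperGraphDB API: the class HypergraphSketch and its Atom type are hypothetical, and they only illustrate that an edge may have any number of targets and that a target may itself be an edge.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the hypergraph idea; not the HyperGraphDB API.
class HypergraphSketch {

    // Every element is an atom; a plain node is an atom with no targets.
    static class Atom {
        final String label;
        final List<Atom> targets;   // empty for nodes, any size for links

        Atom(String label, Atom... targets) {
            this.label = label;
            this.targets = Collections.unmodifiableList(Arrays.asList(targets));
        }

        boolean isLink() {
            return !targets.isEmpty();
        }

        int arity() {
            return targets.size();
        }
    }

    public static void main(String[] args) {
        Atom a = new Atom("a");
        Atom b = new Atom("b");
        Atom c = new Atom("c");
        // A hyperedge connecting three nodes at once.
        Atom team = new Atom("team", a, b, c);
        // A link whose targets include another link.
        Atom note = new Atom("annotates", team, a);
        System.out.println(team.arity() + " " + note.targets.get(0).isLink());
    }
}
```

Treating nodes and edges as the same kind of element is what makes edge-to-edge links possible without any special case.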

The tutorial of HyperGraphDB looks complete. Each node in HyperGraphDB is called an atom, and operations like indexing and traversals are supported.

Note: Although the tutorial is nicely written, the same error “….dll: Can’t find dependent libraries” occurred on Windows 7. After I switched to 64-bit Ubuntu, the sample program threw the exception “ELFCLASS32 (possible cause: architecture word width mismatch)”. That is probably because HyperGraphDB supports only 32-bit Linux.

6. Trinity (Microsoft)

Microsoft joined the competition only recently, and the first release V0.1 of Trinity is available for intranet access only. According to the introduction, Trinity is a memory-based graph store with rich database features, including highly concurrent online query processing, ACI transaction support, etc. Trinity provides only C# APIs for graph processing.

Since the Trinity package is not available outside Microsoft, we cannot know many details about it. Still, the key features of Trinity are listed below:

  • Uses a hypergraph as its data model;
  • Can be deployed in distributed mode.

The system architecture can be found here. Overall, it is currently hard to find any distinct advantage of Trinity over the open source graph databases. However, Trinity is still in its prototype stage and is worth watching. In addition, Probase is an ongoing project that looks like an ontology/taxonomy knowledge base built on top of Trinity. Here is a link to a nice article about Probase and Trinity.

7. AllegroGraph (Franz Inc.)

AllegroGraph is a persistent graph database that purportedly scales to “billions of RDF triples while maintaining high performance”. Although an RDF triple can be viewed as an edge, AllegroGraph is intended for building RDF-centric semantic web applications and supports SPARQL, RDFS++, and Prolog reasoning from client applications, including Java programs. A free version of AllegroGraph RDFStore supports up to 50 million triples.

image

Figure 7

Figure 7 shows an example of an RDF graph. AllegroGraph appends an additional slot called “named graph” to each triple, which turns the triples into quads (they are still called triples for convenience). Here are some assertions from Figure 7.

subject  predicate   object   graph   
robbie   petOf       jans     jans's home page  
petOf    inverseOf   hasPet   english grammar  
Dog      subClassOf  Mammal   science 

To add a batch of triples to an RDF graph, AllegroGraph can bulk load from both N-Triples and RDF/XML files. Overall, AllegroGraph is ideal for RDF storage, but not for general graphs. The documentation looks great. Find the introduction here; for the Java API tutorial, see the Sesame version here and the Jena version here.

Overall comparison:

The overall comparison is shown in the table below. High performance and distributed deployment are supposed to be supported by all products. “< 1M” means that the free version of the corresponding graph database supports up to 1 million nodes. RDF graphs can be viewed as a special kind of property graph. Since the hypergraph is the most generic form of graph, a graph database that supports hypergraphs should in theory also support property graphs.

 

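                 1      2      3      4      5      6      7
Documentation?   Good   Good   Fair   Bad    Good   Bad    Good
Portable?        Y      N      Y      Y      Y      N      N
Java?            Y      Y      Y      Y      Y      N      Y
Free?            Y      < 1M   < 1M   Y      Y      N      < 50M
Property Graph?  Y      Y      Y      Y      Y      Y      RDF
Hypergraph?      N      N      N      N      Y      Y      N

(1 = Neo4J, 2 = InfiniteGraph, 3 = DEX, 4 = InfoGrid, 5 = HyperGraphDB, 6 = Trinity, 7 = AllegroGraph)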

Tentative Ranking:

Which one is the best? The answer is usually “it depends”. Ranking products with different characteristics is always controversial, but sometimes we need to make a hard decision. Here are some general rules based on my personal understanding:

  • If you need to store RDF triples, go with AllegroGraph;
  • For property graphs, consider Neo4J and DEX first;
  • For hypergraphs, go with HyperGraphDB.

Graph Visualization Tool — Gephi

I would like to visualize a graph. The goal is simple: given a graph file, produce a graphical representation of the nodes and edges it describes. The first tool I tried was Graphviz, released by AT&T Research, but after five minutes of click-and-try I still could not figure it out. My goal is simple, so while Graphviz may be powerful, it is not easy to get started with. Then I switched to Gephi, which bills itself as “Like Photoshop for Graphs”. In fact I like it.

1. Follow the quick start at http://gephi.org/users/quick-start/ to know the basics.

2. Example 1: draw a simple unlabeled graph.

Start with a simple edge file with the extension “.csv” (for the detailed CSV format description, see https://gephi.org/users/supported-graph-formats/csv-format/):

2 3
1 0
5 9
9 3

Open this file in Gephi, and you can view the graph. Click “T” in the bottom bar to show all node labels. Select “Force Atlas” in the Layout panel and adjust some options to lay out the graph:
Repulsion Strength: how strongly nodes push each other apart
Gravity: how strongly nodes are pulled toward the center, which keeps disconnected components close together

Normally you will see a graph like this:

image

Really handy, right?

3. Example 2: draw an edge labeled graph.

CSV cannot describe edge labels. To add them, you need the GDF graph format; see the description at https://gephi.org/users/supported-graph-formats/gdf-format/. As an example, write a plain-text GDF file “a.GDF” with the following content:

nodedef>name VARCHAR,label VARCHAR
s1,Site number 1
s2,Site number 2
s3,Site number 3
edgedef>node1 VARCHAR,node2 VARCHAR,label VARCHAR
s1,s2,1.2341
s2,s3,0.453
s3,s2,2.34
s3,s1,0.871

Open this file in Gephi and click both “T”s in the bottom bar to show the node and edge labels. You can view the graph now. Really cool!

image
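
If the graph lives in a program rather than in a hand-written file, the GDF text can be generated. The following plain-Java sketch builds the same content as the example file above; the class GdfWriterSketch and its toGdf method are hypothetical helpers, not part of Gephi, and the sketch assumes that names and labels contain no commas.

```java
// Hypothetical helper that produces GDF text; not part of Gephi.
class GdfWriterSketch {

    // nodes: {name, label} pairs; edges: {node1, node2, label} triples.
    // Assumes that no value contains a comma.
    static String toGdf(String[][] nodes, String[][] edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("nodedef>name VARCHAR,label VARCHAR\n");
        for (String[] n : nodes) {
            sb.append(n[0]).append(',').append(n[1]).append('\n');
        }
        sb.append("edgedef>node1 VARCHAR,node2 VARCHAR,label VARCHAR\n");
        for (String[] e : edges) {
            sb.append(e[0]).append(',').append(e[1]).append(',')
              .append(e[2]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] nodes = {
            {"s1", "Site number 1"},
            {"s2", "Site number 2"},
            {"s3", "Site number 3"}
        };
        String[][] edges = {
            {"s1", "s2", "1.2341"},
            {"s2", "s3", "0.453"},
            {"s3", "s2", "2.34"},
            {"s3", "s1", "0.871"}
        };
        // Redirect standard output to a file to open the result in Gephi.
        System.out.print(toGdf(nodes, edges));
    }
}
```

Running the program with its output redirected to a file gives a GDF file that can be opened in Gephi in the same way as the hand-written one.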

4. Refer to http://gephi.org/users/quick-start/ to learn the advanced functions of Gephi like Ranking, Community Detection, Partition and Filtering.